Using AWS to Access High Resolution Population Density Maps and Demographic Data

These high-resolution maps estimate the number of people living within 30-meter grid tiles in nearly every country around the world. Our datasets also describe how certain populations are distributed within each country, including children under five, women of reproductive age, and young and elderly people, at unprecedentedly high resolution.

The High-Resolution Settlement Layer (HRSL) population maps are distributed as an AWS public dataset, so you can explore them with simple, fast SQL queries using Amazon Athena from your own AWS account. Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at data stored in Amazon S3 and begin running ad hoc SQL queries, with results in seconds. Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Table setup and initial test queries should take no more than 5 minutes.

Methodology

For detailed information about the methodology behind our High Resolution Population Density Maps and our Demographic Estimates, please visit: https://dataforgood.fb.com/docs/methodology-high-resolution-population-density-maps-demographic-estimates/

Create Table

Before you start, you’ll need to create a table to access the data in S3. Do this in the US East (N. Virginia) Region (us-east-1), where HRSL is stored, to avoid cross-region data transfer overhead.

Table Creation Details

Enter these details in the AWS console’s four-step Add Table wizard:

  1. Name & Location
    • Choose a database and name the table “hrsl” (the example queries below assume this name)
    • Location of data:
         s3://dataforgood-fb-data/csv/
  2. Data Format
    • HRSL data files are served as tab-separated values (TSV)
  3. Columns
    • Data files include three columns, in this order:
      1. latitude (float)
      2. longitude (float)
      3. population (float)
  4. Partitions
    • Three partitions are defined:
      1. month (string) – currently only “2019-06” has been published; future HRSL updates may add further months
      2. country (string) – ISO 3166-1 alpha-3 codes, such as “ZWE” for Zimbabwe
      3. type (string) – demographic group, one of “men”, “women”, “children_under_five”, “elderly_60_plus”, “women_of_reproductive_age_15_49”, or “youth_15_24”
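If your console does not offer the same wizard, you can run an equivalent CREATE TABLE statement in the Athena query editor instead. The sketch below assembles such a statement from the columns and partitions listed above. It is not the exact text the wizard generates; the database name “default” is an assumption, and identifiers are backtick-quoted so names like month and type are always accepted.

```python
# Sketch: build Athena (Hive) DDL equivalent to the wizard settings above.
# The database name "default" is an assumption; the wizard's own statement
# may differ in formatting and table properties.

COLUMNS = [("latitude", "float"), ("longitude", "float"), ("population", "float")]
PARTITIONS = [("month", "string"), ("country", "string"), ("type", "string")]
LOCATION = "s3://dataforgood-fb-data/csv/"


def build_create_table(database: str = "default", table: str = "hrsl") -> str:
    """Return DDL for a tab-separated, partitioned HRSL table."""
    cols = ",\n".join(f"  `{name}` {kind}" for name, kind in COLUMNS)
    parts = ", ".join(f"`{name}` {kind}" for name, kind in PARTITIONS)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{cols}\n"
        ")\n"
        f"PARTITIONED BY ({parts})\n"
        # The two characters \t in the DDL tell the delimited SerDe to split on tabs.
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{LOCATION}';"
    )


if __name__ == "__main__":
    print(build_create_table())
```

Partition columns are declared separately from data columns because their values come from the S3 prefixes, not from the TSV files themselves.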

Completing the four wizard steps runs a pre-populated CREATE TABLE query. When it finishes, look for the “Query successful” note at the bottom of the results and follow the included link to “load all partitions”.

Loading all HRSL partitions takes approximately 2 minutes. When it succeeds, the table listing in the left column shows the latitude, longitude, and population floating-point columns and the month, country, and type partitions.
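Loading all partitions corresponds to Athena’s MSCK REPAIR TABLE command, which discovers partitions only from Hive-style key=value prefixes. Assuming the bucket follows that layout under csv/ (month, then country, then type), the sketch below builds and parses partition prefixes, validates values against the lists above, and produces an ALTER TABLE ... ADD PARTITION statement for registering a single partition. The helper names are hypothetical and the layout is an assumption, not a documented guarantee.

```python
import re

BUCKET_PREFIX = "s3://dataforgood-fb-data/csv/"
# Demographic groups listed in the Partitions step above.
TYPES = {
    "men", "women", "children_under_five", "elderly_60_plus",
    "women_of_reproductive_age_15_49", "youth_15_24",
}
ISO3 = re.compile(r"[A-Z]{3}")


def partition_prefix(month: str, country: str, kind: str) -> str:
    """Build one partition's S3 prefix, assuming Hive-style key=value paths."""
    if not ISO3.fullmatch(country):
        raise ValueError(f"not an ISO 3166-1 alpha-3 code: {country!r}")
    if kind not in TYPES:
        raise ValueError(f"unknown demographic type: {kind!r}")
    return f"{BUCKET_PREFIX}month={month}/country={country}/type={kind}/"


def parse_prefix(prefix: str) -> dict:
    """Recover partition values from a key=value prefix under BUCKET_PREFIX."""
    if not prefix.startswith(BUCKET_PREFIX):
        raise ValueError(f"prefix is outside {BUCKET_PREFIX}")
    path = prefix[len(BUCKET_PREFIX):].strip("/")
    return dict(part.split("=", 1) for part in path.split("/"))


def add_partition_sql(month: str, country: str, kind: str, table: str = "hrsl") -> str:
    """Register a single partition explicitly instead of loading all of them."""
    location = partition_prefix(month, country, kind)
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (month='{month}', country='{country}', type='{kind}') "
        f"LOCATION '{location}'"
    )
```

When you only need a few countries, registering just those partitions this way may be quicker than loading every partition, provided the assumed layout matches the bucket.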

Query Data

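Query data in Athena using standard SQL. For example, to get the total population of Zimbabwe (country code “ZWE”), sum the population of the men and women partitions. The query also returns the bounding box (minimum and maximum latitude and longitude) of the populated tiles: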

SELECT min(latitude) AS min_lat,
         min(longitude) AS min_lon,
         max(latitude) AS max_lat,
         max(longitude) AS max_lon,
         cast(sum(population) as integer) AS population,
         country
FROM hrsl
WHERE month='2019-06'
        AND country='ZWE'
        AND type IN ('men', 'women')
GROUP BY  country
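To make the aggregation concrete, this sketch reproduces the query’s logic in plain Python on a few made-up rows (they are not real HRSL values). Only the men and women partitions are summed because the other groups (children, youth, elderly, women of reproductive age) are presumably subsets of those two, so including them would double-count people. Each TSV line is assumed to hold latitude, longitude, and population in that order.

```python
import csv
import io


def summarize(partitions, types=("men", "women")):
    """Mimic the Zimbabwe query on in-memory data.

    `partitions` maps (country, type) to the TSV text of that partition;
    each line holds latitude, longitude and population.
    """
    stats = {}
    for (country, kind), text in partitions.items():
        if kind not in types:  # WHERE type IN ('men', 'women')
            continue
        s = stats.setdefault(country, {  # GROUP BY country
            "min_lat": float("inf"), "min_lon": float("inf"),
            "max_lat": float("-inf"), "max_lon": float("-inf"),
            "population": 0.0,
        })
        for lat, lon, pop in csv.reader(io.StringIO(text), delimiter="\t"):
            lat, lon, pop = float(lat), float(lon), float(pop)
            s["min_lat"] = min(s["min_lat"], lat)
            s["min_lon"] = min(s["min_lon"], lon)
            s["max_lat"] = max(s["max_lat"], lat)
            s["max_lon"] = max(s["max_lon"], lon)
            s["population"] += pop
    # Approximate cast(sum(population) AS integer) with a whole-number round.
    for s in stats.values():
        s["population"] = round(s["population"])
    return stats


# Made-up sample rows for illustration only; these are not HRSL values.
sample = {
    ("ZWE", "men"): "-17.8\t31.0\t1.5\n-18.1\t31.2\t2.25\n",
    ("ZWE", "women"): "-17.9\t30.9\t1.5\n",
    ("ZWE", "youth_15_24"): "-17.8\t31.0\t9.0\n",  # excluded by the type filter
}

if __name__ == "__main__":
    print(summarize(sample)["ZWE"])
```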

Athena also supports geospatial queries. To retrieve the total population in an area surrounding Lake Victoria, start by drawing a freehand polygon at geojson.io and exporting it as a WKT string. WKT lists each vertex as a longitude latitude pair (x before y).
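If you save the drawn shape from geojson.io as GeoJSON instead, a few lines of Python can produce the WKT string. GeoJSON stores positions as [longitude, latitude], the same x y order that WKT and ST_Point(longitude, latitude) use. The function name is hypothetical, and this sketch handles only the outer ring of a single Polygon feature.

```python
import json


def polygon_feature_to_wkt(geojson_text: str) -> str:
    """Convert the first Polygon feature of a GeoJSON FeatureCollection to WKT."""
    data = json.loads(geojson_text)
    geometry = data["features"][0]["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError("expected a Polygon geometry")
    ring = geometry["coordinates"][0]  # outer ring only; holes are ignored
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # WKT polygon rings must be closed
    # GeoJSON positions are [longitude, latitude, (optional altitude)];
    # WKT wants "x y", i.e. "longitude latitude".
    points = ", ".join(f"{pos[0]} {pos[1]}" for pos in ring)
    return f"POLYGON (({points}))"
```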

Use the ST_Point and ST_Polygon constructors to build a point for each population sample and a polygon from the WKT string, then use ST_Within to keep only the samples that fall inside the polygon:

SELECT min(latitude) AS min_lat,
         min(longitude) AS min_lon,
         max(latitude) AS max_lat,
         max(longitude) AS max_lon,
         cast(sum(population) as integer) AS population,
         country
FROM hrsl
WHERE month='2019-06'
        AND type IN ('men', 'women')
        AND ST_Within(ST_Point(longitude, latitude), ST_Polygon('POLYGON ((30.454 3.294, 28.959 0.659, 28.125 -3.294, 29.838 -8.754, 33.574 -10.141, 39.726 -8.581, 41.835 -2.021, 38.232 3.995, 32.827 4.390, 30.454 3.294))'))
GROUP BY  country
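As an illustration only, the sketch below parses the same WKT ring and applies the classic ray-casting (even-odd) point-in-polygon test, which approximates the decision ST_Within makes for each row of a simple polygon. Unlike ST_Within, it gives no defined answer for points lying exactly on the boundary. The function names are hypothetical; in Athena the test runs server side.

```python
LAKE_VICTORIA_WKT = (
    "POLYGON ((30.454 3.294, 28.959 0.659, 28.125 -3.294, 29.838 -8.754, "
    "33.574 -10.141, 39.726 -8.581, 41.835 -2.021, 38.232 3.995, "
    "32.827 4.390, 30.454 3.294))"
)


def parse_wkt_polygon(wkt: str) -> list:
    """Parse a single-ring 'POLYGON ((x y, ...))' string into (x, y) tuples."""
    body = wkt.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("expected a POLYGON")
    inner = body[body.index("((") + 2 : body.rindex("))")]
    return [tuple(float(v) for v in pair.split()) for pair in inner.split(",")]


def point_in_polygon(x: float, y: float, ring: list) -> bool:
    """Even-odd rule: count crossings of a ray cast from (x, y) toward +x."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # edge straddles the ray's latitude
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside
```

Arguments follow the same (longitude, latitude) order as ST_Point; for example, a point near Kampala (32.58, 0.35) lies inside the ring, while one near Addis Ababa (38.76, 9.0) does not.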

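This query returned the covered populations of six countries after 92 seconds and cost $0.01 for 2.97 GB of data scanned. Because country is a partition column, listing the six countries returned by the first query lets Athena skip the files of every other country, which cut the run time to 23 seconds and the cost to $0.0008 for 167 MB of data scanned: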

SELECT min(latitude) AS min_lat,
         min(longitude) AS min_lon,
         max(latitude) AS max_lat,
         max(longitude) AS max_lon,
         cast(sum(population) AS integer) AS population,
         country
FROM hrsl
WHERE month='2019-06'
        AND type IN ('men', 'women')
        AND country IN ('UGA', 'COD', 'ZMB', 'BDI', 'RWA', 'TZA')
        AND ST_Within(ST_Point(longitude, latitude), ST_Polygon('POLYGON ((30.454 3.294, 28.959 0.659, 28.125 -3.294, 29.838 -8.754, 33.574 -10.141, 39.726 -8.581, 41.835 -2.021, 38.232 3.995, 32.827 4.390, 30.454 3.294))'))
GROUP BY  country
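The reported costs are roughly consistent with Athena’s pay-per-data-scanned pricing. Assuming a rate of US$5 per TB scanned (with a TB taken as 2^40 bytes), billing rounded up to the nearest megabyte, and a 10 MB minimum per query, this sketch estimates the cost of a scan. These rates are assumptions; check current Athena pricing for your Region.

```python
import math

PRICE_PER_TB_USD = 5.00  # assumed on-demand rate; verify current pricing
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4
MIN_BILLED_MB = 10  # assumed per-query minimum


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate cost: bytes rounded up to whole MB, with a 10 MB minimum."""
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / BYTES_PER_MB))
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = estimate_query_cost(int(2.97 * 1024 ** 3))  # 2.97 GB
    pruned = estimate_query_cost(167 * 1024 ** 2)  # 167 MB
    print(f"${full_scan:.4f} vs ${pruned:.4f}")
```

Because billing follows bytes scanned, filtering on the partition columns (month, country, and type) is the main lever on cost for these TSV files; the ST_Within predicate alone does not reduce the data read.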
