
Tutorial: Identification of At-Risk populations using publicly available Facebook Data for Good datasets and QGIS


Effective responses to the COVID-19 pandemic require the identification of small clusters of at-risk populations in order to quickly target public health resources. Facebook Data for Good has responded to this challenge by producing valuable datasets and releasing them to the public. These datasets have global coverage and can be extremely useful in areas where authoritative demographic data or other proxy data for movement is limited.

This tutorial will walk you through the creation of a map and dataset which identifies potential problem areas for COVID-19 vulnerable populations using two publicly available datasets produced by the Facebook Data for Good team. By following the steps below within the free and open source GIS analysis application QGIS, we will combine demographic estimates from the High Resolution Population Density dataset with metrics describing population movement from the Facebook Movement Range dataset. Together, these datasets will allow us to produce a risk index useful for identifying vulnerable areas.

Datasets and Required Tools

Download and Install QGIS

Download Source Datasets

GADM – Database of Global Administrative Areas

  1. Navigate to the following Page:
  2. Select the country of interest from the dropdown
  3. Click the Shapefile link to download a .zip file containing country shapefiles
  4. Save the files to a folder or location of your choice
  5. Do not unzip the file package

Facebook High Resolution Population Density Dataset

  1. Navigate to the Facebook organization page on the Humanitarian Data Exchange
  2. Use the ‘Location’ tab on the left panel to select a country of interest. (For this tutorial we will be using the Philippines)
  3. Navigate to the [Country Name]: High Resolution Population Density Maps + Demographic Estimates
  4. Download the geotiff version of the ‘elderly_60_plus’ and geotiff version of the total ‘population’ dataset to a working folder
  5. Do not unzip the files

Facebook Movement Range Dataset

  1. Navigate to the Facebook organization page on the Humanitarian Data Exchange
  2. Use the search bar to search for ‘Movement Range Maps’
  3. Download the file entitled ‘’ to a common directory
  4. Unzip the .txt file within the .zip archive to your working folder

Load the Datasets into QGIS

  1. Open QGIS
  2. Using the browser panel, navigate to the folder where you downloaded the datasets from the previous steps
  3. Locate the GADM zip file and click the down arrow to open the list of administrative boundaries
  4. Select the level _3 shapefile, right click (mac control-click), and select ‘add layer to project’
  5. Locate the two High Resolution Population Density geotiff .zip files and add them to the project using the same process
  6. Load the Facebook Movement Range dataset into the QGIS project
  7. Using the QGIS menu bar Navigate to ‘Layer’ → ‘Add Layer’ → ‘Add Delimited Text Layer’
  8. Fill out the resulting dialogue box using the following settings:
  9. Click ‘Add’ when complete
  10. At the end of this process your QGIS project should look similar to this:

Select a time window for Analysis

For this tutorial we will perform our analysis over a 7-day window. Longer periods are possible by changing the start and end dates in this step, but they will require significantly longer processing times.

  1. Open the filter options by right clicking (mac control-click) on the movement range dataset within the Layers pane
  2. Type in a query defining the range of dates to be analyzed
    • Example: "ds" >= '2020-07-25' AND "ds" <= '2020-07-31'
    • This selects all dates from 2020-07-25 through 2020-07-31, inclusive. Note that the field name is wrapped in double quotes and the date values in single quotes
  3. Click OK to finalize the filtering
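
Outside of QGIS, the same date filter can be sketched in a few lines of Python. This is only an illustration: the sample rows and polygon IDs are made up, and it assumes the Movement Range .txt file is tab-delimited with a header row containing the ‘ds’, ‘polygon_id’ and ‘all_day_ratio_single_tile_users’ columns referenced in this tutorial.

```python
import csv
import io

# Hypothetical sample rows in the assumed layout of the Movement Range file.
SAMPLE = (
    "ds\tpolygon_id\tall_day_ratio_single_tile_users\n"
    "2020-07-24\tPHL.1.1_1\t0.31\n"
    "2020-07-25\tPHL.1.1_1\t0.29\n"
    "2020-07-31\tPHL.1.2_1\t0.35\n"
    "2020-08-01\tPHL.1.2_1\t0.33\n"
)


def filter_by_date(rows, start, end):
    """Keep rows whose 'ds' value lies between start and end, inclusive.

    ISO dates (YYYY-MM-DD) sort correctly when compared as plain strings.
    """
    return [row for row in rows if start <= row["ds"] <= end]


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
window = filter_by_date(rows, "2020-07-25", "2020-07-31")
print([row["ds"] for row in window])
```

Because both ends of the comparison are inclusive, the rows dated 2020-07-25 and 2020-07-31 are kept, just as with the >= and <= operators in the QGIS query.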

Join the Movement Range Dataset to the GADM polygon dataset

In this section we will join the Facebook Movement Range dataset to the GADM polygons using a field that the two datasets have in common. The field ‘GID_2’ in the GADM polygon layer corresponds to the field ‘polygon_id’ in the Movement Range dataset.

  1. On the right side of the main QGIS window, locate the ‘Processing Toolbox’ pane
  2. Navigate to: ‘Vector General’ →  select ‘Join attributes by field value’
  3. Select the GADM polygon layer as the ‘Input Layer’
  4. Select ‘GID_2’ as the ‘Table Field’
  5. Select the Movement Range table as ‘Input layer 2’
  6. Select ‘polygon_id’ as ‘Table field 2’
  7. Select ‘Create separate features for each matching feature (one-to-many)’ as the ‘Join type’
  8. Other options can be left as default
  9. Click Run
  10. Upon completion you will likely encounter a RED message stating that ‘123456 feature(s) from input layer could not be matched.’ This is okay and is the result of administrative areas where there were not enough users providing Facebook location data to produce the movement range data in a way that preserves user privacy.
  11. You should now have a new map layer called ‘Joined layer’ in the ‘Layers’ panel within the main QGIS window
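
To make the one-to-many join more concrete, here is a minimal Python sketch of the idea. It is not how QGIS implements the tool: the function name and the sample records are hypothetical, and only the field names ‘GID_2’ and ‘polygon_id’ come from this tutorial. Each polygon is copied once per matching movement record, and polygons without any match are counted, which mirrors the "could not be matched" message above.

```python
from collections import defaultdict


def join_one_to_many(polygons, records, polygon_key="GID_2", record_key="polygon_id"):
    """Return (joined_rows, unmatched_count) for a one-to-many attribute join."""
    # Index the movement records by their polygon ID for fast lookup.
    by_id = defaultdict(list)
    for record in records:
        by_id[record[record_key]].append(record)

    joined = []
    unmatched = 0
    for polygon in polygons:
        matches = by_id.get(polygon[polygon_key], [])
        if not matches:
            # This sketch simply leaves unmatched polygons out; how QGIS
            # treats them depends on the options chosen in the tool dialog.
            unmatched += 1
            continue
        for record in matches:
            # One output feature per matching record (one-to-many).
            joined.append({**polygon, **record})
    return joined, unmatched


# Hypothetical sample data.
polygons = [{"GID_2": "A"}, {"GID_2": "B"}, {"GID_2": "C"}]
records = [
    {"polygon_id": "A", "ds": "2020-07-25"},
    {"polygon_id": "A", "ds": "2020-07-26"},
    {"polygon_id": "B", "ds": "2020-07-25"},
]
joined, unmatched = join_one_to_many(polygons, records)
print(len(joined), "joined features,", unmatched, "unmatched polygon(s)")
```

With a 7-day filter applied, each matched administrative area would appear up to seven times in the output, once per day.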

Calculate Zonal Statistics

Next we’ll use the High Resolution Population Density layers to calculate population values for each administrative area within the study area.

  1. On the right side of the main QGIS window, locate the ‘Processing Toolbox’ pane
  2. Navigate to: ‘Raster Analysis’ →  select ‘Zonal Statistics’
  3. Within the Zonal Statistics dialog box first select the layer containing the overall population dataset
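  4. Within the ‘Statistics to calculate’ field make sure that only ‘sum’ is selected, and set the ‘Output column prefix’ to ‘TotalPop_’ so that the resulting column is named ‘TotalPop_sum’, the name used in the later calculations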
  5. Select ‘Ok’
  6. Select ‘Run’
  7. After this process completes, repeat using the ‘Elderly_60_plus’ dataset. Make sure to change the ‘Output column prefix’ to ‘ElderlyPop_’
  8. Select ‘Run’
  9. After this process completes, the ‘Joined layer’ dataset will contain:
    • Map geometry representing administrative regions for the country of interest
    • A column with mobility metrics for the date range specified
    • A column with total population estimates for each administrative region
    • A column with 60+ elderly population estimates for each administrative region
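
Conceptually, a zonal ‘sum’ adds up every raster cell that falls inside each polygon. The toy Python sketch below shows that idea on a tiny grid. It is a simplification under stated assumptions: the polygons are represented by a hypothetical grid of zone labels rather than real geometries, the values are made up, and None stands in for NoData cells.

```python
def zonal_sum(values, zones):
    """Sum the raster values per zone label, skipping NoData (None) cells."""
    totals = {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:
                continue
            totals[zone] = totals.get(zone, 0.0) + value
    return totals


# Hypothetical 2 x 3 population raster (people per cell) and matching zone grid.
population = [
    [1.5, 2.0, 0.0],
    [0.5, None, 3.0],
]
zones = [
    ["A", "A", "B"],
    ["A", "B", "B"],
]
totals = zonal_sum(population, zones)
print(totals)
```

Running the same function once on the total population grid and once on the elderly grid corresponds to the two Zonal Statistics runs in the steps above.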

Calculate Percent Elderly

A component of our risk score calculation is the percentage of the population above the age 60. To create this value, we need to perform a simple division on the two population sum columns that we created in previous steps.

  1. Rename the dataset ‘Joined layer’ to something more descriptive (e.g. ‘Risk Analysis’) by right clicking (mac control-click) on ‘Joined layer’ in the ‘Layers’ panel and clicking ‘Rename’
  2. Open the attribute table of the newly renamed layer by right clicking the layer again and selecting ‘Open Attribute Table’
  3. You should be presented with a table view of the dataset we created in the previous steps. We will be working with the highlighted fields in the upcoming steps
  4. With the attribute table open, open the field calculator
  5. Within the field calculator create a new field called ‘PercentElderly’ which is a simple ratio of the fields containing the sum of the elderly population and the sum of the total population
  6. Set ‘Output field name’ to ‘PercentElderly’
  7. Set the ‘Output field type’ to ‘Decimal number (real)’
  8. In the expression box define the calculation to be done by using the following formula: ElderlyPop_sum / TotalPop_sum
  9. Click ‘ok’ to create the new calculated field
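
The calculation itself is a plain ratio. The short Python sketch below reproduces it for a single row, with one addition that is an assumption of this example rather than part of the steps above: a guard that returns None when the total population is missing or zero, so that empty areas do not cause a division error.

```python
def percent_elderly(elderly_sum, total_sum):
    """Return the share of the population aged 60+ as a fraction (0 to 1)."""
    if elderly_sum is None or not total_sum:
        # No usable population estimate for this area.
        return None
    return elderly_sum / total_sum


# Hypothetical values for one administrative area.
share = percent_elderly(120.0, 1000.0)
print(share)
```

Note that, despite the field name, the result is a fraction between 0 and 1 rather than a percentage. The risk score expression in the next section multiplies it by 100.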

Calculate a Weighted Risk Score

This section will outline how to create a risk score. Similar to the last section we will be using the field calculator to calculate the following formula:

Risk Score = (Percent Elderly) * (1 - percent of FB population staying at home) * (Elderly population sum)

The above risk methodology represents the relationship between the percentage of the population that is elderly and the percent of people staying at or around home, all weighted by sum of the elderly population in the area.

  1. Open the attribute table
  2. Open the field calculator
  3. Set up the values as follows:
    • Output field name: ‘Score’
    • Output field type: Whole Number
    • Expression: (("PercentElderly" * 100) * (100 - ("all_day_ratio_single_tile_users" * 100))) * "ElderlyPop_sum"
  4. Click ‘OK’
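
As a worked example, the Python sketch below evaluates both the formula as written in prose and the field calculator expression for one row of made-up values. The function names are hypothetical. Because the expression converts both fractions to percentages, its result is larger than the prose formula by a constant factor of 100 * 100 = 10,000; since that factor is the same for every row, it should not change the ranking of areas or the scaled score calculated in the next section.

```python
import math


def risk_score(percent_elderly, stay_home_ratio, elderly_sum):
    """Risk score as written in the prose formula (inputs are fractions)."""
    return percent_elderly * (1.0 - stay_home_ratio) * elderly_sum


def calculator_score(percent_elderly, stay_home_ratio, elderly_sum):
    """Same calculation in the form used by the field calculator expression."""
    return ((percent_elderly * 100) * (100 - (stay_home_ratio * 100))) * elderly_sum


# Hypothetical row: 12% elderly, 25% of users staying put, 500 elderly residents.
plain = risk_score(0.12, 0.25, 500)
scaled_up = calculator_score(0.12, 0.25, 500)
print(plain, scaled_up)
print(math.isclose(scaled_up, plain * 10_000))
```

An area scores highest when it combines a large elderly population, a high elderly share, and a low proportion of people staying at home.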

Calculate a Scaled Risk Score

The risk score that we created would be easier to interpret as a number from 0 to 100 than as a raw number. This section outlines how to convert our risk scores into a scaled score.

  1. Open the attribute table
  2. Open the field calculator
  3. Set up the values
    • Output field name: ‘Scaled Score’
    • Output field type: Whole Number
    • Expression: scale_linear(Score, minimum(Score), maximum(Score), 0, 100)
  4. Click ‘OK’
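
The scaling is a min-max rescaling: the lowest raw score maps to 0, the highest to 100, and everything else falls proportionally in between. The Python sketch below is an illustrative reimplementation, not the QGIS source. The clamping of out-of-range values, the handling of an empty domain, and the use of round() to obtain whole numbers are assumptions of this example, and the scores are made up.

```python
def scale_linear(value, domain_min, domain_max, range_min, range_max):
    """Linearly map value from [domain_min, domain_max] to [range_min, range_max]."""
    if domain_max == domain_min:
        # All scores identical: avoid dividing by zero.
        return range_min
    # Keep values inside the input domain before interpolating.
    clamped = min(max(value, domain_min), domain_max)
    fraction = (clamped - domain_min) / (domain_max - domain_min)
    return range_min + fraction * (range_max - range_min)


# Hypothetical raw scores for four administrative areas.
scores = [450000, 90000, 1200000, 300000]
low, high = min(scores), max(scores)
scaled = [round(scale_linear(score, low, high, 0, 100)) for score in scores]
print(scaled)
```

Because the minimum and maximum are taken over the whole layer, the scaled score is relative to the areas and dates included in the analysis and is not comparable between separate runs.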

Symbolize the Map

We have now created all of the columns and fields necessary to create a map of our calculated risk scores. Now let’s color the map using the data we have created.

  1. Within the ‘Layers’ pane, right click on the layer named ‘Risk Analysis’ and select ‘Properties’
  2. Within the properties menu select ‘symbology’
  3. Within the symbology tab:
  4. Use the top drop down menu and change option from ‘Single Symbol’ to ‘Graduated’
  5. In the ‘Value’ box, select the field we created called ‘Scaled Score’
  6. Click the small arrow on the right side of ‘color ramp’ and change the color palette to ‘spectral’
  7. Click the small arrow on the right side of ‘color ramp’ and select ‘invert color ramp’
  8. Click ‘Classify’ to build the graduated list of values
  9. In the ‘Mode’ selector, select ‘Natural Breaks (Jenks)’ as the classification mode
  10. In the ‘Classes’ selector increase the number of classes to ‘10’
  11. Click the ‘Symbol’ button to open up the ‘Symbol Settings’ menu
  12. In the ‘Symbol Settings’ menu change the stroke style to ‘No Pen’
  13. Click ‘Ok’ to close the window
  14. Click ‘Apply’ and ‘Ok’ to finish symbolizing the data
  15. Uncheck the checkboxes next to the ‘population’ and ‘population_elderly’ layers to hide them from the map view. The resulting map view should look similar to the one shown below.
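
For readers curious about what the ‘Natural Breaks (Jenks)’ mode is doing, the Python sketch below shows the underlying idea: choose class boundaries so that values within each class are as similar as possible, by minimizing the total within-class sum of squared deviations. This is a small dynamic-programming illustration, not the QGIS implementation, and its results may differ from what QGIS produces. The function name and sample values are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for a natural-breaks classification."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviation of any slice quickly.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the recorded split points to recover class bounds.
    upper_bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        upper_bounds.append(data[j - 1])
        j = split[k][j]
    return upper_bounds[::-1]


# Hypothetical scaled scores with three visible clusters.
breaks = jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3)
print(breaks)
```

Compared with equal-width classes, this kind of classification tends to place breaks in the gaps between clusters of values, which helps hot spots stand out on the map.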

Enable Temporal Filtering

Next we will enable time features on our newly created dataset and map. This will allow us to use the time toolbar within QGIS to step through the full date range of our analysis.

  1. Right click (mac control-click) on the ‘Risk Analysis’ layer within the layer panel
  2. Click on properties
  3. Click on the ‘Temporal’ tab
  4. Check the box ‘Temporal’ to activate time features
  5. Change the configuration to ‘single field with Date/Time’
  6. Select ‘ds’ in the ‘Field’ selection
  7. Enter ‘1.00’ in the event duration and change the dropdown selector to ‘Days’
  8. Confirm that ‘Accumulate features over time’ is unchecked
  9. Click ‘Ok’
  10. Open the ‘Temporal control panel’
  11. Within the control panel click ‘enable animated navigation’
  12. Click the refresh button to gather the range dates
  13. Change ‘Step’ drop down option from ‘Hours’ to ‘Days’
  14. Use the control panel to navigate through the range of dates selected for analysis
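
The effect of these settings can be sketched in Python as follows. Each feature is treated as an event that starts on its ‘ds’ date and lasts one day, and a feature is drawn only when that interval overlaps the current animation frame. This is a simplified model under stated assumptions, not the QGIS implementation: the half-open interval handling, the function name, and the sample features are all hypothetical.

```python
from datetime import date, timedelta


def visible_on(features, frame_start, step=timedelta(days=1), duration=timedelta(days=1)):
    """Return the features whose [ds, ds + duration) interval overlaps the frame."""
    frame_end = frame_start + step
    visible = []
    for feature in features:
        start = date.fromisoformat(feature["ds"])
        end = start + duration
        # Two half-open intervals overlap when each starts before the other ends.
        if start < frame_end and frame_start < end:
            visible.append(feature)
    return visible


# Hypothetical features: the same area on two consecutive days.
features = [
    {"GID_2": "A", "ds": "2020-07-25", "Scaled Score": 40},
    {"GID_2": "A", "ds": "2020-07-26", "Scaled Score": 55},
]
frame = visible_on(features, date(2020, 7, 25))
print([feature["ds"] for feature in frame])
```

With a one-day step and ‘Accumulate features over time’ unchecked, each frame should show only the copy of each polygon that belongs to that day, so the map colors change as the animation advances.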

Conclusion & Insights

This basic analysis allows small administrative areas with high levels of potential risk to be quickly identified. Using the dataset and map we created, we can examine the greater Manila metropolitan area.

We can clearly see the change in risk scores based on proximity to Manila, which has a younger, less mobile population and is less at risk for the most serious COVID-19 health effects. The highlighted hot spot is a barangay, or small administrative region, called Bagong Silang. This region has a moderate level of mobility as of 2020-07-25, but an overall population that is much higher than that of the surrounding areas, and it therefore contains a proportionally larger elderly (age 60+) population. This analysis allows potential hotspots to be identified that might be overlooked when using population density data or movement data on its own.
