
Social Connectedness Index Methodology

Methodology

We use an anonymized snapshot of all active Facebook users and their friendship networks to measure the intensity of connectedness between locations. Locations are assigned to users based on their information and activity on Facebook, including the stated city on their Facebook profile, and device and connection information. Our primary measure of Social Connectedness between two locations i and j is:

Social Connectedness_{i,j} = FB_Connections_{i,j} / (FB_Users_i * FB_Users_j)

Here, FB_Users_i and FB_Users_j are the number of Facebook users in locations i and j, and FB_Connections_{i,j} is the number of Facebook friendship connections between the two.

Social Connectedness_{i,j} therefore measures the relative probability of a Facebook friendship link between a given Facebook user in location i and a given Facebook user in location j. Put differently, if this measure is twice as large, a Facebook user in i is about twice as likely to be connected with a given Facebook user in j.
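The "twice as likely" reading can be checked with a small calculation. The sketch below assumes the measure is the number of friendship connections divided by the product of the two user counts; the function name and all counts are hypothetical and chosen only for illustration.

```python
def social_connectedness(fb_connections, fb_users_i, fb_users_j):
    """Unscaled measure: connections divided by the product of user counts."""
    if fb_users_i <= 0 or fb_users_j <= 0:
        raise ValueError("user counts must be positive")
    return fb_connections / (fb_users_i * fb_users_j)


# Hypothetical counts: location i has 10,000 users; j and k have 5,000 each.
sci_ij = social_connectedness(2_000, 10_000, 5_000)
sci_ik = social_connectedness(1_000, 10_000, 5_000)

# With equal user counts, twice the connections gives twice the measure.
ratio = sci_ij / sci_ik
```

Because j and k have the same number of users, the ratio of the two measures equals the ratio of the connection counts.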

In each dataset, we rescale the measure to have a fixed maximum value of 1,000,000,000 (by dividing the original measure by its maximum and multiplying by 1,000,000,000) and a lowest possible value of 1. We also round the measure to the nearest integer.
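The scaling step can be sketched as follows. This is an illustration under assumptions, not the production code: the function name and input values are hypothetical, and Python's built-in round (which rounds exact halves to the nearest even integer) stands in for whatever rounding rule is actually used.

```python
def scale_sci(raw, max_value=1_000_000_000):
    """Rescale raw values so the largest maps to max_value and none falls below 1."""
    largest = max(raw.values())
    if largest <= 0:
        raise ValueError("at least one raw value must be positive")
    return {
        pair: max(1, round(value / largest * max_value))
        for pair, value in raw.items()
    }


# Hypothetical raw (unscaled) values for three location pairs.
raw = {("A", "A"): 4e-5, ("A", "B"): 2e-5, ("A", "C"): 1e-15}
scaled = scale_sci(raw)
```

Here the pair with the largest raw value receives 1,000,000,000, and the very weakly connected pair is lifted to the floor of 1 rather than rounding to 0.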

We also take several steps to preserve user privacy:

  1. We remove all locations with fewer than 100 active users, except for countries, which are only included if they have more than 50,000 active users.
  2. We add random N(0,1) noise (rounded to the nearest integer) to the number of friendships between each pair of locations. The number of friendships after this noise is added cannot be less than 0.
  3. The SCI presented here is the average SCI across 10 draws of 99% of the population of active Facebook users.
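The first two privacy steps can be sketched in a few lines. The helper names below are hypothetical, and the sketch leaves out the third step, since averaging over 10 draws of 99% of users requires user-level data that is not part of the release.

```python
import random

MIN_USERS_LOCATION = 100     # locations with fewer active users are removed
MIN_USERS_COUNTRY = 50_000   # countries need more than this many active users


def keep_location(active_users, is_country):
    """Step 1: decide whether a location is included in the release."""
    if is_country:
        return active_users > MIN_USERS_COUNTRY
    return active_users >= MIN_USERS_LOCATION


def noisy_friendships(count, rng):
    """Step 2: add integer-rounded N(0,1) noise, never going below 0."""
    noise = round(rng.gauss(0.0, 1.0))
    return max(0, count + noise)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
noisy_counts = [noisy_friendships(0, rng) for _ in range(1_000)]
```

Even when the true count is 0, every noisy count stays non-negative because of the floor at 0.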

We exclude the following areas: Afghanistan, Western Sahara, China, Cuba, Iraq, Israel, Iran, North Korea, Russia, Syria, Somalia, South Sudan, Sudan, Venezuela, Yemen, Crimea, Jammu and Kashmir, Donetsk, Luhansk, Sevastopol, West Bank, and Gaza.

A more detailed methodology is available in the Journal of Economic Perspectives (Bailey et al., 2018).

Data Release – August 2020

The data within this folder include this measure calculated for different geographical areas as of August 2020. Each dataset has three columns: user_loc, fr_loc, and scaled_sci. Each dataset includes every location pair in both directions (i to j and j to i); the measure is symmetric, so both rows carry the same value. The folder also includes a number of potentially relevant academic papers.
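A dataset with these three columns can be read and checked for symmetry as sketched below. The tab delimiter, the header row, and the sample values are assumptions made for illustration; adjust them to match the files you download.

```python
import csv
import io

# Hypothetical sample in the three-column layout described above.
SAMPLE = (
    "user_loc\tfr_loc\tscaled_sci\n"
    "US\tUS\t1000000\n"
    "US\tCA\t25000\n"
    "CA\tUS\t25000\n"
    "CA\tCA\t900000\n"
)


def load_sci(handle):
    """Read rows into a dict keyed by (user_loc, fr_loc)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["user_loc"], row["fr_loc"]): int(row["scaled_sci"])
        for row in reader
    }


def asymmetric_pairs(sci):
    """Return pairs whose reverse direction is missing or has a different value."""
    return [(i, j) for (i, j), value in sci.items() if sci.get((j, i)) != value]


sci = load_sci(io.StringIO(SAMPLE))
problems = asymmetric_pairs(sci)
```

To read a real file, pass an open file handle to load_sci in place of the in-memory sample.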

The datasets included in the August 2020 release are:

  • International Countries. Each row is a country – country pair. Countries are denoted by their ISO2 code.
  • US Counties. Each row is a US county – US county pair. Counties are denoted by their 5-digit FIPS code.
  • US Counties to Country. Each row is a US county – country pair. Counties are denoted by their 5-digit FIPS code; countries are denoted by their ISO2 code.
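One practical point when working with these identifiers: 5-digit FIPS codes can begin with a zero, which is lost if the column is parsed as an integer. The helpers below are a hypothetical sketch for restoring the padding and for telling the two code types apart.

```python
def normalize_fips(code):
    """Return a county code as a zero-padded 5-character string."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a 5-digit FIPS code: {code!r}")
    return text.zfill(5)


def looks_like_iso2(code):
    """Rough shape check only: two uppercase ASCII letters."""
    return len(code) == 2 and code.isascii() and code.isalpha() and code.isupper()


padded = normalize_fips(1001)      # a code that lost its leading zero
unchanged = normalize_fips("06037")
```

Note that looks_like_iso2 checks only the shape of the code, not whether it is an assigned country code.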

There are two files built on the Database of Global Administrative Areas (GADM) and the European Nomenclature of Territorial Units for Statistics (NUTS) areas. These data use GADM version 2.8 and NUTS 2016. 

  • GADM1_NUTS2: Countries outside of Europe are broken into their GADM level 1 regions (e.g. states in the USA) if their population is greater than 1 million; otherwise the area is the full country. European countries are broken into their NUTS2 regions (e.g. 12 provinces in the Netherlands). Each row is a pair of these areas.
  • GADM1_NUTS3_Counties: Countries with population < 1 million are not broken up. European countries are broken into NUTS3 regions (e.g. 40 regions in the Netherlands). The US, Canada, and the countries of the Indian Subcontinent with population > 1 million (Bangladesh, India, Nepal, Pakistan, and Sri Lanka) are broken into their GADM2 regions (e.g. US counties). All other countries are broken into their GADM1 regions. Each row is a pair of these areas.
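The rules for the GADM1_NUTS3_Counties file can be written as a small decision function. This is one reading of the text rather than the actual pipeline: the function name, the is_european flag, the precedence of the population rule over the European rule, and the handling of a population of exactly 1 million are all assumptions.

```python
# ISO2 codes for the US, Canada, Bangladesh, India, Nepal, Pakistan, Sri Lanka.
GADM2_COUNTRIES = {"US", "CA", "BD", "IN", "NP", "PK", "LK"}


def region_level(iso2, population, is_european):
    """Return the granularity assumed for a country in GADM1_NUTS3_Counties."""
    if population < 1_000_000:
        return "country"   # small countries are not broken up
    if is_european:
        return "NUTS3"
    if iso2 in GADM2_COUNTRIES:
        return "GADM2"     # e.g. US counties
    return "GADM1"


levels = {
    "NL": region_level("NL", 17_000_000, True),
    "IN": region_level("IN", 1_300_000_000, False),
    "BR": region_level("BR", 200_000_000, False),
    "TV": region_level("TV", 11_000, False),
}
```

The population figures are rough, hypothetical inputs; only their position relative to the 1 million threshold matters here.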

How to Access the Social Connectedness Index

The Facebook Social Connectedness Index is available for download on the United Nations Office for the Coordination of Humanitarian Affairs (OCHA)’s Humanitarian Data Exchange.

To reference this data, please use the following citation:

M. Bailey, R. Cao, T. Kuchler, J. Stroebel, and A. Wong. Social connectedness: Measurements, determinants, and effects. Journal of Economic Perspectives, 32(3):259–80, 2018; and the Facebook Data for Good Program, Social Connectedness Index (SCI). https://dataforgood.fb.com/, Accessed DAY MONTH YEAR.
