Guide – How GLEAMviz Uses Big Data

When you build an infectious disease model using the GLEAMviz modeling program and server, you are using Big Data.

Illustration of the three data layers of the GLEAMviz Epidemic and Mobility Model.

GLEAMviz integrates three layers of global, real-world big data: the global population, the mobility of that population, and mathematical models of how infection progresses worldwide. It accesses data from satellite and census sources and calculates the density of population centers and the clustering of transportation hubs.

The spread of the disease is simulated and displayed on a Visualization Dashboard using these three layers of data, the mathematical calculations built into the program, and the characteristics of the outbreak that you enter into the simulation builder. These characteristics are expressed as rates of spread, infection, or recovery, or as rates for any other compartments you have included in your simulation.

Infection spreads through the population as people commute to work, attend school, and travel by air. You can adjust these activities on the Settings page of GLEAMviz, so mitigation strategies such as school or airport closings can be added to a simulation and the results analyzed.

Data Sources

GLEAMviz divides the world into a grid of cells of approximately 25 x 25 kilometers. From the coordinates of each cell and the airports and other transportation options available to it, subpopulations can be calculated and used in the infectious disease model.
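
The gridding step can be pictured with a short Python sketch. It is an illustration only, not the GLEAMviz implementation: the quarter-degree cell size, the rule of attaching each cell to its nearest airport, and the airport codes and population figures are all assumptions made for the example.

```python
import math

CELL_DEG = 0.25  # assumed cell size: a quarter of a degree, about 28 km north to south


def cell_index(lat, lon, cell_deg=CELL_DEG):
    """Return the (row, col) of the grid cell containing a latitude/longitude point."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    n_rows = round(180 / cell_deg)
    n_cols = round(360 / cell_deg)
    # Clamp so that the north pole and the +180 meridian fall in the last row/column.
    row = min(int((lat + 90.0) / cell_deg), n_rows - 1)
    col = min(int((lon + 180.0) / cell_deg), n_cols - 1)
    return row, col


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_subpopulations(cells, airports):
    """Sum cell populations per airport, attaching each cell to its nearest airport.

    cells:    list of (lat, lon, population) tuples, one per populated grid cell
    airports: dict mapping an airport code to its (lat, lon)
    """
    totals = {code: 0 for code in airports}
    for lat, lon, population in cells:
        nearest = min(airports, key=lambda code: haversine_km(lat, lon, *airports[code]))
        totals[nearest] += population
    return totals


# Hypothetical data: two airports and three populated cells.
example_airports = {"AAA": (0.0, 0.0), "BBB": (0.0, 10.0)}
example_cells = [(0.1, 0.1, 1000), (0.2, 9.0, 500), (1.0, 1.0, 250)]
print(build_subpopulations(example_cells, example_airports))
```

Nearest-hub assignment is used here because it is the simplest rule that turns cell coordinates and hub locations into subpopulations; other rules would fit the same structure.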

GLEAMviz gets its population data from the Gridded Population of the World and the Global Rural-Urban Mapping Project, both run by the Socioeconomic Data and Applications Center (SEDAC) at Columbia University.

Airport data is obtained from 12 different flight networks drawn from the worldwide booking databases of the Official Airline Guide (OAG). Data on 3,800 airports in over 230 countries, covering over 4,000,000 connections, are used to refine the analysis of infection spread.

Commuting data is obtained from the national statistics offices of more than 40 countries on five continents. Together, the datasets encompass over 5 million commuting connections, which affect how infection spreads from subpopulation to subpopulation.
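
To see how a mobility connection can carry infection from one subpopulation to another, consider the following Python sketch. It is a simplified illustration and not the method GLEAMviz uses: it assumes travelers are infectious in the same proportion as their home subpopulation, and the function name, the `scale` parameter (standing in for a travel restriction such as an airport closing), and all figures are hypothetical.

```python
def expected_infectious_travelers(flows, infectious, population, scale=1.0):
    """Expected number of infectious travelers per day on each connection.

    flows:      dict mapping (origin, destination) to travelers per day
    infectious: dict mapping a subpopulation name to its current infectious count
    population: dict mapping a subpopulation name to its total size
    scale:      fraction of normal travel that still takes place (1.0 = no restriction)
    """
    result = {}
    for (origin, destination), travelers in flows.items():
        # Assumption: travelers are a random sample of the origin subpopulation.
        prevalence = infectious[origin] / population[origin]
        result[(origin, destination)] = scale * travelers * prevalence
    return result


# Hypothetical example: an outbreak in subpopulation A, none yet in B.
flows = {("A", "B"): 200, ("B", "A"): 100}
infectious = {"A": 50, "B": 0}
population = {"A": 10_000, "B": 5_000}

normal = expected_infectious_travelers(flows, infectious, population)
restricted = expected_infectious_travelers(flows, infectious, population, scale=0.5)
print(normal[("A", "B")], restricted[("A", "B")])
```

Halving `scale` halves the expected number of infectious arrivals, which is the basic reason travel restrictions can delay, though not necessarily prevent, spread to a new subpopulation.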

Data in the Compartmental Model

The compartmental data model is the backbone of disease spread analysis in the GLEAMviz program. You build the disease through compartmental modeling because every individual in a population fits into one compartment at any given time. For example, if your home town were to experience a disease outbreak, every individual would at any moment be in a Susceptible group, an Exposed group, an Infected group, or a Recovered group. Individuals would most likely pass from one group to another. You build in rates that determine how quickly, or how likely, individuals are to pass from one group to another, depending on the characteristics of the disease.
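
For readers who want to see the rates written out, the four-group example is commonly expressed as the standard SEIR equations. The symbols are assumptions introduced here, not GLEAMviz notation: β is the transmission rate, ε the rate at which Exposed individuals become Infected, μ the recovery rate, and N the total population. GLEAMviz lets you define your own compartments and rates, so this is only one possible model.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \varepsilon E, \\
\frac{dI}{dt} &= \varepsilon E - \mu I, \\
\frac{dR}{dt} &= \mu I, \qquad N = S + E + I + R.
\end{aligned}
```

Each term that leaves one group enters the next, so the total N stays constant.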


GLEAMviz then uses stochastic (randomly determined) algorithms (sets of well-defined mathematical operations) to calculate the number of individuals in each group over time, and displays the transitions and the spread of the disease on a simulation dashboard.
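
A minimal Python sketch of what a stochastic compartmental simulation can look like for a single subpopulation is given below. It is not the GLEAMviz algorithm: it assumes a Susceptible, Exposed, Infected, Recovered model advanced in one-day steps, with each transition drawn at random from a binomial distribution. The parameter names `beta` (transmission rate), `epsilon` (rate of leaving the Exposed group), and `mu` (recovery rate), and all numeric values, are assumptions chosen for illustration.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by counting successes in n independent trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def step(state, beta, epsilon, mu, rng, dt=1.0):
    """Advance the (S, E, I, R) counts by one time step of length dt (in days)."""
    s, e, i, r = state
    n = s + e + i + r
    # Convert each rate into the probability of making the transition during dt.
    p_infection = 1.0 - math.exp(-beta * i / n * dt) if n > 0 else 0.0
    p_onset = 1.0 - math.exp(-epsilon * dt)
    p_recovery = 1.0 - math.exp(-mu * dt)
    new_exposed = binomial(rng, s, p_infection)
    new_infected = binomial(rng, e, p_onset)
    new_recovered = binomial(rng, i, p_recovery)
    return (
        s - new_exposed,
        e + new_exposed - new_infected,
        i + new_infected - new_recovered,
        r + new_recovered,
    )


def simulate(state, beta, epsilon, mu, days, seed=None):
    """Return the list of daily (S, E, I, R) states, starting with the initial state."""
    rng = random.Random(seed)
    history = [state]
    for _ in range(days):
        state = step(state, beta, epsilon, mu, rng)
        history.append(state)
    return history


# Hypothetical outbreak: 1,000 people, 10 of them initially infected.
history = simulate((990, 0, 10, 0), beta=0.5, epsilon=0.2, mu=0.1, days=60, seed=1)
print("final state (S, E, I, R):", history[-1])
```

Because the transitions are random, two runs with different seeds give different curves. A stochastic model is therefore usually run many times and the results summarized.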

Big Data allows global information to be incorporated into a program that you can use to model infectious disease.


GLEAMviz: The Global Epidemic and Mobility Model:

The GLEAMviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale. W. Van den Broeck, C. Gioannini, B. Goncalves, M. Quaggiotto, V. Colizza, and A. Vespignani. BMC Infectious Diseases 11, 37 (2011).

Seasonal transmission potential and activity peaks of the new influenza A (H1N1): a Monte Carlo likelihood analysis based on human mobility. D. Balcan, H. Hu, B. Goncalves, P. Bajardi, C. Poletto, J. J. Ramasco, D. Paolotti, N. Perra, M. Tizzoni, W. Van den Broeck, V. Colizza, and A. Vespignani. BMC Medicine 7, 45 (2009).

Modeling the spatial spread of infectious diseases: The Global Epidemic and Mobility computational model. D. Balcan, B. Goncalves, H. Hu, J. J. Ramasco, V. Colizza, and A. Vespignani. Journal of Computational Science 1, 132 (2010).