Life expectancy at birth is defined as how long, on average, a newborn can expect to live, if current death rates do not change. This dataset can help you gain insights regarding the life expectancy and mortality rate.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Life Expectancy vs GDP, 1950-2018’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/luxoloshilofunde/life-expectancy-vs-gdp-19502018 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Life expectancy at birth is defined as the average number of years that a newborn could expect to live if he or she were to pass through life subject to the age-specific mortality rates of a given period. The years are from 1950 to 2018.
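As a rough illustration of this definition, the sketch below computes period life expectancy at birth from a schedule of age-specific death probabilities. The function name, the single-year age groups, and the assumption that deaths fall on average at mid-year are illustrative choices and are not taken from the source dataset.

```python
def period_life_expectancy(q):
    """Life expectancy at birth from one-year death probabilities.

    q[x] is the probability of dying between exact ages x and x + 1 under
    the mortality rates of a single period. The last entry must be 1.0 so
    that the hypothetical cohort is fully extinguished.
    Assumption (not from the source): deaths occur on average at mid-year.
    """
    if not q or q[-1] != 1.0:
        raise ValueError("q must be non-empty and end with 1.0")
    survivors = 1.0      # share of the cohort still alive at exact age x
    person_years = 0.0   # years lived per newborn, accumulated over ages
    for qx in q:
        deaths = survivors * qx
        # Survivors live the full year; those who die live half of it.
        person_years += survivors - 0.5 * deaths
        survivors -= deaths
    return person_years
```

With real data, `q` would come from a life table for the chosen period; the short lists used to exercise the function are toy inputs only.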
For regional- and global-level data pre-1950, data from a study by Riley was used, which draws from over 700 sources to estimate life expectancy at birth from 1800 to 2001.
Riley estimated life expectancy before 1800, which he calls "the pre-health transition period". "Health transitions began in different countries in different periods, as early as the 1770s in Denmark and as late as the 1970s in some countries of sub-Saharan Africa". As such, for the sake of consistency, we have assigned the period before the health transition to the year 1770. "The life expectancy values employed are averages of estimates for the period before the beginning of the transitions for countries within that region. ... This period has presumably the weakest basis, the largest margin of error, and the simplest method of deriving an estimate."
For country-level data pre-1950, Clio Infra's dataset was used, compiled by Zijdeman and Ribeira da Silva (2015).
For country-, regional- and global-level data post-1950, data published by the United Nations Population Division was used, since they are updated every year. This is possible because Riley writes that "for 1950-2001, I have drawn life expectancy estimates chiefly from various sources provided by the United Nations, the World Bank’s World Development Indicators, and the Human Mortality Database".
For the Americas from 1950-2015, the population-weighted average of Northern America and Latin America and the Caribbean was taken, using UN Population Division estimates of population size.
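A minimal sketch of the population-weighted average described above. The helper name and the input numbers are hypothetical and are not UN Population Division figures.

```python
def population_weighted_mean(values, populations):
    """Average of values, weighting each value by its population."""
    if not values or len(values) != len(populations):
        raise ValueError("values and populations must be non-empty and equal in length")
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(v * p for v, p in zip(values, populations)) / total


# Hypothetical inputs for illustration only:
# [Northern America, Latin America and the Caribbean]
life_expectancy = [79.0, 75.0]
population = [360e6, 640e6]
americas = population_weighted_mean(life_expectancy, population)
```

The larger region pulls the combined figure towards its own value, which is why a simple unweighted mean of the two regions would not be appropriate here.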
Life expectancy:
Data publisher's source: https://www.lifetable.de/RileyBib.pdf
Data published by: James C. Riley (2005), Estimates of Regional and Global Life Expectancy, 1800–2001. Population and Development Review, Volume 31, Issue 3, pages 537–543, September 2005; Zijdeman, Richard; Ribeira da Silva, Filipa, 2015, "Life Expectancy at Birth (Total)", http://hdl.handle.net/10622/LKYT53, IISH Dataverse, V1; and UN Population Division (2019).
Link: https://datasets.socialhistory.org/dataset.xhtml?persistentId=hdl:10622/LKYT53, http://onlinelibrary.wiley.com/doi/10.1111/j.1728-4457.2005.00083.x/epdf, https://population.un.org/wpp/Download/Standard/Population/
Dataset: https://ourworldindata.org/life-expectancy
GDP per capita:
Data publisher's source: The Maddison Project Database is based on the work of many researchers that have produced estimates of economic growth for individual countries.
Data published by: Bolt, Jutta and Jan Luiten van Zanden (2020), “Maddison style estimates of the evolution of the world economy. A new 2020 update”.
Link: https://www.rug.nl/ggdc/historicaldevelopment/maddison/releases/maddison-project-database-2020
Dataset: https://ourworldindata.org/life-expectancy
The life expectancy vs GDP per capita analysis.
--- Original source retains full ownership of the source dataset ---
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
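A minimal sketch of how such a query might be issued from Python. It assumes the `google-cloud-bigquery` package is installed and that credentials are configured in the environment; `build_query` and `run_query` are hypothetical helper names, and the column names are the ones used in the sample queries in this description.

```python
TABLE = "bigquery-public-data.census_bureau_international.mortality_life_expectancy"


def build_query(year, limit=10):
    """Return SQL listing the highest life expectancies for one year."""
    year, limit = int(year), int(limit)  # coerce so only integers reach the SQL text
    return (
        "SELECT country_name, life_expectancy\n"
        f"FROM `{TABLE}`\n"
        f"WHERE year = {year}\n"
        "ORDER BY life_expectancy DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Run sql and return the rows as a list of dicts.

    Assumption: google-cloud-bigquery is installed and credentials are set up.
    """
    from google.cloud import bigquery  # lazy import: build_query works without it

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling `run_query(build_query(2016))` would send the query to BigQuery; building the SQL string on its own needs no network access.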
What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!
SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM (
  SELECT
    country_name,
    life_expectancy
  FROM
    `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE
    year = 2016) age
INNER JOIN (
  SELECT
    country_name,
    country_area
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE
    country_area > 25000) size
ON
  age.country_name = size.country_name
ORDER BY
  2 DESC
/* Remove the limit for Data Studio visualization */
LIMIT
  10
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
  age.country_name,
  SUM(age.population) AS under_25,
  pop.midyear_population AS total,
  ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
FROM (
  SELECT
    country_name,
    population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
  WHERE
    year = 2017
    AND age < 25) age
INNER JOIN (
  SELECT
    midyear_population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population`
  WHERE
    year = 2017) pop
ON
  age.country_code = pop.country_code
GROUP BY
  1,
  3
ORDER BY
  4 DESC /* Remove limit for visualization */
LIMIT
  10
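The percentage computed by the query above can be sketched in plain Python as follows. The function name and the sample figures are hypothetical; only the calculation (population under 25 divided by total midyear population, rounded to two decimals) mirrors the query.

```python
def pct_under_age(population_by_age, total_population, cutoff=25):
    """Percentage of the total population younger than cutoff."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    under = sum(pop for age, pop in population_by_age.items() if age < cutoff)
    return round(under / total_population * 100, 2)


# Hypothetical age-specific counts (age -> population), for illustration only.
sample = {0: 100, 10: 150, 24: 50, 25: 200, 60: 500}
share = pct_under_age(sample, total_population=1000)
```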
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.
SELECT
  growth.country_name,
  growth.net_migration,
  CAST(area.country_area AS INT64) AS country_area
FROM (
  SELECT
    country_name,
    net_migration,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
  WHERE
    year = 2017) growth
INNER JOIN (
  SELECT
    country_area,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
Historic (none)
United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘LifeExpectancyData’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/just249/lifeexpectancydatacsv on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Life Expectancy Prediction Using Artificial Intelligence: Research Paper: https://docs.google.com/document/d/1Abwx7C97sMjsfow5Xk8GOCDblNaQr8T8WXh7SIJAbVo/edit?usp=sharing
Introduction: According to a 2016 survey report from PwC (PricewaterhouseCoopers), nearly half (47%) of respondents aged 18-34 had changed their eating habits towards a healthier diet, and 53% of the same age group said they planned to change their eating habits to be healthier over the next year. According to research reported by LiveScience, eating healthily and being physically active can increase life expectancy. The BBC (British Broadcasting Corporation) article “Do we really live longer than our ancestors?” states that in 1841 a baby girl or boy was expected to live to about 40 years of age, whereas in 2016 a baby girl or boy was expected to live until 80 years of age. Controllable factors such as eating healthily and exercising regularly can increase life expectancy. But can non-controllable factors such as a country’s status, mortality rates, GDP, schooling, average income, government expenditure on health and the rate of child deaths also affect life expectancy? To answer this question, we use the “Life Expectancy(WHO)” dataset provided by Kumar Rajarshi on Kaggle and apply machine learning to train on and analyze a considerable amount of data, and to predict life expectancy from the values fed to the algorithm.
Project Details: For this project, I used the “Life Expectancy(WHO)” dataset provided by Kumar Rajarshi on Kaggle to try to predict total life expectancy from the non-controllable factors in the dataset, such as a country’s status, mortality rates, GDP, schooling, average income, government expenditure on health and the rate of child deaths, in order to answer whether non-controllable factors affect life expectancy.
--- Original source retains full ownership of the source dataset ---
This table contains 2394 series, with data for years 1991 - 1991 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (not all combinations are available): Geography (1 item: Canada ...), Population group (19 items: Entire cohort; Income adequacy quintile 1 (lowest); Income adequacy quintile 2; Income adequacy quintile 3 ...), Age (14 items: At 25 years; At 30 years; At 40 years; At 35 years ...), Sex (3 items: Both sexes; Females; Males ...), Characteristics (3 items: Life expectancy; High 95% confidence interval, life expectancy; Low 95% confidence interval, life expectancy ...).
https://creativecommons.org/publicdomain/zero/1.0/
The United States Census Bureau’s International Dataset provides estimates of country populations since 1950 and projections through 2050.
The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000. Specifically, the data set includes midyear population figures broken down by age and gender assignment at birth. Additionally, they provide time-series data for attributes including fertility rates, birth rates, death rates, and migration rates.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_international
https://cloud.google.com/bigquery/public-data/international-census
Dataset Source: www.census.gov
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What countries have the longest life expectancy?
Which countries have the largest proportion of their population under 25?
Which countries are seeing the largest net migration?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.
Key Features
- Country: Name of the country.
- Density (P/Km2): Population density measured in persons per square kilometer.
- Abbreviation: Abbreviation or code representing the country.
- Agricultural Land (%): Percentage of land area used for agricultural purposes.
- Land Area (Km2): Total land area of the country in square kilometers.
- Armed Forces Size: Size of the armed forces in the country.
- Birth Rate: Number of births per 1,000 population per year.
- Calling Code: International calling code for the country.
- Capital/Major City: Name of the capital or major city.
- CO2 Emissions: Carbon dioxide emissions in tons.
- CPI: Consumer Price Index, a measure of inflation and purchasing power.
- CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
- Currency_Code: Currency code used in the country.
- Fertility Rate: Average number of children born to a woman during her lifetime.
- Forested Area (%): Percentage of land area covered by forests.
- Gasoline_Price: Price of gasoline per liter in local currency.
- GDP: Gross Domestic Product, the total value of goods and services produced in the country.
- Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
- Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
- Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
- Largest City: Name of the country's largest city.
- Life Expectancy: Average number of years a newborn is expected to live.
- Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
- Minimum Wage: Minimum wage level in local currency.
- Official Language: Official language(s) spoken in the country.
- Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
- Physicians per Thousand: Number of physicians per thousand people.
- Population: Total population of the country.
- Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
- Tax Revenue (%): Tax revenue as a percentage of GDP.
- Total Tax Rate: Overall tax burden as a percentage of commercial profits.
- Unemployment Rate: Percentage of the labor force that is unemployed.
- Urban Population: Percentage of the population living in urban areas.
- Latitude: Latitude coordinate of the country's location.
- Longitude: Longitude coordinate of the country's location.
Potential Use Cases
- Analyze population density and land area to study spatial distribution patterns.
- Investigate the relationship between agricultural land and food security.
- Examine carbon dioxide emissions and their impact on climate change.
- Explore correlations between economic indicators such as GDP and various socio-economic factors.
- Investigate educational enrollment rates and their implications for human capital development.
- Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
- Study labor market dynamics through indicators such as labor force participation and unemployment rates.
- Investigate the role of taxation and its impact on economic development.
- Explore urbanization trends and their social and environmental consequences.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.
We have repackaged the data from a newer compilation put together by the Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have included several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
The QoG Institute is an independent research institute within the Department of Political Science at the University of Gothenburg. The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained.
To achieve said goal, the QoG Institute makes comparative data on QoG and its correlates publicly available. To accomplish this, we have compiled several datasets that draw on a number of freely available data sources, including aggregated individual-level data.
The QoG OECD Datasets focus exclusively on OECD member countries. They have a high data coverage in terms of geography and time. In the QoG OECD TS dataset, data from 1946 to 2021 is included and the unit of analysis is country-year (e.g., Sweden-1946, Sweden-1947, etc.).
In the QoG OECD Cross-Section dataset, data from and around 2018 are included. Data from 2018 are prioritized; however, if no data are available for a country for 2018, data for 2019 are included. If no data for 2019 exist, data for 2017 are included, and so on up to a maximum of +/- 3 years.
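The year-selection rule for the cross-section data can be sketched as below. The function name is hypothetical, and the search order beyond 2017 (2020, 2016, 2021, 2015) is an assumption that extends the "later year first" pattern stated for 2019 and 2017; the QoG codebook remains the authority on the actual procedure.

```python
def pick_cross_section_value(values_by_year, target=2018, max_offset=3):
    """Return (year, value) for the first available year, or None.

    Years are tried as target, target+1, target-1, target+2, target-2, ...
    up to max_offset years away. The ordering past +/- 1 is an assumption.
    """
    candidates = [target]
    for offset in range(1, max_offset + 1):
        candidates += [target + offset, target - offset]
    for year in candidates:
        value = values_by_year.get(year)
        if value is not None:
            return year, value
    return None
```

For example, a country with observations only for 2017 and 2019 would contribute its 2019 value, and a country whose nearest observation is from 2014 would be left empty.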
The Health Inequality Project uses big data to measure differences in life expectancy by income across areas and identify strategies to improve health outcomes for low-income Americans.
This table reports life expectancy point estimates and standard errors for men and women at age 40 for each percentile of the national income distribution. Both race-adjusted and unadjusted estimates are reported.
This table reports life expectancy point estimates and standard errors for men and women at age 40 for each percentile of the national income distribution separately by year. Both race-adjusted and unadjusted estimates are reported.
This dataset was created on 2020-01-10 18:53:00.508 by merging multiple datasets together. The source datasets for this version were:
Commuting Zone Life Expectancy Estimates by year: CZ-level by-year life expectancy estimates for men and women, by income quartile
Commuting Zone Life Expectancy: Commuting zone (CZ)-level life expectancy estimates for men and women, by income quartile
Commuting Zone Life Expectancy Trends: CZ-level estimates of trends in life expectancy for men and women, by income quartile
Commuting Zone Characteristics: CZ-level characteristics
Commuting Zone Life Expectancy for larger populations: CZ-level life expectancy estimates for men and women, by income ventile
This table reports life expectancy point estimates and standard errors for men and women at age 40 for each quartile of the national income distribution by state of residence and year. Both race-adjusted and unadjusted estimates are reported.
This table reports US mortality rates by gender, age, year and household income percentile. Household incomes are measured two years prior to the mortality rate for mortality rates at ages 40-63, and at age 61 for mortality rates at ages 64-76. The “lag” variable indicates the number of years between measurement of income and mortality.
Observations with 1 or 2 deaths have been masked: all mortality rates that reflect only 1 or 2 deaths have been recoded to reflect 3 deaths.
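The income lag and the masking rule described above can be sketched as two small helpers. The function names are hypothetical, and reading the lag at ages 64-76 as `age - 61` is an interpretation of the description rather than code from the Health Inequality Project.

```python
def income_lag(age):
    """Years between income measurement and the mortality observation."""
    if 40 <= age <= 63:
        return 2            # income measured two years earlier
    if 64 <= age <= 76:
        return age - 61     # income measured at age 61
    raise ValueError("mortality rates are reported for ages 40-76 only")


def masked_death_count(deaths):
    """Apply the disclosure rule: counts of 1 or 2 are recoded to 3."""
    return 3 if deaths in (1, 2) else deaths
```

Under this reading, the lag is constant at 2 up to age 63 and then grows by one year per year of age, reaching 15 at age 76.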
This table reports coefficients and standard errors from regressions of life expectancy estimates for men and women at age 40 for each quartile of the national income distribution on calendar year by commuting zone of residence. Only the slope coefficient, representing the average increase or decrease in life expectancy per year, is reported. Trend estimates for both race-adjusted and unadjusted life expectancies are reported. Estimates are reported for the 100 largest CZs (populations greater than 590,000) only.
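A minimal sketch of the kind of trend estimate described above: the ordinary least squares slope from regressing life expectancy on calendar year. The function name and the sample series are hypothetical, and the project's own estimation (including its standard errors) may differ in detail.

```python
def ols_slope(years, values):
    """Slope of the least-squares line of values on years."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Hypothetical series for one commuting zone and income quartile.
years = [2001, 2002, 2003, 2004, 2005]
life_exp = [80.0, 80.2, 80.4, 80.6, 80.8]
trend = ols_slope(years, life_exp)  # average change in years of life per calendar year
```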
This table reports life expectancy estimates at age 40 for Males and Females for all countries. Source: World Health Organization, accessed at: http://apps.who.int/gho/athena/
This table reports life expectancy point estimates and standard errors for men and women at age 40 for each quartile of the national income distribution by county of residence. Both race-adjusted and unadjusted estimates are reported. Estimates are reported for counties with populations larger than 25,000 only.
This table reports life expectancy point estimates and standard errors for men and women at age 40 for each quartile of the national income distribution by commuting zone of residence and year. Both race-adjusted and unadjusted estimates are reported. Estimates are reported for the 100 largest CZs (populations greater than 590,000) only.
This table reports US population and death counts by age, year, and sex from various sources. Counts labelled “dm1” are derived from the Social Security Administration Data Master 1 file. Counts labelled “irs” are derived from tax data. Counts labelled “cdc” are derived from NCHS life tables.
This table reports numerous county characteristics, compiled from various sources. These characteristics are described in the county life expectancy table.
Two variables constructed by the Cen
The QoG Institute is an independent research institute within the Department of Political Science at the University of Gothenburg. Overall 30 researchers conduct and promote research on the causes, consequences and nature of Good Governance and the Quality of Government - that is, trustworthy, reliable, impartial, uncorrupted and competent government institutions.
The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained. A second objective is to study the effects of Quality of Government on a number of policy areas, such as health, the environment, social policy, and poverty.
The dataset was created as part of a research project titled “Quality of Government and the Conditions for Sustainable Social Policy”. The aim of the dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and Quality of Government (QoG).
The data comes in three versions: one cross-sectional dataset, and two cross-sectional time-series datasets for a selection of countries. The two combined datasets are called “long” (year 1946-2009) and “wide” (year 1970-2005).
The data contains six types of variables, each provided under its own heading in the codebook: Social policy variables, Tax system variables, Social Conditions, Public opinion data, Political indicators, Quality of government variables.
QoG Social Policy Dataset can be downloaded from the Data Archive of the QoG Institute at http://qog.pol.gu.se/data/datadownloads/data-archive Its variables are now included in QoG Standard.
Purpose:
The primary aim of QoG is to conduct and promote research on corruption. One aim of the QoG Institute is to make publicly available cross-national comparative data on QoG and its correlates. The aim of the QoG Social Policy Dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and Quality of Government (QoG).
The dataset combines cross-sectional and time-series data for a selection of 40 countries. It is specifically tailored for the analysis of public opinion data over time, uses country as its unit of observation, and has one variable for every fifth year from 1970 to 2005 (or one per module of each public opinion data source).
Samanni, Marcus, Jan Teorell, Staffan Kumlin, Stefan Dahlberg, Bo Rothstein, Sören Holmberg & Richard Svensson. 2012. The QoG Social Policy Dataset, version 4Apr12. University of Gothenburg: The Quality of Government Institute. http://www.qog.pol.gu.se
https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset of the most highly populated city (if applicable) in a form easy to join with the COVID19 Global Forecasting (Week 1) dataset. You can see how to use it in this kernel
There are four columns. The first two correspond to the columns from the original COVID19 Global Forecasting (Week 1) dataset. The other two give the highest population density, at city level, for the given country/state. Note that some countries are very small, and in those cases the population density reflects the entire country. Since the original dataset includes a few cruise ships as well, I've added them too.
Thanks a lot to Kaggle for this competition that gave me the opportunity to look closely at some data and understand this problem better.
Summary: I believe that the square root of the population density should relate to the logistic growth factor of the SIR model. I think the SEIR model isn't applicable due to any intervention being too late for a fast-spreading virus like this, especially in places with dense populations.
After playing with the data provided in COVID19 Global Forecasting (Week 1) (and everything else online or media) a bit, one thing becomes clear. They have nothing to do with epidemiology. They reflect sociopolitical characteristics of a country/state and, more specifically, the reactivity and attitude towards testing.
The testing method used (PCR tests) means that what we measure could potentially be a proxy for the number of people infected during the last 3 weeks, i.e. the growth (with lag). It's not how many people have been infected and recovered. Antibody or serology tests would measure that, and by using them, we could go back to normality faster... but those will arrive too late. Well before then, China will have shown experimentally that it's safe to go back to normal as soon as the number of newly infected per day is close to zero.
My view, as a person living in NYC, about this virus, is that by the time governments react to media pressure to lock down or even test, it's too late. In dense areas, everyone susceptible has already had ample opportunities to be infected. Especially for a virus with a 5-14 day lag between infection and symptoms, a period during which hosts spread it all over the subway, the conditions are hopeless. Active populations have already been exposed, mostly asymptomatic, and recovered. Sensitive/older populations are more self-isolated/careful in affluent societies (maybe this isn't the case in North Italy). As the virus finishes exploring the active population, it starts penetrating the more isolated ones. At this point in time, the first fatalities happen. Then testing starts. Then the media and the lockdown. Lockdown seems overly effective because it coincides with the tail of the disease spread. It helps slow down the virus exploring the long tail of the sensitive population, and we should all contribute by doing it, but it doesn't cause the end of the disease. If it did, then as soon as people were back in the streets (see China), there would be repeated outbreaks.
Smart politicians will test a lot because it will make their condition look worse. It helps them demand more resources. At the same time, they will have a low rate of fatalities due to the large denominator. They can take credit for managing a disproportionately major crisis well, in contrast to people who didn't test.
We were lucky this time. We, Westerners, have woken up to the potential of a pandemic. I'm sure we will give further resources for prevention. Additionally, we will be more open-minded, helping politicians to have more direct responses. We will also require them to be more responsible in their messages and reactions.
Abstract copyright UK Data Service and data collection copyright owner.
The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The study is being conducted in Ethiopia, India, Peru and Vietnam and has tracked the lives of 12,000 children over a 20-year period through five in-person survey rounds (Rounds 1-5) and a latest round (Round 6) conducted over the phone in 2020 and 2021 as part of the Listening to Young Lives at Work: COVID-19 Phone Survey.
Young Lives research has expanded to explore linking geographical data collected during the rounds to external datasets. Matching Young Lives data with administrative and geographic datasets significantly increases the scope for research in several areas, and may allow researchers to identify sources of exogenous variation for more convincing causal analysis on policy and/or early life circumstances.
Young Lives: Data Matching Series, 1900-2021 includes the following linked datasets:
1. Climate Matched Datasets (four YL study countries): Community-level GPS data has been matched with temperature and precipitation data from the University of Delaware. Climate variables are offered at the community level, with a panel data structure spanning across years and months. Hence, each community has a unique value of precipitation (variable PRCP) and temperature (variable TEMP), for each year and month pairing for the period 1900-2017.
2. COVID-19 Matched Dataset (Peru only): The YL Phone Survey Calls data has been matched with external data sources (The Peruvian Ministry of Health and the National Information System of Deaths in Peru). The matched dataset includes the total number of COVID cases per 1,000 inhabitants, the total number of COVID deaths by district and per 1,000 inhabitants; the total number of excess deaths per 1,000 inhabitants and the number of lockdown days in each Young Lives district in Peru during August 2020 to December 2021.
Further information is available in the PDF reports included in the study documentation.
Climate Matched Datasets: 5 variables including anonymised community identifier, monthly average temperature, monthly total precipitation, and year and month of climate data.
COVID-19 Matched Dataset (Peru): 29 variables covering anonymised respondent identifier, cumulative number of COVID-19 cases per 1,000 inhabitants, fatalities, migration, vaccine distribution, and lockdown conditions implemented by the Peruvian government in areas where YL participants were living at the time of the Phone Survey Calls.
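The community-by-year-by-month panel structure of the climate data can be illustrated with a minimal Python sketch. Only the variable names TEMP and PRCP come from the description above; the community identifiers, the row layout and every value are placeholders, and the function name `annual_summary` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Placeholder panel rows: (community_id, year, month, TEMP, PRCP).
# Only TEMP and PRCP are names taken from the dataset description;
# identifiers and values are invented for illustration.
rows = [
    ("C01", 2016, 1, 20.0, 10.0),
    ("C01", 2016, 2, 22.0, 30.0),
    ("C01", 2017, 1, 21.0, 0.0),
    ("C02", 2016, 1, 15.0, 50.0),
]


def annual_summary(rows):
    """Collapse monthly rows to {(community, year): (mean TEMP, total PRCP)}."""
    temps = defaultdict(list)
    precip = defaultdict(float)
    for community, year, _month, temp, prcp in rows:
        temps[(community, year)].append(temp)
        precip[(community, year)] += prcp
    return {key: (mean(values), precip[key]) for key, values in temps.items()}


summary = annual_summary(rows)
```

Because each community has one value per year and month pairing, grouping on (community, year) is enough to move from the monthly panel to an annual one.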
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This project analyzes the 2020 World Happiness Report to draw conclusions about the general well-being of Africa. It uses several CSV files: survey responses collected through a Google Form, data from the 2020 World Happiness Report, and a subset of that report covering only countries in Africa. The main data set includes over 150 countries and their happiness scores, freedom to make life choices, social support, healthy life expectancy, regional indicator, perceptions of corruption and generosity. This analysis was done to answer the following data-driven questions: 'Which African country ranked the happiest in 2020?' and 'Which variable predicts or explains Africa's happiness score?'
This project includes several programs created in R and Python.
The Gallup World Poll (GWP) is conducted annually to measure and track public attitudes concerning political, social and economic issues, including controversial and sensitive subjects. Annually, this poll tracks attitudes toward law and order, institutions and infrastructure, jobs, well-being and other topics for approximately 150 countries worldwide. The data gathered from the GWP is used to create an annual World Happiness Report (WHR). The World Happiness Report is conducted to review the science of understanding and measuring the subjective well-being and to use survey measures of life satisfaction to track the quality of lives in over 150 countries.
At first glance, it may seem that world happiness isn't important, or that it is just an emotional matter. However, several governments have started to look at happiness as a metric to measure success. Happiness Scores or Subjective Well-being (SWB) are national average responses to questions of life evaluation. They are important because they remind policy makers and people in power that happiness is based on social capital, not just financial capital. Happiness is often considered an essential and useful way to guide public policies and measure their effectiveness. Happiness scores also highlight the importance of qualitative measures alongside quantitative ones. At times, quality is better than quantity.
Africa is the world's second-largest and second-most-populous continent. It consists of 54 countries, more than any other continent. Africa holds approximately 30% of the earth's mineral resources and has the largest reserves of precious metals: over 40% of the world's gold, 60% of its cobalt and 90% of its platinum. However, Africa also faces the most developmental challenges; it is the world's poorest and most underdeveloped continent. Almost all of Africa was colonized, with the exceptions of Ethiopia and Liberia. Given this information, one can wonder what the SWB, or state of happiness, is in Africa.
This site analyzes the 2020 World Happiness Report to answer the data-driven questions listed later on this page. The focus is specifically on countries in Africa. Even though there are 54 countries in Africa, only 43 participated in the 2020 WHR.
The dataset used is generated from the 'World Happiness Report 2020'. This dataset contains the Happiness Score for over 150 countries for the year 2020. The data gathered from the Gallup World Poll gives a national average of Happiness scores for countries all over the world. It is an annual landmark survey of the state of global happiness.
This dataset is from the data repository "Kaggle". On Kaggle's dataset page, I searched for Africa Happiness after filtering the search to CSV file type. I wasn't able to find any dataset limited to African countries that could answer my questions, so I decided to use a Global Happiness Report instead. The dataset I am using was published by Micheal Londeen and was created on March 24, 2020. His main source is the World Happiness Report for 2020.
Happiness score or subjective well-being (variable name ladder): The survey measure of SWB is from the Feb 28, 2020 release of the Gallup World Poll (GWP) covering years from 2005 to 2019. Unless stated otherwise, it is the national average response to the question of life evaluations. The English wording of the question is “Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?” This measure is also referred to as Cantril life ladder, or just life ladder in our analysis.
Healthy Life Expectancy (HLE). Healthy life expectancies at birth are based on the data extracted from the World Health Organization’s (WHO) Global Health Observatory dat...
The QoG Institute is an independent research institute within the Department of Political Science at the University of Gothenburg. Overall 30 researchers conduct and promote research on the causes, consequences and nature of Good Governance and the Quality of Government - that is, trustworthy, reliable, impartial, uncorrupted and competent government institutions.
The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained. A second objective is to study the effects of Quality of Government on a number of policy areas, such as health, the environment, social policy, and poverty.
QoG Standard Dataset is our largest data set consisting of more than 2,000 variables from sources related to the Quality of Government.
In the QoG Standard CS dataset, data from and around 2018 is included. Data from 2018 is prioritized; however, if no data is available for a country for 2018, data for 2019 is included. If no data exists for 2019, data for 2017 is included, and so on up to a maximum of +/- 3 years.
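The year-selection rule for the CS dataset can be sketched in Python as follows. This is an illustration only, not the QoG Institute's own code: the function name `pick_value` is hypothetical, and the search order beyond 2017 (2020, 2016, 2021, 2015) is an assumption read from "and so on".

```python
def pick_value(series, target=2018, max_offset=3):
    """Pick the value closest to the target year, trying later years first.

    `series` maps year -> value, with missing years absent or set to None.
    Returns (year, value), or None if nothing lies within +/- max_offset.
    The alternation target+1, target-1, target+2, ... is an assumed
    reading of the rule described in the text.
    """
    candidates = [target]
    for offset in range(1, max_offset + 1):
        candidates += [target + offset, target - offset]
    for year in candidates:
        value = series.get(year)
        if value is not None:
            return year, value
    return None


# Placeholder series for one country and one variable.
example = pick_value({2017: 1.0, 2019: 2.0})
```

With the placeholder series above, 2018 is missing, so the sketch falls back to 2019 before considering 2017.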
In the QoG Standard TS dataset, data from 1946 to 2021 is included and the unit of analysis is country-year (e.g., Sweden-1946, Sweden-1947, etc.).
Historical countries are in most cases denoted with a to-date (e.g., Ethiopia (-1992)) and a from-date (e.g., Ethiopia (1993-)).
https://creativecommons.org/publicdomain/zero/1.0/
The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0, and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do exp
This file contains the Happiness Score for 153 countries along with the factors used to explain the score.
The Happiness Score is a national average of the responses to the main life evaluation question asked in the Gallup World Poll (GWP), which uses the Cantril Ladder.
The Happiness Score is explained by the following factors:
- GDP per capita
- Healthy Life Expectancy
- Social support
- Freedom to make life choices
- Generosity
- Corruption Perception
- Residual error
The data is described in much more detail here: link
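One way to read the factor list above is as an additive decomposition: the score is the sum of the per-factor contributions plus a remainder term. The Python sketch below illustrates that reading only; the additive form, the single combined remainder (baseline plus residual error), the function name `happiness_score` and all numbers are assumptions, not values from the file.

```python
# Placeholder contributions for one hypothetical country.
# The factor names come from the list above; the numbers are invented.
contributions = {
    "GDP per capita": 1.0,
    "Healthy Life Expectancy": 0.75,
    "Social support": 1.25,
    "Freedom to make life choices": 0.5,
    "Generosity": 0.25,
    "Corruption Perception": 0.25,
}

# Assumed remainder: baseline level plus residual error, as one number.
baseline_plus_residual = 2.0


def happiness_score(contributions, baseline_plus_residual):
    """Add the factor contributions to the remainder term."""
    return sum(contributions.values()) + baseline_plus_residual


score = happiness_score(contributions, baseline_plus_residual)
```

Under this reading, the residual error is whatever part of a country's score the six factors do not account for.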
I did not create this data, only sourced it. The credit goes to the original authors:
Editors: John Helliwell, Richard Layard, Jeffrey D. Sachs, and Jan-Emmanuel De Neve, Co-Editors; Lara Aknin, Haifang Huang and Shun Wang, Associate Editors; and Sharon Paculor, Production Editor
Citation: Helliwell, John F., Richard Layard, Jeffrey Sachs, and Jan-Emmanuel De Neve, eds. 2020. World Happiness Report 2020. New York: Sustainable Development Solutions Network
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Global Living Arrangements Database (GLAD) is a global resource designed to fill a critical gap in the availability of statistical information for examining patterns and changes in living arrangements by age, sex, marital status and educational attainment. Utilizing comprehensive census microdata from IPUMS International and the European Labour Force Survey (EU-LFS), GLAD summarizes over 740 million individual records across 107 countries, covering the period from 1960 to 2021. This database has been constructed using an innovative algorithm that reconstructs kinship relationships among all household members, providing a robust and scalable methodology for studying living arrangements. GLAD is expected to be a valuable resource for both researchers and policymakers, supporting evidence-based decision-making in areas such as housing, social services, and healthcare, as well as offering insights into long-term transformations in family structures. The open-source R code used in this project is publicly available, promoting transparency and enabling the creation of new ego-centred typologies based on interfamily relationships.
The repository is composed of the following elements: an Rda file named CORESIDENCE_GLAD_2025.Rda in the form of a List. In R, a List object is a versatile data structure that can contain a collection of different data types, including vectors, matrices, data frames, other lists, spatial objects or even functions. It allows heterogeneous data elements to be stored and organized within a single object. The CORESIDENCE_GLAD_2025 R-list object is composed of six elements:
By Data Exercises [source]
This dataset is a comprehensive collection of county-level cancer mortality and incidence rates in the United States between 2000 and 2014. It provides a detailed view of cancer cases, deaths, and trends at a local level. The columns include County, FIPS, age-adjusted death rate, average death rate per year, recent trend (2) in death rates, recent 5-year trend (2) in death rates and average annual count for each county. This dataset can be used to gain insight into the patterns and effects of cancer on communities, and to help inform policy decisions related to mitigating risk factors or increasing preventive measures such as screenings. With records from across the United States spanning 15 years, it can support informed decisions regarding patient care or policy development within your own community.
This dataset provides comprehensive US county-level cancer mortality and incidence rates from 2000 to 2014. It includes the mortality and incidence rate for each county, as well as whether the county met the objective of 45.5 deaths per 100,000 people. It also provides information on recent trends in death rates and average annual counts of cases over the five year period studied.
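The 45.5-per-100,000 objective mentioned above can be checked with a short Python sketch. The county names and rates are placeholders, the helper names are hypothetical, and treating a rate at or below the threshold as "met" is an assumption about how the objective is applied.

```python
OBJECTIVE = 45.5  # deaths per 100,000 people, from the description above

# Placeholder rows: (county, age-adjusted death rate per 100,000).
counties = [
    ("County A", 40.0),
    ("County B", 45.5),
    ("County C", 52.5),
]


def met_objective(rate, objective=OBJECTIVE):
    """Assume a county meets the objective when its rate is at or below it."""
    return rate <= objective


def counties_meeting_objective(rows):
    """Return the names of the counties whose rate meets the objective."""
    return [name for name, rate in rows if met_objective(rate)]


meeting = counties_meeting_objective(counties)
```

If the source file already carries a column recording whether the objective was met, that column should take precedence over a recomputed flag like this one.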
This dataset can be extremely useful to researchers looking to study trends in cancer death rates across counties. By using this data, researchers will be able to gain valuable insight into how different counties are performing in terms of providing treatment and prevention services for cancer patients, and whether preventative measures and healthcare access are having an effect on reducing cancer mortality rates over time. This data can also be used to inform policy makers about counties needing more targeted prevention efforts or additional resources for providing better healthcare access within at-risk communities.
When using this dataset, pay close attention to qualitative columns such as “Recent Trend” or “Recent 5-Year Trend (2)”, which may reveal long-term changes that are not readily apparent from quantitative variables such as the age-adjusted death rate or average deaths per year. Additionally, when comparing counties, take note of any differences in FIPS codes that may indicate the data was collected by a different source, with a different methodology, than in other areas studied.
- Using this dataset, we can identify patterns in cancer mortality and incidence rates that are statistically significant to create treatment regimens or preventive measures specifically targeting those areas.
- This data can be useful for policymakers to target areas with elevated cancer mortality and incidence rates so they can allocate financial resources to these areas more efficiently.
- This dataset can be used to investigate which factors (such as pollution levels, access to medical care, genetic makeup) may have an influence on the cancer mortality and incidence rates in different US counties.
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: death .csv | Column name | Description | |:-------------------------------------------|:-------------------------------------------------------------------...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Since 2020, the world has faced two unprecedented shocks: lockdowns (regulation) and the invasion of Ukraine (war). Although we realise the health and economic effects of these shocks, more research is needed on the effect on happiness and whether the type of shock plays a role. Therefore, in this paper, we determine whether these macro-level shocks affected happiness, how these effects differ, and how long it takes for happiness to adapt to previous levels. The latter will allow us to test whether adaptation theory holds at the macro level. We use a unique dataset of ten countries spanning the Northern and Southern hemispheres derived from tweets extracted in real-time per country. Applying Natural Language Processing, we obtain these tweets’ underlying sentiment scores, after which we calculate a happiness score (Gross National Happiness) and derive daily time series data. Our Twitter dataset is combined with Oxford’s COVID-19 Government Response Tracker data. Considering the results of the Difference-in-Differences and event studies jointly, we are confident that the shocks led to lower happiness levels, both with the lockdown and the invasion shock. We find that the effect size is significant and that the lockdown shock had a bigger effect than the invasion. Considering both types of shocks, the adaptation to previous happiness levels occurred within two to three weeks. Following our findings of similar behaviour in happiness to both types of shocks, the question of whether other types of shocks will have similar effects is posited. Regardless of the length of the adaptation period, understanding the effects of macro-level shocks on happiness is essential for policymakers, as happiness has a spillover effect on other variables such as production, safety and trust.
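For readers unfamiliar with Difference-in-Differences, the textbook two-group, two-period estimator can be sketched in Python as below. This is not the authors' specification: the abstract does not describe their comparison group or model, so the grouping, the function name `did_estimate` and all numbers are placeholders.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook 2x2 Difference-in-Differences estimate.

    Change in the treated group minus change in the comparison group.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Placeholder daily happiness scores before and after a shock.
effect = did_estimate(
    treated_pre=[7.0, 7.0],
    treated_post=[6.0, 6.0],
    control_pre=[7.0, 7.0],
    control_post=[6.5, 6.5],
)
```

A negative estimate, as in this placeholder example, would correspond to happiness falling more in the group exposed to the shock than in the comparison group.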
The Eurovision Song Contest is an annual music competition that began in 1956. It is one of the longest-running television programmes in the world and is watched by millions of people every year. The contest's winner is determined using numerous voting techniques, including points awarded by juries or televoters.
Since 2004, the contest has included a televised semi-final:
- In 2004, held on the Wednesday before the final.
- Between 2005 and 2007, held on the Thursday of Eurovision Week.
- Since 2008, the contest has included two semi-finals, held on the Tuesday and Thursday before the final.
The Eurovision Song Contest is a truly global event, with countries from all over Europe (and beyond) competing for the coveted prize. Over the years, some truly amazing performers have taken to the stage, entertaining audiences with their catchy songs and stunning stage performances.
So who will be crowned this year's winner? Tune in to find out!
This dataset contains information on all of the winners of the Eurovision Song Contest from 1956 to the present day. The data includes the year that the contest was held, the city that hosted it, the winning song and performer, the margin of points between the winning song and runner-up, and the runner-up country.
This dataset can be used to study patterns in Eurovision voting over time, or to compare different winning songs and performers. It could also be used to study how hosting the contest affects a country's chances of winning.
- To study Eurovision Song Contest winners, one could use this dataset to train a machine learning model to predict the winner of the contest given a set of features about the song and the performers.
- This dataset could be used to study how different voting methods (e.g. jury vs televoters) impact the outcome of the Eurovision Song Contest.
- This dataset could be used to study trends in music over time by looking at how the style of winning songs has changed since the contest began in 1956.
Data from eurovision_winners.csv was scraped from Wikipedia on April 4, 2020.
The dataset eurovision_winners.csv contains a list of all the winners of the Eurovision Song Contest from 1956 to the present day.
License
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: eurovision_winners.csv

| Column name | Description |
|:------------|:------------|
| Year | The year in which the contest was held. (Integer) |
| Date | The date on which the contest was held. (String) |
| Host City | The city in which the contest was held. (String) |
| Winner | The country that won the contest. (String) |
| Song | The song that won the contest. (String) |
| Performer | The performer of the winning song. (String) |
| Points | The number of points that the winning song received. (Integer) |
| Margin | The margin of victory (in points) between the winning song and the runner-up song. (Integer) |
| Runner-up | The country that placed second in the contest. (String) |
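A minimal Python sketch of working with the columns listed above might look like this. It reads an in-memory stand-in rather than the real file: the header matches the documented column names, but every row is an invented placeholder, and the helper names are hypothetical.

```python
import csv
import io

# Stand-in for eurovision_winners.csv. The header follows the documented
# column names; the rows are invented placeholders, not real results.
SAMPLE = """Year,Date,Host City,Winner,Song,Performer,Points,Margin,Runner-up
2001,1 May,City A,Country X,Song One,Artist One,100,10,Country Y
2002,1 May,City B,Country Y,Song Two,Artist Two,120,40,Country X
2003,1 May,City C,Country X,Song Three,Artist Three,90,5,Country Z
"""


def wins_by_country(rows):
    """Count how many times each country appears in the Winner column."""
    counts = {}
    for row in rows:
        counts[row["Winner"]] = counts.get(row["Winner"], 0) + 1
    return counts


def largest_margin(rows):
    """Return the row with the largest margin of victory in points."""
    return max(rows, key=lambda row: int(row["Margin"]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
wins = wins_by_country(rows)
widest = largest_margin(rows)
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument would be replaced with an opened file handle, assuming the file's header matches the documented column names exactly.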
Life expectancy at birth is defined as how long, on average, a newborn can expect to live, if current death rates do not change. This dataset can help you gain insights regarding the life expectancy and mortality rate.