75 datasets found

o
Education Attainment and Enrollment around the World - Dataset - Data...
data.opendata.am
Updated Jul 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). Education Attainment and Enrollment around the World - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/dcwb0038973
Explore at:
Dataset updated
Jul 7, 2023
Area covered
World
Description
Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or more precisely, a proxy based on the assets owned by members of the household) there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent.In Mali in 2012 only 34 percent of 15 to 19 year olds in the poorest quintile have completed grade 1 whereas 80 percent of the richest quintile have done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on, in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system.The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.
M
World Population (1950-2025)
macrotrends.net
csv
Updated Jun 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MACROTRENDS (2025). World Population (1950-2025) [Dataset]. https://www.macrotrends.net/global-metrics/countries/wld/world/population
Explore at:
csvAvailable download formats
Dataset updated
Jun 30, 2025
Dataset authored and provided by
MACROTRENDS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1950 - Dec 31, 2025
Area covered
World, World
Description
Historical chart and dataset showing total population for the world by year from 1950 to 2025.
T
World Coronavirus COVID-19 Cases
tradingeconomics.com
csv, excel, json, xml
Updated Mar 9, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRADING ECONOMICS (2020). World Coronavirus COVID-19 Cases [Dataset]. https://tradingeconomics.com/world/coronavirus-cases
Explore at:
csv, excel, xml, jsonAvailable download formats
Dataset updated
Mar 9, 2020
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 4, 2020 - May 17, 2023
Area covered
World, World
Description
The World Health Organization reported 766440796 Coronavirus Cases since the epidemic began. In addition, countries reported 6932591 Coronavirus Deaths. This dataset provides - World Coronavirus Cases- actual values, historical data, forecast, chart, statistics, economic calendar and news.
Climate Change: Earth Surface Temperature Data
kaggle.com
redivis.com
zip
Updated May 1, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
Explore at:
zip(88843537 bytes)Available download formats
Dataset updated
May 1, 2017
Dataset authored and provided by
Berkeley Earthhttp://berkeleyearth.org/
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Earth
Description
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.

We have repackaged the data from a newer compilation put together by the Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have include several files:

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures

LandAverageTemperature: global average land temperature in celsius

LandAverageTemperatureUncertainty: the 95% confidence interval around the average

LandMaxTemperature: global average maximum land temperature in celsius

LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature

LandMinTemperature: global average minimum land temperature in celsius

LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature

LandAndOceanAverageTemperature: global average land and ocean temperature in celsius

LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature

Other files include:

Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)

Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)

Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)

Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

The raw data comes from the Berkeley Earth data page.
M
World Death Rate (1950-2025)
macrotrends.net
csv
Updated Jun 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MACROTRENDS (2025). World Death Rate (1950-2025) [Dataset]. https://www.macrotrends.net/global-metrics/countries/wld/world/death-rate
Explore at:
csvAvailable download formats
Dataset updated
Jun 30, 2025
Dataset authored and provided by
MACROTRENDS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1950 - Dec 31, 2025
Area covered
World, World
Description
Historical chart and dataset showing World death rate by year from 1950 to 2025.
World Development Indicators
kaggle.com
zip
Updated May 1, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kaggle (2017). World Development Indicators [Dataset]. https://www.kaggle.com/kaggle/world-development-indicators
Explore at:
zip(387054886 bytes)Available download formats
Dataset updated
May 1, 2017
Dataset authored and provided by
Kaggle
License
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasetshttps://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
Description
The World Development Indicators from the World Bank contain over a thousand annual indicators of economic development from hundreds of countries around the world.

Here's a list of the available indicators along with a list of the available countries.

For example, this data includes the life expectancy at birth from many countries around the world:

The dataset hosted here is a slightly transformed verion of the raw files available here to facilitate analytics.
World Religions Across Regions
kaggle.com
Updated Dec 6, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). World Religions Across Regions [Dataset]. https://www.kaggle.com/datasets/thedevastator/a-global-perspective-on-world-religions-1945-201
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 6, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Area covered
World
Description
World Religions Across Regions

Analyzing Adherence Across Regions, States and the Global System

By Correlates of War Project [source]

About this dataset

The World Religion Project (WRP) is an ambitious endeavor to conduct a comprehensive analysis of religious adherence throughout the world from 1945 to 2010. This cutting-edge project offers unparalleled insight into the religious behavior of people in different countries, regions, and continents during this time period. Its datasets provide important information about the numbers and percentages of adherents across a multitude of different religions, religion families, and non-religious affiliations.

The WRP consists of three distinct datasets: the national religion dataset, regional religion dataset, and global religion dataset. Each is focused on understanding individually specific realms for varied analysis approaches - from individual states to global systems. The national dataset provides data on number of adherents by state as well as percentage population practicing a given faith group in five-year increments; focusing attention to how this number evolves from nation to nation over time. Similarly, regional data is provided at five year intervals highlighting individual region designations with one modification – Pacific Ocean states have been reclassified into their own Oceania category according to Country Code Number 900 or above). Finally at a global level – all states are aggregated in order that we may understand a snapshot view at any five-year interval between 1945‐2010 regarding relationships between religions or religio‐families within one location or transnationally.

This project was developed in three stages: firstly forming a religions tree (a systematic classification), secondly collecting data such as this provided by WRP according to that classification structure – lastly cleaning the data so discrepancies may be reconciled and imported where needed with gaps selected when unknown values were encountered during collection process . We would encourage anyone wishing details undergoing more detailed reading/analysis relating various use applications for these rich datasets - please contact Zeev Maoz (University California Davis) & Errol A Henderson _(Pennsylvania State University)

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

The World Religions Project (WRP) dataset offers a comprehensive look at religious adherence around the world within a single dataset. With this dataset, you can track global religious trends over a period of 65 years and explore how they’ve changed during that time. By exploring the WRP data set, you’ll gain insight into cross-regional and cross-time patterns in religious affiliation around the world.

Research Ideas

Analyzing historical patterns of religious growth and decline across different regions

Creating visualizations to compare religious adherence in various states, countries, or globally

Studying the impact of governmental policies on religious participation over time

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: WRP regional data.csv | Column name | Description | |:-----------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------| | Year | Reference year for data collection. (Integer) | | Region | World region according to Correlates Of War (COW) Regional Systemizations with one modification (Oceania category for COW country code ...
Statewide Death Profiles
data.chhs.ca.gov
data.ca.gov
+3more
csv, zip
Updated Jun 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California Department of Public Health (2025). Statewide Death Profiles [Dataset]. https://data.chhs.ca.gov/dataset/statewide-death-profiles
Explore at:
csv(2026589), csv(200270), csv(164006), csv(5034), csv(463460), csv(4689434), csv(364098), csv(16301), csv(5401561), csv(419332), zipAvailable download formats
Dataset updated
Jun 26, 2025
Dataset authored and provided by
California Department of Public Healthhttps://www.cdph.ca.gov/
Description
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
u
Global Trust Survey - Dataset - BSOS Data Repository
bsos-data.umd.edu
Updated Aug 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Global Trust Survey - Dataset - BSOS Data Repository [Dataset]. https://bsos-data.umd.edu/dataset/global-trust-survey-dataset
Explore at:
Dataset updated
Aug 29, 2024
Description
The Global Trust dataset measures how much trust people around the world have in major institutions and social networks. It contains two data files, one with the raw survey data and one putting the raw data into percentages of trust in certain institutions. These can be analyzed in different ways. The data comes from surveys of over 119,088 people from 113 countries. Survey respondents were asked such things as “How much do you trust each of the following: other people in your neighborhood; your national government; scientists; journalists; doctors and nurses; people who work at non-governmental or non-profit organizations; healers? Do you trust them a lot, some, not much, or not at all?" The National Trust Codebook contains both the survey and the national rate codebook files, titled “Survey” and “Rate” respectively. Both files contain the same variables such as neighbors, government, and journalists, with the only difference being that “Survey” has id as a variable to account for the 119,088 unique responses. The Survey file has the raw data of all the 119,088 unique responses and both categorical and ordinal variables. It can be used to analyze how different countries feel about trust in different people or institutions as well as how those variables can relate to each other. The Rate file creates a percentage of how much people from each country trust certain communities or institutions and this can be used to analyze how different countries feel about certain things, this allows room to analyze each country with each other in a more clear way than the raw data. Both files are unique in the sense of the data being worldwide, it is a unique trait to be able to compare from different countries survey respondents that were asked the same questions with the same methodology, making comparison all the more easy. Another interesting element of this survey data is the number of responses per nation. There were, at minimum, 1000 responses gathered from each nation featured in the survey. The sample size allows for better than typical representation for each country.

‘COVID vaccination vs. mortality ’ analyzed by Analyst-2

analyst-2.ai

Updated Aug 4, 2020

Facebook

Twitter

Click to copy link

Link copied

Cite

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘COVID vaccination vs. mortality ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-vaccination-vs-mortality-cbd8/06c8ccd2/?iid=010-492&v=presentation

Explore at:

Dataset updated

Aug 4, 2020

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Analysis of ‘COVID vaccination vs. mortality ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sinakaraji/covid-vaccination-vs-death on 12 November 2021.

--- Dataset description provided by original source is as follows ---

Context

The COVID-19 outbreak has brought the whole planet to its knees.More over 4.5 million people have died since the writing of this notebook, and the only acceptable way out of the disaster is to vaccinate all parts of society. Despite the fact that the benefits of vaccination have been proved to the world many times, anti-vaccine groups are springing up all over the world. This data set was generated to investigate the impact of coronavirus vaccinations on coronavirus mortality.

Content

country	iso_code	date	total_vaccinations	people_vaccinated	people_fully_vaccinated	New_deaths	population	ratio
country name	iso code for each country	date that this data belong	number of all doses of COVID vaccine usage in that country	number of people who got at least one shot of COVID vaccine	number of people who got full vaccine shots	number of daily new deaths	2021 country population	% of vaccinations in that country at that date = people_vaccinated/population * 100

Data Collection

This dataset is a combination of the following three datasets:

1.https://www.kaggle.com/gpreda/covid-world-vaccination-progress

2.https://covid19.who.int/WHO-COVID-19-global-data.csv

3.https://www.kaggle.com/rsrishav/world-population

you can find more detail about this dataset by reading this notebook:

https://www.kaggle.com/sinakaraji/simple-linear-regression-covid-vaccination

Countries in this dataset:


Afghanistan	Albania	Algeria	Andorra	Angola
Anguilla	Antigua and Barbuda	Argentina	Armenia	Aruba
Australia	Austria	Azerbaijan	Bahamas	Bahrain
Bangladesh	Barbados	Belarus	Belgium	Belize
Benin	Bermuda	Bhutan	Bolivia (Plurinational State of)	Brazil
Bosnia and Herzegovina	Botswana	Brunei Darussalam	Bulgaria	Burkina Faso
Cambodia	Cameroon	Canada	Cabo Verde	Cayman Islands
Central African Republic	Chad	Chile	China	Colombia
Comoros	Cook Islands	Costa Rica	Croatia	Cuba
Curaçao	Cyprus	Denmark	Djibouti	Dominica
Dominican Republic	Ecuador	Egypt	El Salvador	Equatorial Guinea
Estonia	Ethiopia	Falkland Islands (Malvinas)	Fiji	Finland
France	French Polynesia	Gabon	Gambia	Georgia
Germany	Ghana	Gibraltar	Greece	Greenland
Grenada	Guatemala	Guinea	Guinea-Bissau	Guyana
Haiti	Honduras	Hungary	Iceland	India
Indonesia	Iran (Islamic Republic of)	Iraq	Ireland	Isle of Man
Israel	Italy	Jamaica	Japan	Jordan
Kazakhstan	Kenya	Kiribati	Kuwait	Kyrgyzstan
Lao People's Democratic Republic	Latvia	Lebanon	Lesotho	Liberia
Libya	Liechtenstein	Lithuania	Luxembourg	Madagascar
Malawi	Malaysia	Maldives	Mali	Malta
Mauritania	Mauritius	Mexico	Republic of Moldova	Monaco
Mongolia	Montenegro	Montserrat	Morocco	Mozambique
Myanmar	Namibia	Nauru	Nepal	Netherlands
New Caledonia	New Zealand	Nicaragua	Niger	Nigeria
Niue	North Macedonia	Norway	Oman	Pakistan
occupied Palestinian territory, including east Jerusalem
Panama	Papua New Guinea	Paraguay	Peru	Philippines
Poland	Portugal	Qatar	Romania	Russian Federation
Rwanda	Saint Kitts and Nevis	Saint Lucia
Saint Vincent and the Grenadines	Samoa	San Marino	Sao Tome and Principe	Saudi Arabia
Senegal	Serbia	Seychelles	Sierra Leone	Singapore
Slovakia	Slovenia	Solomon Islands	Somalia	South Africa
Republic of Korea	South Sudan	Spain	Sri Lanka	Sudan
Suriname	Sweden	Switzerland	Syrian Arab Republic	Tajikistan
United Republic of Tanzania	Thailand	Togo	Tonga	Trinidad and Tobago
Tunisia	Turkey	Turkmenistan	Turks and Caicos Islands	Tuvalu
Uganda	Ukraine	United Arab Emirates	The United Kingdom	United States of America
Uruguay	Uzbekistan	Vanuatu	Venezuela (Bolivarian Republic of)	Viet Nam
Wallis and Futuna	Yemen	Zambia	Zimbabwe

--- Original source retains full ownership of the source dataset ---

Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Jun 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Explore at:
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often. Storage capacity also growing Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
M
World Life Expectancy (1950-2025)
macrotrends.net
csv
Updated Jun 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MACROTRENDS (2025). World Life Expectancy (1950-2025) [Dataset]. https://www.macrotrends.net/global-metrics/countries/wld/world/life-expectancy
Explore at:
csvAvailable download formats
Dataset updated
Jun 30, 2025
Dataset authored and provided by
MACROTRENDS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1950 - Dec 31, 2025
Area covered
World, World
Description
Historical chart and dataset showing World life expectancy by year from 1950 to 2025.
R
F] Dataset
universe.roboflow.com
zip
Updated Mar 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HamzaK (2025). F] Dataset [Dataset]. https://universe.roboflow.com/hamzak-s5mig/diverse-gender-detection-m-f/model/1
Explore at:
zipAvailable download formats
Dataset updated
Mar 26, 2025
Dataset authored and provided by
HamzaK
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Male Female Bounding Boxes
Description
This dataset includes images of people from all around the world. I noticed that on the internet, there was surprisingly no model that could detect genders, simply, male and female. The ones that could were not performing good and the datasets were too big and hard to download, or the models were havng API issues, atleast for me. This dataset is very carefully annotated, and you can use object detection yolo model to create a very reliable gender detection dataset, for detecting male and female.
w
Global Financial Inclusion (Global Findex) Database 2011 - Latvia
microdata.worldbank.org
catalog.ihsn.org
+1more
Updated Sep 21, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Research Group, Finance and Private Sector Development Unit (2021). Global Financial Inclusion (Global Findex) Database 2011 - Latvia [Dataset]. https://microdata.worldbank.org/index.php/catalog/1204
Explore at:
Dataset updated
Sep 21, 2021
Dataset authored and provided by
Development Research Group, Finance and Private Sector Development Unit
Time period covered
2011
Area covered
Latvia
Description
Abstract

Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

Geographic coverage

National Coverage.

Analysis unit

Individual

Universe

The target population is the civilian, non-institutionalized population 15 years and above. The sample is nationally representative.

Kind of data

Sample survey data [ssd]

Sampling procedure

The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.

Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to teach a person in each household, spread over different days and times of year.

The sample size in Latvia was 1,006 individuals.

Mode of data collection

Face-to-face [f2f]

Research instrument

The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance insitutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

Sampling error estimates

Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.
w
Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan
microdata.worldbank.org
catalog.ihsn.org
+2more
Updated Apr 15, 2015
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Research Group, Finance and Private Sector Development Unit (2015). Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan [Dataset]. https://microdata.worldbank.org/index.php/catalog/1117
Explore at:
Dataset updated
Apr 15, 2015
Dataset authored and provided by
Development Research Group, Finance and Private Sector Development Unit
Time period covered
2011
Area covered
Afghanistan
Description
Abstract

Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

Geographic coverage

National Coverage.

Analysis unit

Individual

Universe

The target population is the civilian, non-institutionalized population 15 years and above.

Kind of data

Sample survey data [ssd]

Sampling procedure

The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.

Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to teach a person in each household, spread over different days and times of year.

The sample size in Afghanistan was 1,000 individuals. Gender-matched sampling was used during the final stage of selection.

Mode of data collection

Face-to-face [f2f]

Research instrument

The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance insitutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

Sampling error estimates

Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.
CommitBench
zenodo.org
csv, json
Updated Feb 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maximilian Schall; Maximilian Schall; Tamara Czinczoll; Tamara Czinczoll; Gerard de Melo; Gerard de Melo (2024). CommitBench [Dataset]. http://doi.org/10.5281/zenodo.10497442
Explore at:
json, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10497442
Dataset updated
Feb 14, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Maximilian Schall; Maximilian Schall; Tamara Czinczoll; Tamara Czinczoll; Gerard de Melo; Gerard de Melo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Time period covered
Dec 15, 2023
Description
Data Statement for CommitBench

- Dataset Title: CommitBench

- Dataset Curator: Maximilian Schall, Tamara Czinczoll, Gerard de Melo

- Dataset Version: 1.0, 15.12.2023

- Data Statement Author: Maximilian Schall, Tamara Czinczoll

- Data Statement Version: 1.0, 16.01.2023

- Code URL: https://github.com/maxscha/commitbench

EXECUTIVE SUMMARY

We provide CommitBench as an open-source, reproducible and privacy- and license-aware benchmark for commit message generation. The dataset is gathered from github repositories with licenses that permit redistribution. We provide six programming languages, Java, Python, Go, JavaScript, PHP and Ruby. The commit messages in natural language are restricted to English, as it is the working language in many software development projects. The dataset has 1,664,590 examples that were generated by using extensive quality-focused filtering techniques (e.g. excluding bot commits). Additionally, we provide a version with longer sequences for benchmarking models with more extended sequence input, as well a version with

CURATION RATIONALE

We created this dataset due to quality and legal issues with previous commit message generation datasets. Given a git diff displaying code changes between two file versions, the task is to predict the accompanying commit message describing these changes in natural language. We base our GitHub repository selection on that of a previous dataset, CodeSearchNet, but apply a large number of filtering techniques to improve the data quality and eliminate noise. Due to the original repository selection, we are also restricted to the aforementioned programming languages. It was important to us, however, to provide some number of programming languages to accommodate any changes in the task due to the degree of hardware-relatedness of a language. The dataset is provides as a large CSV file containing all samples. We provide the following fields: Diff, Commit Message, Hash, Project, Split.

DOCUMENTATION FOR SOURCE DATASETS

Repository selection based on CodeSearchNet, which can be found under https://github.com/github/CodeSearchNet

LANGUAGE VARIETIES

Since GitHub hosts software projects from all over the world, there is no single uniform variety of English used across all commit messages. This means that phrasing can be regional or subject to influences from the programmer's native language. It also means that different spelling conventions may co-exist and that different terms may used for the same concept. Any model trained on this data should take these factors into account. For the number of samples for different programming languages, see Table below:

Language Number of Samples
Java 153,119
Ruby 233,710
Go 137,998
JavaScript 373,598
Python 472,469
PHP 294,394

SPEAKER DEMOGRAPHIC

Due to the extremely diverse (geographically, but also socio-economically) backgrounds of the software development community, there is no single demographic the data comes from. Of course, this does not entail that there are no biases when it comes to the data origin. Globally, the average software developer tends to be male and has obtained higher education. Due to the anonymous nature of GitHub profiles, gender distribution information cannot be extracted.

ANNOTATOR DEMOGRAPHIC

Due to the automated generation of the dataset, no annotators were used.

SPEECH SITUATION AND CHARACTERISTICS

The public nature and often business-related creation of the data by the original GitHub users fosters a more neutral, information-focused and formal language. As it is not uncommon for developers to find the writing of commit messages tedious, there can also be commit messages representing the frustration or boredom of the commit author. While our filtering is supposed to catch these types of messages, there can be some instances still in the dataset.

PREPROCESSING AND DATA FORMATTING

See paper for all preprocessing steps. We do not provide the un-processed raw data due to privacy concerns, but it can be obtained via CodeSearchNet or requested from the authors.

CAPTURE QUALITY

While our dataset is completely reproducible at the time of writing, there are external dependencies that could restrict this. If GitHub shuts down and someone with a software project in the dataset deletes their repository, there can be instances that are non-reproducible.

LIMITATIONS

While our filters are meant to ensure a high quality for each data sample in the dataset, we cannot ensure that only low-quality examples were removed. Similarly, we cannot guarantee that our extensive filtering methods catch all low-quality examples. Some might remain in the dataset. Another limitation of our dataset is the low number of programming languages (there are many more) as well as our focus on English commit messages. There might be some people that only write commit messages in their respective languages, e.g., because the organization they work at has established this or because they do not speak English (confidently enough). Perhaps some languages' syntax better aligns with that of programming languages. These effects cannot be investigated with CommitBench.

Although we anonymize the data as far as possible, the required information for reproducibility, including the organization, project name, and project hash, makes it possible to refer back to the original authoring user account, since this information is freely available in the original repository on GitHub.

METADATA

License: Dataset under the CC BY-NC 4.0 license

DISCLOSURES AND ETHICAL REVIEW

While we put substantial effort into removing privacy-sensitive information, our solutions cannot find 100% of such cases. This means that researchers and anyone using the data need to incorporate their own safeguards to effectively reduce the amount of personal information that can be exposed.

ABOUT THIS DOCUMENT

A data statement is a characterization of a dataset that provides context to allow developers and users to better understand how experimental results might generalize, how software might be appropriately deployed, and what biases might be reflected in systems built on the software.

This data statement was written based on the template for the Data Statements Version 2 schema. The template was prepared by Angelina McMillan-Major, Emily M. Bender, and Batya Friedman and can be found at https://techpolicylab.uw.edu/data-statements/ and was updated from the community Version 1 Markdown template by Leon Dercyznski.
g
Coronavirus (Covid19) — Evolution by country and around the world (daily...
gimi9.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Coronavirus (Covid19) — Evolution by country and around the world (daily maj) [Dataset]. https://gimi9.com/dataset/eu_5e5da8356f44412b1755a8f6/
Explore at:
Area covered
World
Description
[Edit 12/09/2020] You will now find in the files below the last 30 days, too many people do not respect the request not to recover too often the dataset (no interest in recovering every minute while the file changes 4 or 5 times a day) If you want access to the entire history, contact me [Edit 31/03/2020] Since yesterday, I made sure to have the data of the day since the ESSC, so the data of the same day are now available and updated several times a day (about every hour) as the new figures fall all over the world. The data of the previous day is always consolidated around 2am (it is no longer 1h since the time change). If you only want to have the complete data, just don't take into account the last day (today’s date) Here I share the data that I compile with the famous coronavirus infection world map created and maintained by The Johns Hopkins University and which serve me to display ** CoronaVirus statistics worldwide and by country** They share the day’s data each night on a GitHub deposit. My tools compile this new data as soon as they are available and I share the result here. This data is used to display tables and graphs on the CoronaVirus website (Covid19) of Politologue.com https://coronavirus.politologue.com/ This data will allow you to make your own graphs and analyses if you look at the subject. I do not oblige you to do it, but if my compilation allows you to do something about it and saved you time, a link to https://coronavirus.politologue.com/ will be appreciable. Information in files (csv and json) — Number of cases — Number of deaths — Number of healing — Death rate (percentage) — Healing rate (percentage) — Infection rate (persons still infected, not deceased or cured) (percentage) — And for data by country, you will find a field “country” If you integrate the client-side json or csv on a site or application, please keep a cache on your servers without risking an unexpected load on my servers.
B
ASHRAE global database of thermal comfort field measurements
borealisdata.ca
open.library.ubc.ca
Updated Jul 21, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thomas Parkinson; Federico Tartarini; Veronika Földváry Ličina; Toby Cheung; Hui Zhang; Richard de Dear; Peixian Li; Edward Arens; Chungyoon Chun; Stefano Schiavon; Maohui Luo; Gail Brager (2022). ASHRAE global database of thermal comfort field measurements [Dataset]. http://doi.org/10.5683/SP2/GNVEM8
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP2/GNVEM8
Dataset updated
Jul 21, 2022
Dataset provided by
Borealis
Authors
Thomas Parkinson; Federico Tartarini; Veronika Földváry Ličina; Toby Cheung; Hui Zhang; Richard de Dear; Peixian Li; Edward Arens; Chungyoon Chun; Stefano Schiavon; Maohui Luo; Gail Brager
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset funded by
ASHRAE
Description
AbstractRecognizing the value of open-source research databases in advancing the art and science of HVAC, in 2014 the ASHRAE Global Thermal Comfort Database II project was launched under the leadership of University of California at Berkeley’s Center for the Built Environment and The University of Sydney’s Indoor Environmental Quality (IEQ) Laboratory. The ASHRAE Global Thermal Comfort Database II (as it is known) is intended to support diverse inquiries about thermal comfort in field settings. The exercise began with a systematic collection and harmonization of raw data from the last two decades of thermal comfort field studies around the world. The final database is comprised of field studies from around the world, with contributors releasing their raw data to the project for wider dissemination to the thermal comfort research community. After the quality-assurance process, there was a total of 77,304 rows of data of paired subjective comfort votes and objective instrumental measurements of thermal comfort parameters. An additional 25,288 rows of data from the original ASHRAE RP-884 database are included. The most recent update (version 2.1) has 6,441 new rows of data, bringing the total number of entries to 109,033. The project was partially performed within the framework of the International Energy Agency Energy in Buildings and Communities programm (IEA-EBC) Annex 69 "Strategy and Practice of Adaptive Thermal Comfort in Low Energy Buildings. MethodsIn order to ensure that the quality of the database would permit end-users to conduct robust hypothesis testing, the team built the data collection methodology on specific requirements, as follows: Data needed to come from field experiments rather than climate chamber research, so that it represented research conducted in “real” buildings occupied by “real” people doing their normal day-to-day activities, rather than paid college students sitting in a controlled indoor environment of a climate chamber. Both instrumental (indoor climatic) and subjective (questionnaire) data were required, such that they were recorded in the same space at the same time The database needed to be built up from the raw data files generated by the original researchers, instead of their processed or published findings. The raw data needed to come with a supporting codebook explaining the coding conventions used by the data contributor, to allow harmonization with the standardized data formatting within the database. Data must have been published either in a peer-reviewed journal or conference paper. All datasets from individual studies were subject to a stringent quality assurance process before being assimilated into the database. The research team conducted a final validation by first comparing each raw dataset with its related publication provided by the data contributor to prevent transmission errors. Systematic quality control of each study was performed to ensure that records within the database were reasonable. Firstly, distributions of each variable were visualized to identify aberrant values. Then, cross-plots between two variables (e.g. thermal sensation and thermal comfort) were used to check for incorrectly coded data. Finally, a few rows from each study were randomly selected to verify consistency between the original dataset and the standardized database. Since the data came from multiple independent studies, every record did not necessarily include all of the thermal comfort variables. Where data were missing, that particular range of cells was filled with a null value. Usage notesThe dataset is seperated into a metadata table and a `measurements table. The metadata table has high-level information at the building level and is provided as a .csv file. The measurement table contiains all field measurements and is provided as a compressed comma-separated value (.csv.gz) file using UTF-8 character encoding. The first row contains human-readable column headers. Each row represents an individual’s questionnaire responses, and the associated instrumental measurements, thermal index values and outdoor meteorological observations where available. Full details can be found in the readme document.
w
Global Financial Inclusion (Global Findex) Database 2011 - Saudi Arabia
microdata.worldbank.org
catalog.ihsn.org
+2more
Updated Apr 15, 2015
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Research Group, Finance and Private Sector Development Unit (2015). Global Financial Inclusion (Global Findex) Database 2011 - Saudi Arabia [Dataset]. https://microdata.worldbank.org/index.php/catalog/1237
Explore at:
Dataset updated
Apr 15, 2015
Dataset authored and provided by
Development Research Group, Finance and Private Sector Development Unit
Time period covered
2011
Area covered
Saudi Arabia
Description
Abstract

Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

Geographic coverage

National Coverage.

Analysis unit

Individual

Universe

The target population is the civilian, non-institutionalized population 15 years and above. The sample includes only Saudi Arabians and Arab expatriates. The excluded population represents approximately 20% of the total adult population.

Kind of data

Sample survey data [ssd]

Sampling procedure

The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.

Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to teach a person in each household, spread over different days and times of year.

The sample size in the majority of economies was 1,000 individuals.

Mode of data collection

Face-to-face [f2f]

Research instrument

The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance insitutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

Sampling error estimates

Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.
w
Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan,...
microdata.worldbank.org
catalog.ihsn.org
+2more
Updated Oct 26, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Research Group, Finance and Private Sector Development Unit (2023). Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan, Albania, Algeria...and 134 more [Dataset]. https://microdata.worldbank.org/index.php/catalog/1097
Explore at:
Dataset updated
Oct 26, 2023
Dataset authored and provided by
Development Research Group, Finance and Private Sector Development Unit
Time period covered
2011
Area covered
Algeria...and 134 more, Albania, Afghanistan
Description
Abstract

Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

Geographic coverage

See Methodology document for country-specific geographic coverage details.

Analysis unit

Individual

Universe

The target population is the civilian, non-institutionalized population 15 years and above.

Kind of data

Sample survey data [ssd]

Sampling procedure

The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.

Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to teach a person in each household, spread over different days and times of year.

The sample size in the majority of economies was 1,000 individuals.

Mode of data collection

Face-to-face [f2f] OR Landline telephone OR Landline and cellular telephone

Research instrument

The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance insitutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

Sampling error estimates

Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2023). Education Attainment and Enrollment around the World - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/dcwb0038973

Education Attainment and Enrollment around the World - Dataset - Data Catalog Armenia

Explore at:

Dataset updated

Jul 7, 2023

Area covered

World

Description

Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or more precisely, a proxy based on the assets owned by members of the household) there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent.In Mali in 2012 only 34 percent of 15 to 19 year olds in the poorest quintile have completed grade 1 whereas 80 percent of the richest quintile have done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on, in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system.The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.

Clear search

Close search

Google apps

Main menu

Language	Number of Samples
Java	153,119
Ruby	233,710
Go	137,998
JavaScript	373,598
Python	472,469
PHP	294,394

Education Attainment and Enrollment around the World - Dataset - Data...

World Population (1950-2025)

World Coronavirus COVID-19 Cases

Climate Change: Earth Surface Temperature Data

World Death Rate (1950-2025)

World Development Indicators

World Religions Across Regions

World Religions Across Regions

Analyzing Adherence Across Regions, States and the Global System

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Statewide Death Profiles

Global Trust Survey - Dataset - BSOS Data Repository

‘COVID vaccination vs. mortality ’ analyzed by Analyst-2

Context

Content

Data Collection

Countries in this dataset:

Amount of data created, consumed, and stored 2010-2023, with forecasts to...

World Life Expectancy (1950-2025)

F] Dataset

Global Financial Inclusion (Global Findex) Database 2011 - Latvia

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Sampling error estimates

Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Sampling error estimates

CommitBench

Data Statement for CommitBench

EXECUTIVE SUMMARY

CURATION RATIONALE

DOCUMENTATION FOR SOURCE DATASETS

LANGUAGE VARIETIES

SPEAKER DEMOGRAPHIC

ANNOTATOR DEMOGRAPHIC

SPEECH SITUATION AND CHARACTERISTICS

PREPROCESSING AND DATA FORMATTING

CAPTURE QUALITY

LIMITATIONS

METADATA

DISCLOSURES AND ETHICAL REVIEW

ABOUT THIS DOCUMENT

Coronavirus (Covid19) — Evolution by country and around the world (daily...

ASHRAE global database of thermal comfort field measurements

Global Financial Inclusion (Global Findex) Database 2011 - Saudi Arabia

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Sampling error estimates

Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan,...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data