The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, it is projected that the world population will reach nine billion in 2038, and 10 billion in 2060, but it will peak around 10.3 billion in the 2080s before it then goes into decline. Regional variations The global population has seen rapid growth since the early 1800s, due to advances in areas such as food production, healthcare, water safety, education, and infrastructure, however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference in population between now and the 2080s' peak will be found in Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion in this time (although populations in other continents will also fluctuate). Changing projections The United Nations releases their World Population Prospects report every 1-2 years, and this is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people, and that population growth would continue into the 2100s, however a sooner and shorter peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolongued development arc in Sub-Saharan Africa.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of adm div (ca 80.000)Sources and ContributionsSources : GeoNames is aggregating over hundred different data sources. Ambassadors : GeoNames Ambassadors help in many countries. Wiki : A wiki allows to view the data and quickly fix error and add missing places. Donations and Sponsoring : Costs for running GeoNames are covered by donations and sponsoring.Enrichment:add country name
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Earth Hispanic or Latino population. It includes the distribution of the Hispanic or Latino population, of Earth, by their ancestries, as identified by the Census Bureau. The dataset can be utilized to understand the origin of the Hispanic or Latino population of Earth.
Key observations
Among the Hispanic population in Earth, regardless of the race, the largest group is of Mexican origin, with a population of 588 (94.84% of the total Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Origin for Hispanic or Latino population include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Earth Population by Race & Ethnicity. You can refer the same here
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.
We have repackaged the data from a newer compilation put together by the Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have include several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
These datasets are from Our World in Data. Their complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations as well as other variables of potential interest.
our data comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We discuss how and when JHU collects and publishes this data. The cases & deaths dataset is updated daily. Note: the number of cases or deaths reported by any institution—including JHU, the WHO, the ECDC, and others—on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case/death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country corrects historical data because it had previously overestimated the number of cases/deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if JHU decides (and has access to the necessary data) to correct values retrospectively.
our data comes from the European Centre for Disease Prevention and Control (ECDC) for a select number of European countries; the government of the United Kingdom; the Department of Health & Human Services for the United States; the COVID-19 Tracker for Canada. Unfortunately, we are unable to provide data on hospitalizations for other countries: there is currently no global, aggregated database on COVID-19 hospitalization, and our team at Our World in Data does not have the capacity to build such a dataset.
this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. The testing dataset is updated around twice a week.
Our World in Data GitHub repository for covid-19.
All we love data, cause we love to go inside it and discover the truth that's the main inspiration I have.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths
column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Initially, the format of this dataset was .json, so I converted it to .csv for ease of data processing.
"Online articles from the 25 most popular news sites in Vietnam in July 2022, suitable for practicing Natural Language Processing in Vietnamese.
Online news outlets are an unavoidable part of our society today due to their easy access, mostly free. Their effects on the way communities think and act is becoming a concern for a multitude of groups of people, including legislators, content creators, and marketers, just to name a few. Aside from the effects, what is being written on the news should be a good reflection of people’s will, attention, and even cultural standard.
In Vietnam, even though journalists have received much criticism, especially in recent years, news outlets still receive a lot of traffic (27%) compared to other methods to receive information."
Original Data Source: Vietnamese Online News .csv dataset
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Can you tell geographical stories about the world using data science?
World countries with their corresponding continents , official english names, official french names, Dial,ITU,Languages and so on.
This data was gotten from https://old.datahub.io/
Exploration of the world countries: - Can we graphically visualize countries that speak a particular language? - We can also integrate this dataset into others to enhance our exploration. - The dataset has now been updated to include longitude and latitudes of countries in the world.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
PerCapita_CO2_Footprint_InDioceses_FULLBurhans, Molly A., Cheney, David M., Gerlt, R.. . “PerCapita_CO2_Footprint_InDioceses_FULL”. Scale not given. Version 1.0. MO and CT, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2019.MethodologyThis is the first global Carbon footprint of the Catholic population. We will continue to improve and develop these data with our research partners over the coming years. While it is helpful, it should also be viewed and used as a "beta" prototype that we and our research partners will build from and improve. The years of carbon data are (2010) and (2015 - SHOWN). The year of Catholic data is 2018. The year of population data is 2016. Care should be taken during future developments to harmonize the years used for catholic, population, and CO2 data.1. Zonal Statistics: Esri Population Data and Dioceses --> Population per dioceses, non Vatican based numbers2. Zonal Statistics: FFDAS and Dioceses and Population dataset --> Mean CO2 per Diocese3. Field Calculation: Population per Diocese and Mean CO2 per diocese --> CO2 per Capita4. Field Calculation: CO2 per Capita * Catholic Population --> Catholic Carbon FootprintAssumption: PerCapita CO2Deriving per-capita CO2 from mean CO2 in a geography assumes that people's footprint accounts for their personal lifestyle and involvement in local business and industries that are contribute CO2. Catholic CO2Assumes that Catholics and non-Catholic have similar CO2 footprints from their lifestyles.Derived from:A multiyear, global gridded fossil fuel CO2 emission data product: Evaluation and analysis of resultshttp://ffdas.rc.nau.edu/About.htmlRayner et al., JGR, 2010 - The is the first FFDAS paper describing the version 1.0 methods and results published in the Journal of Geophysical Research.Asefi et al., 2014 - This is the paper describing the methods and results of the FFDAS version 2.0 published in the Journal of Geophysical Research.Readme version 2.2 - A simple readme file to assist in using the 10 km x 10 km, hourly gridded Vulcan version 2.2 results.Liu et al., 2017 - A paper exploring the carbon cycle response to the 2015-2016 El Nino through the use of carbon cycle data assimilation with FFDAS as the boundary condition for FFCO2."S. Asefi‐Najafabady P. J. Rayner K. R. Gurney A. McRobert Y. Song K. Coltin J. Huang C. Elvidge K. BaughFirst published: 10 September 2014 https://doi.org/10.1002/2013JD021296 Cited by: 30Link to FFDAS data retrieval and visualization: http://hpcg.purdue.edu/FFDAS/index.phpAbstractHigh‐resolution, global quantification of fossil fuel CO2 emissions is emerging as a critical need in carbon cycle science and climate policy. We build upon a previously developed fossil fuel data assimilation system (FFDAS) for estimating global high‐resolution fossil fuel CO2 emissions. We have improved the underlying observationally based data sources, expanded the approach through treatment of separate emitting sectors including a new pointwise database of global power plants, and extended the results to cover a 1997 to 2010 time series at a spatial resolution of 0.1°. Long‐term trend analysis of the resulting global emissions shows subnational spatial structure in large active economies such as the United States, China, and India. These three countries, in particular, show different long‐term trends and exploration of the trends in nighttime lights, and population reveal a decoupling of population and emissions at the subnational level. Analysis of shorter‐term variations reveals the impact of the 2008–2009 global financial crisis with widespread negative emission anomalies across the U.S. and Europe. We have used a center of mass (CM) calculation as a compact metric to express the time evolution of spatial patterns in fossil fuel CO2 emissions. The global emission CM has moved toward the east and somewhat south between 1997 and 2010, driven by the increase in emissions in China and South Asia over this time period. Analysis at the level of individual countries reveals per capita CO2 emission migration in both Russia and India. The per capita emission CM holds potential as a way to succinctly analyze subnational shifts in carbon intensity over time. Uncertainties are generally lower than the previous version of FFDAS due mainly to an improved nightlight data set."Global Diocesan Boundaries:Burhans, M., Bell, J., Burhans, D., Carmichael, R., Cheney, D., Deaton, M., Emge, T. Gerlt, B., Grayson, J., Herries, J., Keegan, H., Skinner, A., Smith, M., Sousa, C., Trubetskoy, S. “Diocesean Boundaries of the Catholic Church” [Feature Layer]. Scale not given. Version 1.2. Redlands, CA, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2016.Using: ArcGIS. 10.4. Version 10.0. Redlands, CA: Environmental Systems Research Institute, Inc., 2016.Boundary ProvenanceStatistics and Leadership DataCheney, D.M. “Catholic Hierarchy of the World” [Database]. Date Updated: August 2019. Catholic Hierarchy. Using: Paradox. Retrieved from Original Source.Catholic HierarchyAnnuario Pontificio per l’Anno .. Città del Vaticano :Tipografia Poliglotta Vaticana, Multiple Years.The data for these maps was extracted from the gold standard of Church data, the Annuario Pontificio, published yearly by the Vatican. The collection and data development of the Vatican Statistics Office are unknown. GoodLands is not responsible for errors within this data. We encourage people to document and report errant information to us at data@good-lands.org or directly to the Vatican.Additional information about regular changes in bishops and sees comes from a variety of public diocesan and news announcements.GoodLands’ polygon data layers, version 2.0 for global ecclesiastical boundaries of the Roman Catholic Church:Although care has been taken to ensure the accuracy, completeness and reliability of the information provided, due to this being the first developed dataset of global ecclesiastical boundaries curated from many sources it may have a higher margin of error than established geopolitical administrative boundary maps. Boundaries need to be verified with appropriate Ecclesiastical Leadership. The current information is subject to change without notice. No parties involved with the creation of this data are liable for indirect, special or incidental damage resulting from, arising out of or in connection with the use of the information. We referenced 1960 sources to build our global datasets of ecclesiastical jurisdictions. Often, they were isolated images of dioceses, historical documents and information about parishes that were cross checked. These sources can be viewed here:https://docs.google.com/spreadsheets/d/11ANlH1S_aYJOyz4TtG0HHgz0OLxnOvXLHMt4FVOS85Q/edit#gid=0To learn more or contact us please visit: https://good-lands.org/Esri Gridded Population Data 2016DescriptionThis layer is a global estimate of human population for 2016. Esri created this estimate by modeling a footprint of where people live as a dasymetric settlement likelihood surface, and then assigned 2016 population estimates stored on polygons of the finest level of geography available onto the settlement surface. Where people live means where their homes are, as in where people sleep most of the time, and this is opposed to where they work. Another way to think of this estimate is a night-time estimate, as opposed to a day-time estimate.Knowledge of population distribution helps us understand how humans affect the natural world and how natural events such as storms and earthquakes, and other phenomena affect humans. This layer represents the footprint of where people live, and how many people live there.Dataset SummaryEach cell in this layer has an integer value with the estimated number of people likely to live in the geographic region represented by that cell. Esri additionally produced several additional layers World Population Estimate Confidence 2016: the confidence level (1-5) per cell for the probability of people being located and estimated correctly. World Population Density Estimate 2016: this layer is represented as population density in units of persons per square kilometer.World Settlement Score 2016: the dasymetric likelihood surface used to create this layer by apportioning population from census polygons to the settlement score raster.To use this layer in analysis, there are several properties or geoprocessing environment settings that should be used:Coordinate system: WGS_1984. This service and its underlying data are WGS_1984. We do this because projecting population count data actually will change the populations due to resampling and either collapsing or splitting cells to fit into another coordinate system. Cell Size: 0.0013474728 degrees (approximately 150-meters) at the equator. No Data: -1Bit Depth: 32-bit signedThis layer has query, identify, pixel, and export image functions enabled, and is restricted to a maximum analysis size of 30,000 x 30,000 pixels - an area about the size of Africa.Frye, C. et al., (2018). Using Classified and Unclassified Land Cover Data to Estimate the Footprint of Human Settlement. Data Science Journal. 17, p.20. DOI: http://doi.org/10.5334/dsj-2018-020.What can you do with this layer?This layer is unsuitable for mapping or cartographic use, and thus it does not include a convenient legend. Instead, this layer is useful for analysis, particularly for estimating counts of people living within watersheds, coastal areas, and other areas that do not have standard boundaries. Esri recommends using the Zonal Statistics tool or the Zonal Statistics to Table tool where you provide input zones as either polygons, or raster data, and the tool will summarize the count of population within those zones. https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/data-management/2016-world-population-estimate-services-are-now-available/
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JUNE 7
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings . This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.
Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).
Update Frequency: Weekly
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.openaq.[TABLENAME]
. Fork this kernel to get started.
Dataset Source: openaq.org
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source and is provided "AS IS" without any warranty, express or implied.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdfhttps://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days (monthly means are available around the 6th of each month). In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. In case that this occurs users are notified. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main sub sets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 monthly mean data on single levels from 1940 to present".
By Olga Tsubiks [source]
This dataset contains survey responses from the tech industry about mental health, offering an insightful snapshot into the diagnoses, treatments, and attitudes of those in the field towards mental health. These data points allow people to understand more about how their peers in tech view mental health and can provide greater insight into how to better support those who work in this industry. This dataset includes questions on whether or not respondents have had a mental health disorder or sought treatment for a mental health issue in the past, if they currently have been diagnosed with a condition and what it is, their age group, location of work and residence as well as information on whether they are self-employed or working at a tech company with other questions. Additionally, this dataset also provides insight into respondents' attitudes towards speaking openly about their mental wellbeing versus physical wellbeing. To gain even more understanding of individual's experiences within their place of business overall employee count is included as well what role they fill within that organisation is related to technology/IT. This valuable data set may be used for medical research furthering our knowledge about workplace stressors effecting people seen within this particular field but also across multiple industries to help create support systems that reflect upon individual need rather than one-size fits all models previously employed by employers through out many parts globally
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
- Analyze the correlation between employment industry and mental health status, including self-identified diagnosis, use of mental health services and any history of mental illness in the family.
- Determine if there are differences in how people experience and speak out about their own mental health based on geographic location.
- Compare attitudes towards open conversations on physical vs mental health within different age groups both in the U.S. and abroad
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices.
File: OSMI_Survey_Data.csv | Column name | Description | |:-----------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------| | Are you selfemployed | Indicates whether the respondent is self-employed or not. (Boolean) | | How many employees does your company or organization have | Indicates the number of employees in the respondent's company or organization. (Numeric) | | Is your employer primarily a tech companyorganization | Indicates whether the respondent's employer is primarily a tech company or organization. (Boolean) | | Is your primary role within your company related to techIT | Indicates whether the respondent's primary role within their company is related to tech or IT. (Boolean) | | Do you have previous employers | Indicates whether the respondent has had previous employers. (Boolean) ...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recommended citation
Gütschow, J.; Busch, D.; Pflüger, M. (2024): The PRIMAP-hist national historical emissions time series v2.6.1 (1750-2023). zenodo. doi:10.5281/zenodo.15016289.
Gütschow, J.; Jeffery, L.; Gieseke, R.; Gebel, R.; Stevens, D.; Krapp, M.; Rocha, M. (2016): The PRIMAP-hist national historical emissions time series, Earth Syst. Sci. Data, 8, 571-603, doi:10.5194/essd-8-571-2016
Content
Use of the dataset and full description
Abstract
Support
Sources
Files included in the dataset
Notes
Data format description (columns)
References
Changelog
Abstract
The PRIMAP-hist dataset combines several published datasets to create a comprehensive set of greenhouse gas emission pathways for every country and Kyoto gas, covering the years 1750 to 2023, and almost all UNFCCC (United Nations Framework Convention on Climate Change) member states as well as most non-UNFCCC territories. The data resolves the main IPCC (Intergovernmental Panel on Climate Change) 2006 categories. For CO2, CH4, and N2O subsector data for Energy, Industrial Processes and Product Use (IPPU), and Agriculture are available. The "country reported data priority" (CR) scenario of the PRIMAP-hist datset prioritizes data that individual countries report to the UNFCCC.
For developed countries, AnnexI in terms of the UNFCCC, this is the data submitted anually in the "National Inventory Submissions". Until 2023 data was submitted in the "Common Reporting Format" (CRF). Since 2024 the new "Common Reporting Tables" (CRT) are used. For developing countries, non-AnnexI in terms of the UNFCCC, we use the "Biannial Transparency Reports" (BTR) which mostly come with data also using the "Common Reporting Tables". We also use older data available through the UNFCCC DI portal (di.unfccc.int) and additional country submissions from "Biannial Update Reports" (BUR), "National Communications" (NC), and "National Inventory Reports" (NIR) read from pdf and where available xls(x) or csv files. For a list of these submissions please see below. For South Korea the 2023 official GHG inventory has not yet been submitted to the UNFCCC but is included in PRIMAP-hist. PRIMAP-hist also includes official data for Taiwan which is not recognized as a party to the UNFCCC. We have mostly replaced the official data that has not been submitted to the UNFCCC used in v2.6 as countries have now submitted their data in CRT format, but had to make some exceptions as the CRT data was not usable for all countries.
Gaps in the country reported data are filled using third party data such as CDIAC, EI (fossil CO2), Andrew cement emissions data (cement), FAOSTAT (agriculture), and EDGAR 2024 (all sectors for CO2, CH4, N2O, HFCs, PFCs, SF6, NF3, except energy CO2). Lower priority data are harmonized to higher priority data in the gap-filling process.
For the third party priority time series gaps in the third party data are filled from country reported data sources.
Data for earlier years which are not available in the above mentioned sources are sourced from EDGAR-HYDE, CEDS, and RCP (N2O only) historical emissions.
The v2.4 release of PRIMAP-hist reduced the time-lag from 2 to 1 years for the October release. Thus the present version 2.6.1 includes data for 2023. For energy CO2 growth rates from the EI Statistical Review of World Energy are used to extend the country reported (CR) or CDIAC (TP) data to 2023. For CO2 from cement production Andrew cement data are used. For other gases and sectors we use EDGAR 2024 data. In a few cases we have to rely on numerical methods to estimate emissions for 2023.
Version 2.6.1 of the PRIMAP-hist dataset does not include emissions from Land Use, Land-Use Change, and Forestry (LULUCF) in the main file. LULUCF data are included in the file with increased number of significant digits and have to be used with care as they are constructed from different sources using different methodologies and are not harmonized.
The PRIMAP-hist v2.6.1 dataset is an updated version of
Gütschow, J.; Pflüger, M.; Busch, D. (2024): The PRIMAP-hist national historical emissions time series v2.6 (1750-2023). zenodo. doi:10.5281/zenodo.13752654.
The Changelog indicates the most important changes. You can also check the issue tracker on github.com/JGuetschow/PRIMAP-hist for additional information on issues found after the release of the dataset. Detailed per country information is available from the detailed changelog which is available on the primap.org website and on zenodo.
Use of the dataset and full description
Before using the dataset, please read this document and the article describing the methodology, especially the section on uncertainties and the section on limitations of the method and use of the dataset.
Gütschow, J.; Jeffery, L.; Gieseke, R.; Gebel, R.; Stevens, D.; Krapp, M.; Rocha, M. (2016): The PRIMAP-hist national historical emissions time series, Earth Syst. Sci. Data, 8, 571-603, doi:10.5194/essd-8-571-2016
Please notify us (johannes.guetschow@climate-resource.com) if you use the dataset so that we can keep track of how it is used and take that into consideration when updating and improving the dataset.
When using this dataset or one of its updates, please cite the DOI of the precise version of the dataset used and also the data description article which this dataset is supplement to (see above). Please consider also citing the relevant original sources when using the PRIMAP-hist dataset. See the full citations in the References section further below.
Since version 2.3 we use the data formats developed for the PRIMAP2 climate policy analysis suite: PRIMAP2 on GitHub. The data are published both in the interchange format which consists of a csv file with the data and a yaml file with additional metadata and the native NetCDF based format. For a detailed description of the data format we refer to the PRIMAP2 documentation.
We have also included files with more than three significant digits. These files are mainly aimed at people doing policy analysis using the country reported data scenario (HISTCR). Using the high precision data they can avoid questions on discrepancies with the reported data. The uncertainties of emissions data do not justify the additional significant digits and they might give a false sense of accuracy, so please use this version of the dataset with extra care.
Support
If you encounter possible errors or other things that should be noted, please check our issue tracker at github.com/JGuetschow/PRIMAP-hist and report your findings there. Please use the tag "v2.6.1" in any issue you create regarding this dataset.
If you need support in using the dataset or have any other questions regarding the dataset, please contact johannes.guetschow@climate-resource.com.
Climate Resource makes this data available CC BY 4.0 licence. Free support is limited to simple questions and non-commercial users. We also provide additional data, and data support services to clients wanting more frequent updates, additional metadata or to integrate these datasets into their workflows. Get in touch at contact@climate-resource.com if you are interested.
Sources
Global CO2 emissions from cement production v250226 data, paper: Andrew(2025), Andrew (2019)
EI Statistical Review of World Energy website: Energy Institute (2024)
CDIAC data: Hefner and Marland (2023), data: Hefner (2024), paper: Gilfillan and Marland (2021)
CEDS: data: Hoesly et al. (2020), paper: Hoesly et al. (2018)
EDGAR 2024: data/website: European Commission, European Commision, JRC (2024), report: European Commission. Joint Research Centre & IEA. (2024)
EDGAR-HYDE 1.4 data: Van Aardenne et al. (2001), Olivier and Berdowski (2001)
FAOSTAT database data: Food and Agriculture Organization of the United Nations (2024)
RCP historical data data, paper: Meinshausen et al. (2011)
UNFCCC National Communications and National Inventory Reports for developing countries available from the UNFCCC DI portal website, data: UNFCCC (2024e), Pflüger and Gütschow (2024), github
UNFCCC Bnnial Update Reports, National Communications, and National Inventory Reports for developing countries website-BURs, website-NCs, data: UNFCCC (2024d), UNFCCC (2024b).
Notes:
Not all BUR and NC submissions are included as reading the data is time consuming and not all submission contain sufficient data to be used in PRIMAP-hist.
Not all submissions included in PRIMAP-hist are available in the github repository as we do not (yet) have code that we can publish for all submissions.
No submissions have been added for PRIMAP-hist v2.6.1
UNFCCC First Biannial Transparency Reports website, [data] UNFCCC (2025)
Notes:
For a list of added submissions see section "Data source updates (v2.6.1)" in the changelog in the pdf data description.
UNFCCC Common Reporting Format (CRF) website, paper, data (24-01-08): UNFCCC (2024c) (processed as described in Jeffery et al. (2018))
Official country repositories (non-UNFCCC)
Belarus: Greenhouse gas statistics (1990-2022) website: National Statistical Committee of theRepublic of Belarus (2024)
EU, Iceland, Norway, Switzerland: National emissions reported to the UNFCCC and to the EU Greenhouse Gas Monitoring Mechanism, April 2024 website: European Environment Agency(2024)
South Korea: 2023 Inventory website, data: Republic of Korea (2023)
Taiwan / Republic of China: 2023 Inventory website, data: Republic of China - EnvironmentalProtection Administration (2023)
For the pre-1990 LULUCF time-series we use the following additional data sources:
Houghton land use CO2 website: Houghton (2008)
HYDE land cover data website: Klein Goldewijk et al. (2010), Klein Goldewijk et al. (2011)
SAGE Global Potential Vegetation Dataset website: Ramankutty and Foley (1999)
FAO Country Boundaries website: Food and Agriculture Organization of the United Nations(2015)
Files included in the dataset
For each dataset we have three files:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
Nirmalya Thakur, "MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak", Journal of Data (Paper Submitted).
The preprint of this paper is available at: https://www.preprints.org/manuscript/202206.0172/v1
Abstract
The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization is considering whether the outbreak should be assessed as a “potential public health emergency of international concern” or PHEIC, as was done for the COVID-19 and Ebola outbreaks in the past. During this time, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.
Data Description
The dataset consists of a total of 68,934 tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 11th June 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 4 different .txt files based on the timelines of the associated tweets. The following table provides the details of these dataset files.
Filename |
No. of Tweet IDs |
Date Range of the Tweet IDs |
TweetIDs_Part1.txt |
19718 |
June 11, 2022 to June 5, 2022 |
TweetIDs_Part2.txt |
17585 |
June 5, 2022 to May 27, 2022 |
TweetIDs_Part3.txt |
17705 |
May 27, 2022 to May 21, 2022 |
TweetIDs_Part4.txt |
13926 |
May 21, 2022 to May 7, 2022 |
The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Roads, railways and utility easements are integral components of human society, allowing for the safe and efficient transport of people and goods. There are few places on earth that are not currently traversed or impacted by the vast networks of linear infrastructure. The ecological impacts of linear infrastructure and vehicles are numerous, diverse and, in most cases, deleterious. Recognition and amelioration of these impacts is becoming widespread around the world, and new roads and other linear infrastructure are increasingly planned to avoid high-quality areas and designed to minimise or mitigate the deleterious effects. Importantly, the negative effects of the existing infrastructure are also being reduced during routine maintenance and upgrade projects, as well as targeted retrofits to fix specific problem areas. (1) Global road length, number of vehicles and rate of per capita travel are high and predicted to increase significantly over the next few decades.(2) The ‘road-effect zone’ is a useful conceptual framework to quantify the negative ecological and environmental impacts of roads and traffic.(3) The effects of roads and traffic on wildlife are numerous, varied and typically deleterious. (4) The density and configuration of road networks are important considerations in road planning. (5) The costs to society of wildlife-vehicle collisions can be high. (6) The strategies of avoidance, minimisation, mitigation and offsetting are increasingly being adopted around the world – but it must be recognised that some impacts are unavoidable and unmitigable. (7) Road ecology is an applied science which underpins the quantification and mitigation of road impacts. The global rates of road construction and private vehicle ownership as well as travel demand will continue to rise for the foreseeable future, including at a rapid rate in many developing countries. The challenge currently facing society is to build a more efficient transportation system that facilitates economic growth and development, reduces environmental impacts and protects biodiversity and ecosystem functions. The legacy of the decisions we make today and the roads and railways we construct tomorrow will be with us for many years to come.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GOLD RESERVES reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Extreme poverty’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathurinache/extreme-poverty on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Two centuries ago the majority of the world population was extremely poor. Back then it was widely believed that widespread poverty was inevitable. But this turned out to be wrong. Economic growth is possible and poverty can decline. The world has made immense progress against extreme poverty.
But even after two centuries of progress, extreme poverty is still the reality for every tenth person in the world. This is what the ‘international poverty line’ highlights – this metric plays an important (and successful) role in focusing the world’s attention on these very poorest people in the world.
The poorest people today live in countries which have achieved no growth. This stagnation of the world’s poorest economies is one of the largest problems of our time. Unless this changes millions of people will continue to live in extreme poverty.
Data comes from https://ourworldindata.org/extreme-poverty-in-brief Thanks to them to aggregate this kind of informations!
https://media.globalcitizen.org/thumbnails/90/19/90190c20-1182-47d6-a86e-3a2dcc912e73/extreme-poverty-un-explainer-social-share.jpg_1500x670_q85_ALIAS-hero_image_crop_subsampling-2.jpg" alt="Extreme Poverty">
Compare country, by year the % of persons in extreme poverty
--- Original source retains full ownership of the source dataset ---
The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, it is projected that the world population will reach nine billion in 2038, and 10 billion in 2060, but it will peak around 10.3 billion in the 2080s before it then goes into decline. Regional variations The global population has seen rapid growth since the early 1800s, due to advances in areas such as food production, healthcare, water safety, education, and infrastructure, however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference in population between now and the 2080s' peak will be found in Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion in this time (although populations in other continents will also fluctuate). Changing projections The United Nations releases their World Population Prospects report every 1-2 years, and this is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people, and that population growth would continue into the 2100s, however a sooner and shorter peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolongued development arc in Sub-Saharan Africa.