Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, a range of organizations collate climate trend data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.
We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have included several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical chart and dataset showing World population growth rate by year from 1961 to 2023.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000)
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
Heliocentric trajectories for Voyager 2 in Heliographic (HG), Heliographic Inertial (HGI), and Solar Ecliptic (SE) coordinates. The original trajectory data are taken from http://ssd.jpl.nasa.gov/horizons.cgi, where users can find many more objects. For spacecraft in orbit about a planet, the planet's orbit data can be used as a proxy for the spacecraft ephemeris. On a heliospheric scale, differences between the planet's orbital trajectory and that of the spacecraft are very small. For instance, the heliocentric longitudes differ by only 0.25° for a spacecraft stationed near the L1 Lagrange point at approximately 100 Earth radii upstream of the Earth. The production of the HG, HGI, and SE trajectory data requires a value for the "Equinox Epoch", which is the epoch time that fixes the direction from the Earth to the Sun at the vernal equinox, when the Sun appears to cross the equatorial plane of the Earth from below. This direction is called the First Point of Aries (FPA). It is not a fixed direction but drifts by about 1.4° per century, or 50.26" per year. In addition, there are tiny irregularities in the FPA drift on the order of 1" per year or less. The Equinox Epoch can be determined by a variety of methods, which differ in how the instantaneous FPA longitudinal direction is calculated and in whether the tiny irregularities have been smoothed or averaged out.
Four methods for determining the Equinox Epoch are in common usage:
Method Name | FPA Longitude Definition |
---|---|
B1950.0 | the actual FPA at 22:09 UT on December 31, 1949 |
J2000.0 | the smoothed FPA at 12:00 UT on January 1, 2000 |
True of Date | the actual FPA at 00:00 UT on the date of interest |
Mean of Date | the smoothed FPA at 00:00 UT on the date of interest |
The heliocentric trajectory data included in this data product have been calculated using the Equinox Epoch defined via the "Mean of Date" method. More precise coordinates, and some planet-centered coordinates, are found in the "traj" subdirectories of the spacecraft-specific directories at https://spdf.gsfc.nasa.gov/pub/data/ and at http://ssd.jpl.nasa.gov/horizons.cgi.
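The two drift figures quoted for the First Point of Aries can be cross-checked with simple arithmetic. The following is a minimal sketch that assumes only the 50.26" per year rate stated in the description; the function name is illustrative and not part of any data product.

```python
ARCSEC_PER_DEGREE = 3600.0


def drift_degrees(rate_arcsec_per_year: float, years: float) -> float:
    """Accumulated FPA drift in degrees, assuming a constant drift rate."""
    return rate_arcsec_per_year * years / ARCSEC_PER_DEGREE


# 50.26 arcseconds per year, accumulated over one century.
per_century = drift_degrees(50.26, 100)
print(f"{per_century:.2f} degrees per century")  # prints "1.40 degrees per century"
```

The result agrees with the "about 1.4° per century" figure, so the two rates describe the same drift in different units.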
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
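As a rough illustration of querying these tables from Python, here is a minimal sketch. It assumes the `google-cloud-bigquery` package is installed and that credentials are already configured in the environment; the helper name `run_query` and the sample statement are illustrative, using table and column names that appear in the sample queries in this description.

```python
def run_query(sql, client=None):
    """Run a SQL statement and return the result rows as a list of dicts."""
    if client is None:
        # Imported lazily so this module loads even without the package installed.
        from google.cloud import bigquery
        client = bigquery.Client()
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [dict(row.items()) for row in rows]


SQL = """
SELECT country_name, life_expectancy
FROM `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
WHERE year = 2016
ORDER BY life_expectancy DESC
LIMIT 10
"""

# Example call (needs credentials, so it is left commented out):
# for row in run_query(SQL):
#     print(row["country_name"], row["life_expectancy"])
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object when no BigQuery access is available.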
What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!
SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM (
  SELECT
    country_name,
    life_expectancy
  FROM
    `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE
    year = 2016) age
INNER JOIN (
  SELECT
    country_name,
    country_area
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE
    country_area > 25000) size
ON
  age.country_name = size.country_name
ORDER BY
  2 DESC /* order by life_expectancy */
/* Remove the limit for Data Studio visualization */
LIMIT
  10
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
  age.country_name,
  SUM(age.population) AS under_25,
  pop.midyear_population AS total,
  ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
FROM (
  SELECT
    country_name,
    population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
  WHERE
    year = 2017
    AND age < 25) age
INNER JOIN (
  SELECT
    midyear_population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population`
  WHERE
    year = 2017) pop
ON
  age.country_code = pop.country_code
GROUP BY
  1,
  3
ORDER BY
  4 DESC /* Remove limit for visualization */
LIMIT
  10
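The same join-and-aggregate logic can be sketched in plain Python to make the calculation explicit. This is an illustration only: the function name, the tuple layout, and the figures are made up for the example and are not census data.

```python
from collections import defaultdict


def pct_under_age(age_rows, totals, cutoff=25):
    """Share of each country's population below `cutoff`, highest first.

    age_rows: iterable of (country_code, age, population) tuples
    totals:   dict mapping country_code to total midyear population
    """
    young = defaultdict(int)
    for code, age, population in age_rows:
        if age < cutoff:
            young[code] += population
    # Inner join: keep only country codes present on both sides.
    shares = {
        code: round(young[code] / totals[code] * 100, 2)
        for code in young
        if code in totals
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Made-up illustrative figures, not census data.
age_rows = [("AA", 10, 300), ("AA", 24, 200), ("AA", 40, 500),
            ("BB", 5, 100), ("BB", 30, 900)]
totals = {"AA": 1000, "BB": 1000}
print(pct_under_age(age_rows, totals))  # prints [('AA', 50.0), ('BB', 10.0)]
```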
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.
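SELECT
  growth.country_name,
  growth.net_migration,
  CAST(area.country_area AS INT64) AS country_area
FROM (
  SELECT
    country_name,
    net_migration,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
  WHERE
    year = 2017) growth
INNER JOIN (
  SELECT
    country_area,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  /* The source query is cut off after the FROM clause above. The WHERE and ON
     clauses below are reconstructed from the description (countries greater
     than 500 km2, joined on country_code); any ORDER BY or LIMIT is unknown. */
  WHERE
    country_area > 500) area
ON
  growth.country_code = area.country_code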
Historic (none)
United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
<ul style='margin-top:20px;'>
<li>World birth rate for 2024 was <strong>17.30</strong>, a <strong>5.9% increase</strong> from 2023.</li>
<li>World birth rate for 2023 was <strong>16.33</strong>, a <strong>1.34% decline</strong> from 2022.</li>
<li>World birth rate for 2022 was <strong>16.56</strong>, a <strong>1.7% decline</strong> from 2021.</li>
</ul>
Crude birth rate indicates the number of live births occurring during the year, per 1,000 population estimated at midyear. Subtracting the crude death rate from the crude birth rate provides the rate of natural increase, which is equal to the rate of population change in the absence of migration.
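The two calculations mentioned here, year-over-year change and natural increase, are simple enough to sketch directly. The birth rates below are the 2023 and 2024 figures listed above; the death rate is a hypothetical placeholder, since this description does not give one, and the function names are illustrative.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100


def natural_increase(crude_birth_rate: float, crude_death_rate: float) -> float:
    """Rate of natural increase per 1,000 population (migration ignored)."""
    return crude_birth_rate - crude_death_rate


# Birth rates for 2023 and 2024 as listed in this description.
print(round(pct_change(16.33, 17.30), 1))  # prints 5.9

# 7.50 is a hypothetical crude death rate used only for illustration.
print(round(natural_increase(17.30, 7.50), 2))  # prints 9.8
```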
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.
For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World’s Forests 2022 report. Rome, FAO.
Contact points:
Maintainer: Leticia Pina
Maintainer: Sarah E. Castle
Data lineage:
The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition. Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High density urban populations were excluded from the analysis. High density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data for population) of at least 50,000 people and comprised of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020). Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users. For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.
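The buffer logic described in the data lineage can be illustrated on a toy raster. This is a minimal pure-Python sketch, not the published Google Earth Engine script: the function name, the grid values, and the urban mask are all made up, and the buffer is expressed in pixels (at 100 m resolution, 5 km would correspond to 50 pixels).

```python
from math import hypot


def forest_proximate_population(forest, population, urban, buffer_px):
    """Sum population in non-urban cells within `buffer_px` of any forest cell.

    forest, urban: 2D lists of 0/1 flags; population: 2D list of counts.
    Distance is Euclidean, measured between cell indices.
    """
    rows, cols = len(forest), len(forest[0])
    forest_cells = [(r, c) for r in range(rows) for c in range(cols) if forest[r][c]]
    total = 0
    for r in range(rows):
        for c in range(cols):
            if urban[r][c]:
                continue  # high-density urban cells are excluded
            if any(hypot(r - fr, c - fc) <= buffer_px for fr, fc in forest_cells):
                total += population[r][c]
    return total


# Toy 3x4 grids with made-up values.
forest = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
population = [[5, 3, 2, 7], [4, 6, 1, 9], [8, 2, 3, 1]]
urban = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]

print(forest_proximate_population(forest, population, urban, 1))  # prints 12
print(forest_proximate_population(forest, population, urban, 2))  # prints 22
```

The brute-force distance check is fine for a toy grid; a global 100 m raster would need a distance transform or similar raster operation instead.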
References:
Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.
Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645
Online resources:
GEE asset for "Forest proximate people - 5km cutoff distance"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Non-Hispanic population of White Earth by race. It includes the distribution of the Non-Hispanic population of White Earth across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of White Earth across relevant racial categories.
Key observations
With a zero Hispanic population, White Earth is 100% Non-Hispanic. Among the Non-Hispanic population, the largest racial group is White alone with a population of 76 (100% of the total Non-Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for White Earth Population by Race & Ethnicity. You can refer to the same here
https://creativecommons.org/publicdomain/zero/1.0/
The world's population has undergone remarkable growth, exceeding 7.5 billion by mid-2019 and continuing to surge beyond previous estimates. Notably, China and India stand as the two most populous countries, with China's population potentially facing a decline while India's trajectory hints at surpassing it by 2030. This significant demographic shift is just one facet of a global landscape where countries like the United States, Indonesia, Brazil, Nigeria, and others, each with populations surpassing 100 million, play pivotal roles.
The steady decrease in growth rates, though, is reshaping projections. While the world's population is expected to exceed 8 billion by 2030, growth will notably decelerate compared to previous decades. Specific countries like India, Nigeria, and several African nations will notably contribute to this growth, potentially doubling their populations before rates plateau.
This dataset provides comprehensive historical population data for countries and territories globally, offering insights into various parameters such as area size, continent, population growth rates, rankings, and world population percentages. Spanning from 1970 to 2023, it includes population figures for different years, enabling a detailed examination of demographic trends and changes over time.
Structured with meticulous detail, this dataset offers a wide array of information in a format conducive to analysis and exploration. Featuring parameters like population by year, country rankings, geographical details, and growth rates, it serves as a valuable resource for researchers, policymakers, and analysts. Additionally, the inclusion of growth rates and world population percentages provides a nuanced understanding of how countries contribute to global demographic shifts.
This dataset is invaluable for those interested in understanding historical population trends, predicting future demographic patterns, and conducting in-depth analyses to inform policies across various sectors such as economics, urban planning, public health, and more.
This dataset (world_population_data.csv), covering 1970 to 2023, includes the following columns:
Column Name | Description |
---|---|
Rank | Rank by Population |
CCA3 | 3-letter Country/Territory Code |
Country | Name of the Country |
Continent | Name of the Continent |
2023 Population | Population of the Country in the year 2023 |
2022 Population | Population of the Country in the year 2022 |
2020 Population | Population of the Country in the year 2020 |
2015 Population | Population of the Country in the year 2015 |
2010 Population | Population of the Country in the year 2010 |
2000 Population | Population of the Country in the year 2000 |
1990 Population | Population of the Country in the year 1990 |
1980 Population | Population of the Country in the year 1980 |
1970 Population | Population of the Country in the year 1970 |
Area (km²) | Area of the Country/Territory in square kilometers |
Density (km²) | Population Density per square kilometer |
Growth Rate | Population Growth Rate by Country |
World Population Percentage | The population percentage by each Country |
The primary dataset was retrieved from the World Population Review. I sincerely thank the team for providing the core data used in this dataset.
TROPIS is the acronym for the Tree Growth and Permanent Plot Information System sponsored by CIFOR to promote more effective use of existing data and knowledge about tree growth.
TROPIS is concerned primarily with information about permanent plots and tree
growth in both planted and natural forests throughout the world. It has five
components:
- a network of people willing to share permanent plot data and tree
growth information;
- an index to people and institutions with permanent plots;
- a database management system to promote more efficient data management;
- a method to find comparable sites elsewhere, so that observations
can be supplemented or contrasted with other data; and
- an inference system to allow growth estimates to be made in the
absence of empirical data.
TROPIS is about people and information. The core of TROPIS is an
index to people and their plots maintained in a relational
database. The database is designed to fulfil two primary needs:
- to provide for efficient cross-checking, error-checking and
updating; and
- to facilitate searches for plots matching a wide range
of specified criteria, including (but not limited to) location, forest
type, taxa, plot area, measurement history.
The database is essentially hierarchical: the key element of the
database is the informant. Each informant may contribute information
on many plot series, each of which has consistent objectives. In turn,
each series may comprise many plots, each of which may have a
different location or different size. Each plot may contain many
species. A series may be a thinning or spacing experiment, some
species or provenance trials, a continuous forest inventory system, or
any other aggregation of plots convenient to the informant. Plots need
not be current. Abandoned plots may be included provided that the
location is known and the plot data remain accessible. In addition to
details of the informant, we try to record details of additional
contact people associated with plots, to maintain continuity when
people transfer or retire. Thus the relational structure may appear
complex, but ensures data integrity.
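The informant, series, plot, and species hierarchy described above can be sketched with a few data classes. This is only an illustration of the structure, not the actual TROPIS schema: every class, field, and sample value below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plot:
    location: str
    area_ha: float
    species: List[str] = field(default_factory=list)


@dataclass
class Series:
    objective: str  # e.g. a thinning experiment or a provenance trial
    plots: List[Plot] = field(default_factory=list)


@dataclass
class Informant:
    name: str
    contacts: List[str] = field(default_factory=list)  # additional contact people
    series: List[Series] = field(default_factory=list)


def find_plots(informants, taxon):
    """Yield (informant name, series objective, plot) for plots containing `taxon`."""
    for informant in informants:
        for series in informant.series:
            for plot in series.plots:
                if taxon in plot.species:
                    yield informant.name, series.objective, plot


# Hypothetical sample data.
informants = [
    Informant(
        name="A. Example",
        contacts=["B. Backup"],
        series=[
            Series(
                objective="thinning experiment",
                plots=[
                    Plot("site-1", 0.5, ["Tectona grandis"]),
                    Plot("site-2", 1.0, ["Shorea robusta"]),
                ],
            )
        ],
    )
]

for name, objective, plot in find_plots(informants, "Tectona grandis"):
    print(name, objective, plot.location)  # prints "A. Example thinning experiment site-1"
```

Keeping contact people on the informant record mirrors the stated goal of maintaining continuity when people transfer or retire.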
At present, searches are possible only via mail, fax or email requests
to the TROPIS co-ordinator at CIFOR. Self-service on-line searching
will also be available in 1997. Clients may search for plots with
specified taxa, locations, silvicultural treatment, or other specified
criteria and combinations. TROPIS currently contains references to
over 10,000 plots with over 2,000 species contributed by 100
individuals world-wide.
This database will help CIFOR as well as other users to make more
efficient use of existing information, and to develop appropriate and
effective techniques and policies for sustainable forest management
world-wide.
TROPIS is supported by the Government of Japan.
This information is from the CIFOR web site.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical chart and dataset showing total population for the world by year from 1950 to 2025.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
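A per-100,000 rate is a straightforward normalization of a count by population. The sketch below uses a hypothetical function name and made-up county figures; it is not AP's own code.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Cases or deaths per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Made-up example: 50 cases in a county of 20,000 people.
print(rate_per_100k(50, 20_000))  # prints 250.0
```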
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
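The hotspot queries above rest on a 7-day rolling average of new cases, scaled by population. The following is a minimal sketch of that calculation, assuming a trailing window (the linked queries may define the window differently); the function names and sample figures are illustrative.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing mean over `window` entries; None until the window is full."""
    out, buf, running = [], deque(), 0
    for value in values:
        buf.append(value)
        running += value
        if len(buf) > window:
            running -= buf.popleft()
        out.append(running / window if len(buf) == window else None)
    return out


def rolling_rate_per_100k(new_cases, population, window=7):
    """Rolling mean of new cases expressed per 100,000 residents."""
    return [
        None if mean is None else mean * 100_000 / population
        for mean in rolling_mean(new_cases, window)
    ]


# Made-up daily counts for a county of 100,000 people.
new_cases = [7, 7, 7, 7, 7, 7, 7, 14]
print(rolling_rate_per_100k(new_cases, 100_000)[-2:])  # prints [7.0, 8.0]
```

As the text cautions, very small counties can produce large per-capita values from a handful of cases, so raw counts are worth checking alongside the rate.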
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
https://datawrapper.dwcdn.net/nRyaf/15/
Johns Hopkins timeseries data
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
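Because the timeseries stores cumulative snapshots, daily new counts are obtained by differencing, and a downward revision by a source shows up as a negative daily value. A minimal sketch with made-up numbers follows; the function names are illustrative, and treating the first snapshot as that day's new count is an assumption of this example.

```python
def daily_new_counts(cumulative):
    """Convert cumulative snapshots into day-over-day differences."""
    if not cumulative:
        return []
    # Assumption: the first snapshot is treated as the first day's new count.
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def flag_revisions(cumulative):
    """Indices where the cumulative count dropped, suggesting a revision."""
    return [i for i, delta in enumerate(daily_new_counts(cumulative)) if delta < 0]


# Made-up cumulative counts with one downward revision on the fourth day.
snapshots = [10, 15, 15, 12, 20]
print(daily_new_counts(snapshots))  # prints [10, 5, 0, -3, 8]
print(flag_revisions(snapshots))    # prints [3]
```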
This data should be credited to Johns Hopkins University COVID-19 tracking project
Heliocentric trajectories for Mars Science Laboratory in Heliographic (HG), Heliographic Inertial (HGI), and Solar Ecliptic (SE) coordinates. The original trajectory data are taken from http://ssd.jpl.nasa.gov/horizons.cgi, where users can find many more objects. For spacecraft in orbit about a planet, the planet's orbit data can be used as a proxy for the spacecraft ephemeris. On a heliospheric scale, differences between the planet's orbital trajectory and that of the spacecraft are very small. For instance, the heliocentric longitudes differ by only 0.25° for a spacecraft stationed near the L1 Lagrange point at approximately 100 Earth radii upstream of the Earth. The production of the HG, HGI, and SE trajectory data requires a value for the "Equinox Epoch", which is the epoch time that fixes the direction from the Earth to the Sun at the vernal equinox, when the Sun appears to cross the equatorial plane of the Earth from below. This direction is called the First Point of Aries (FPA). It is not a fixed direction but drifts by about 1.4° per century, or 50.26" per year. In addition, there are tiny irregularities in the FPA drift on the order of 1" per year or less. The Equinox Epoch can be determined by a variety of methods, which differ in how the instantaneous FPA longitudinal direction is calculated and in whether the tiny irregularities have been smoothed or averaged out.
Four methods for determining the Equinox Epoch are in common usage:
Method Name | FPA Longitude Definition |
---|---|
B1950.0 | the actual FPA at 22:09 UT on December 31, 1949 |
J2000.0 | the smoothed FPA at 12:00 UT on January 1, 2000 |
True of Date | the actual FPA at 00:00 UT on the date of interest |
Mean of Date | the smoothed FPA at 00:00 UT on the date of interest |
The heliocentric trajectory data included in this data product have been calculated using the Equinox Epoch defined via the "Mean of Date" method. More precise coordinates, and some planet-centered coordinates, are found in the "traj" subdirectories of the spacecraft-specific directories at https://spdf.gsfc.nasa.gov/pub/data/ and at http://ssd.jpl.nasa.gov/horizons.cgi.
https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for 100 Richest People In World
Dataset Summary
This dataset contains the list of the Top 100 Richest People in the World.
Column Information:
- Name: Person name
- NetWorth: His/her net worth
- Age: Person age
- Country: The country the person belongs to
- Source: Information source
- Industry: Expertise domain
See the full description on the dataset page: https://huggingface.co/datasets/nateraw/100-richest-people-in-world.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID vaccination vs. mortality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sinakaraji/covid-vaccination-vs-death on 12 November 2021.
--- Dataset description provided by original source is as follows ---
The COVID-19 outbreak has brought the whole planet to its knees. More than 4.5 million people had died as of the writing of this notebook, and the only acceptable way out of the disaster is to vaccinate all parts of society. Despite the fact that the benefits of vaccination have been proved to the world many times, anti-vaccine groups are springing up all over the world. This data set was generated to investigate the impact of coronavirus vaccinations on coronavirus mortality.
country | iso_code | date | total_vaccinations | people_vaccinated | people_fully_vaccinated | New_deaths | population | ratio |
---|---|---|---|---|---|---|---|---|
country name | ISO code for each country | date this record refers to | total number of COVID vaccine doses administered in that country | number of people who got at least one shot of COVID vaccine | number of people who got full vaccine shots | number of daily new deaths | 2021 country population | % of vaccinations in that country at that date = people_vaccinated/population * 100 |
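The `ratio` column follows directly from the formula in the table. A minimal sketch, with an illustrative function name and made-up figures:

```python
def vaccination_ratio(people_vaccinated: int, population: int) -> float:
    """Percentage of the population with at least one vaccine dose."""
    return people_vaccinated * 100 / population


# Made-up example: 250,000 people vaccinated out of 1,000,000.
print(vaccination_ratio(250_000, 1_000_000))  # prints 25.0
```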
This dataset is a combination of the following three datasets:
1. https://www.kaggle.com/gpreda/covid-world-vaccination-progress
2. https://covid19.who.int/WHO-COVID-19-global-data.csv
3. https://www.kaggle.com/rsrishav/world-population
You can find more detail about this dataset by reading this notebook:
https://www.kaggle.com/sinakaraji/simple-linear-regression-covid-vaccination
Afghanistan | Albania | Algeria | Andorra | Angola |
Anguilla | Antigua and Barbuda | Argentina | Armenia | Aruba |
Australia | Austria | Azerbaijan | Bahamas | Bahrain |
Bangladesh | Barbados | Belarus | Belgium | Belize |
Benin | Bermuda | Bhutan | Bolivia (Plurinational State of) | Brazil |
Bosnia and Herzegovina | Botswana | Brunei Darussalam | Bulgaria | Burkina Faso |
Cambodia | Cameroon | Canada | Cabo Verde | Cayman Islands |
Central African Republic | Chad | Chile | China | Colombia |
Comoros | Cook Islands | Costa Rica | Croatia | Cuba |
Curaçao | Cyprus | Denmark | Djibouti | Dominica |
Dominican Republic | Ecuador | Egypt | El Salvador | Equatorial Guinea |
Estonia | Ethiopia | Falkland Islands (Malvinas) | Fiji | Finland |
France | French Polynesia | Gabon | Gambia | Georgia |
Germany | Ghana | Gibraltar | Greece | Greenland |
Grenada | Guatemala | Guinea | Guinea-Bissau | Guyana |
Haiti | Honduras | Hungary | Iceland | India |
Indonesia | Iran (Islamic Republic of) | Iraq | Ireland | Isle of Man |
Israel | Italy | Jamaica | Japan | Jordan |
Kazakhstan | Kenya | Kiribati | Kuwait | Kyrgyzstan |
Lao People's Democratic Republic | Latvia | Lebanon | Lesotho | Liberia |
Libya | Liechtenstein | Lithuania | Luxembourg | Madagascar |
Malawi | Malaysia | Maldives | Mali | Malta |
Mauritania | Mauritius | Mexico | Republic of Moldova | Monaco |
Mongolia | Montenegro | Montserrat | Morocco | Mozambique |
Myanmar | Namibia | Nauru | Nepal | Netherlands |
New Caledonia | New Zealand | Nicaragua | Niger | Nigeria |
Niue | North Macedonia | Norway | Oman | Pakistan |
occupied Palestinian territory, including east Jerusalem |
Panama | Papua New Guinea | Paraguay | Peru | Philippines |
Poland | Portugal | Qatar | Romania | Russian Federation |
Rwanda | Saint Kitts and Nevis | Saint Lucia |
Saint Vincent and the Grenadines | Samoa | San Marino | Sao Tome and Principe | Saudi Arabia |
Senegal | Serbia | Seychelles | Sierra Leone | Singapore |
Slovakia | Slovenia | Solomon Islands | Somalia | South Africa |
Republic of Korea | South Sudan | Spain | Sri Lanka | Sudan |
Suriname | Sweden | Switzerland | Syrian Arab Republic | Tajikistan |
United Republic of Tanzania | Thailand | Togo | Tonga | Trinidad and Tobago |
Tunisia | Turkey | Turkmenistan | Turks and Caicos Islands | Tuvalu |
Uganda | Ukraine | United Arab Emirates | The United Kingdom | United States of America |
Uruguay | Uzbekistan | Vanuatu | Venezuela (Bolivarian Republic of) | Viet Nam |
Wallis and Futuna | Yemen | Zambia | Zimbabwe |
--- Original source retains full ownership of the source dataset ---
Heliocentric trajectories for STEREO-A in Heliographic (HG), Heliographic Inertial (HGI), and Solar Ecliptic (SE) coordinates. The original trajectory data are taken from http://ssd.jpl.nasa.gov/horizons.cgi, where users can find many more objects. In the case of orbit data for planets, the orbit data can be used as a proxy for the ephemerides of spacecraft that are in orbit about the planets. On a heliospheric scale, differences between the planet's orbital trajectory and that of the spacecraft are very small. For instance, the heliocentric longitudes differ by only 0.25° for a spacecraft stationed near the L1 Lagrange point at approximately 100 Earth radii upstream of the Earth. The production of the HG, HGI, and SE trajectory data requires a value for the "Equinox Epoch", which is defined as the epoch time that fixes the direction from the Earth to the Sun at the time of the vernal equinox, when the Sun seems to cross the equatorial plane of the Earth from below. This direction is called the First Point of Aries (FPA). It is not a fixed direction but drifts by about 1.4° per century, or 50.26" per year. In addition, there are tiny irregularities in the FPA drift that are on the order of 1" per year or less. The Equinox Epoch can be specified in several ways, which differ in the time at which the FPA longitudinal direction is evaluated and in whether the tiny irregularities have been smoothed or averaged out.
Four methods for determining the Equinox Epoch are in common usage:
Method Name | FPA Longitude Definition |
---|---|
B1950.0 | the actual FPA at 22:09 UT on December 31, 1949 |
J2000.0 | the smoothed FPA at 12:00 UT on January 1, 2000 |
True of Date | the actual FPA at 00:00 UT on the date of interest |
Mean of Date | the smoothed FPA at 00:00 UT on the date of interest |
The heliocentric trajectory data included in this data product have been calculated by using the Equinox Epoch defined via the "Mean of Date" method. More precise coordinates, and some planet-centered coordinates, are found in the "traj" subdirectories of spacecraft specific directories at https://spdf.gsfc.nasa.gov/pub/data/ and http://ssd.jpl.nasa.gov/horizons.cgi.
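The two drift figures quoted for the First Point of Aries (50.26" per year and about 1.4° per century) are the same rate in different units. A small sketch of the conversion; the function and constant names are illustrative and not part of the data product:

```python
ARCSEC_PER_DEGREE = 3600.0


def drift_degrees(arcsec_per_year, years):
    """Accumulated FPA drift in degrees for a constant annual rate."""
    return arcsec_per_year * years / ARCSEC_PER_DEGREE


per_century = drift_degrees(50.26, 100)
print(round(per_century, 2))  # 1.4
```

This treats the drift as constant and ignores the small irregularities of about 1" per year or less mentioned in the description.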
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
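The two-stage design described above can be sketched in a few lines. This is not the R script distributed with the dataset; it is a minimal Python illustration that assumes enumeration areas are chosen by simple random selection within each stratum and uses largest-remainder rounding for the proportional allocation. The sampling frame, stratum labels, and function names are hypothetical.

```python
import random


def allocate_eas(stratum_sizes, total_eas):
    """Allocate enumeration areas (EAs) to strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {s: total_eas * n / total for s, n in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = total_eas - sum(alloc.values())
    # Give the remaining EAs to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def draw_sample(frame, total_households=8000, per_ea=25, seed=0):
    """frame maps stratum -> {ea_id: [household_id, ...]}."""
    rng = random.Random(seed)
    sizes = {s: sum(len(h) for h in eas.values()) for s, eas in frame.items()}
    alloc = allocate_eas(sizes, total_households // per_ea)
    sample = []
    for stratum, eas in frame.items():
        # Stage 1: select EAs within the stratum (assumed simple random).
        for ea in rng.sample(sorted(eas), alloc[stratum]):
            # Stage 2: select a fixed number of households within the EA.
            for household in rng.sample(eas[ea], per_ea):
                sample.append((stratum, ea, household))
    return sample


# Hypothetical frame: strata are (geo_1, urban/rural) pairs, 30 households per EA.
ea_counts = {
    ("geo_A", "urban"): 300,
    ("geo_A", "rural"): 500,
    ("geo_B", "urban"): 200,
    ("geo_B", "rural"): 600,
}
frame = {
    stratum: {
        f"{stratum[0]}-{stratum[1]}-{i}": [f"hh{j}" for j in range(30)]
        for i in range(n)
    }
    for stratum, n in ea_counts.items()
}
sample = draw_sample(frame)
print(len(sample))  # 8000
```

With 8,000 households and 25 per enumeration area, 320 enumeration areas are selected in total; the allocation step only decides how those 320 are split across strata.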
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Population, Total for United States (POPTOTUSA647NWDB) from 1960 to 2024 about population and USA.
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JUNE 30
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
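Counts of this kind can also be computed outside a query interface. The following is a minimal sketch, assuming each incident is a record with hypothetical fields `year`, `state`, and `is_shooting`; these names and the sample rows are illustrative only and do not reflect the database schema or its contents.

```python
from collections import Counter

# Hypothetical records for illustration; not rows from the database.
incidents = [
    {"year": 2018, "state": "TX", "is_shooting": True},
    {"year": 2019, "state": "TX", "is_shooting": True},
    {"year": 2019, "state": "OH", "is_shooting": True},
    {"year": 2019, "state": "CA", "is_shooting": False},
]


def counts_by_year(records, state=None):
    """Return (mass killings per year, mass shootings per year).

    If state is given, only incidents in that state are counted.
    """
    if state is not None:
        records = [r for r in records if r["state"] == state]
    killings = Counter(r["year"] for r in records)
    shootings = Counter(r["year"] for r in records if r["is_shooting"])
    return killings, shootings


killings, shootings = counts_by_year(incidents)
print(dict(killings))   # {2018: 1, 2019: 3}
print(dict(shootings))  # {2018: 1, 2019: 2}

tx_killings, tx_shootings = counts_by_year(incidents, state="TX")
print(dict(tx_killings))  # {2018: 1, 2019: 1}
```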
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.