64 datasets found

Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
csv, excel, geojson +1
Updated Mar 10, 2024
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).

Sources and Contributions

Sources: GeoNames aggregates over a hundred different data sources.

Ambassadors: GeoNames Ambassadors help in many countries.

Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.

Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
Worldwide COVID-19 Data from WHO (2025 Edition)
kaggle.com
Updated Jul 3, 2025
Cite
Adil Shamim (2025). Worldwide COVID-19 Data from WHO (2025 Edition) [Dataset]. https://www.kaggle.com/datasets/adilshamim8/worldwide-covid-19-data-from-who
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 3, 2025
Dataset provided by
Kaggle
Authors
Adil Shamim
Description
Dataset Overview

This dataset contains global COVID-19 case and death data by country, collected directly from the official World Health Organization (WHO) COVID-19 Dashboard. It provides a comprehensive view of the pandemic’s impact worldwide, covering the period up to 2025. The dataset is intended for researchers, analysts, and anyone interested in understanding the progression and global effects of COVID-19 through reliable, up-to-date information.

Source Information

Website: WHO COVID-19 Dashboard

Organization: World Health Organization (WHO)

Data Coverage: Global (by country/territory)

Time Period: Up to 2025

The World Health Organization is the United Nations agency responsible for international public health. The WHO COVID-19 Dashboard is a trusted source that aggregates official reports from countries and territories around the world, providing daily updates on cases, deaths, and other key metrics related to COVID-19.

Dataset Contents

Country/Region: The name of the country or territory.

Date: Reporting date.

New Cases: Number of new confirmed COVID-19 cases.

Cumulative Cases: Total confirmed COVID-19 cases to date.

New Deaths: Number of new confirmed deaths due to COVID-19.

Cumulative Deaths: Total deaths reported to date.

Additional fields may include population, rates per 100,000, and more (see data files for details).
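
The relationship between the "New Cases" and "Cumulative Cases" fields described above can be sketched as follows. This is a minimal illustration with made-up numbers, not the WHO's or the dataset author's method. It assumes the series starts from zero before the first reporting date; because reports may be revised, real day-to-day differences can come out negative.

```python
from itertools import accumulate

# Hypothetical daily counts for one country (not real data).
new_cases = [5, 0, 12, 3]

# Cumulative cases are the running total of new cases.
cumulative_cases = list(accumulate(new_cases))


def new_from_cumulative(cumulative):
    """Recover daily counts from a cumulative series.

    Assumes the count before the first reporting date was zero.
    """
    previous = [0] + cumulative[:-1]
    return [cur - prev for prev, cur in zip(previous, cumulative)]


print(cumulative_cases)
print(new_from_cumulative(cumulative_cases))
```

The same pattern applies to the "New Deaths" and "Cumulative Deaths" fields.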

How to Use

This dataset can be used for:

Tracking the spread and trends of COVID-19 globally and by country

Modeling and forecasting pandemic progression

Comparative analysis of the pandemic’s impact across countries and regions

Visualization and reporting

Data Reliability

The data is sourced from the WHO, widely regarded as the most authoritative source for global health statistics. However, reporting practices and data completeness may vary by country and may be subject to revision as new information becomes available.

Acknowledgements

Special thanks to the WHO for making this data publicly available and to all those working to collect, verify, and report COVID-19 statistics.
Distribution of first name and last name frequencies by country
figshare.com
xlsx
Updated Feb 2, 2023
Cite
Mike Thelwall (2023). Distribution of first name and last name frequencies by country [Dataset]. http://doi.org/10.6084/m9.figshare.21956795.v2
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.21956795.v2
Dataset updated
Feb 2, 2023
Dataset provided by
figshare
Authors
Mike Thelwall
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Distribution of first and last name frequencies of academic authors by country.

Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.

Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021, using a marginally updated last name extraction algorithm that is almost the same except for Dutch/Flemish names.

From the paper: Can national researcher mobility be tracked by first or last name uniqueness?

For example, the distribution for the UK shows a single peak for international names, with no national names; Belgium has a national peak and an international peak; and China has mainly a national peak. The 50 countries are:

No Code Country
1 SB Serbia
2 IE Ireland
3 HU Hungary
4 CL Chile
5 CO Colombia
6 NG Nigeria
7 HK Hong Kong
8 AR Argentina
9 SG Singapore
10 NZ New Zealand
11 PK Pakistan
12 TH Thailand
13 UA Ukraine
14 SA Saudi Arabia
15 RO Romania
16 ID Indonesia
17 IL Israel
18 MY Malaysia
19 DK Denmark
20 CZ Czech Republic
21 ZA South Africa
22 AT Austria
23 FI Finland
24 PT Portugal
25 GR Greece
26 NO Norway
27 EG Egypt
28 MX Mexico
29 BE Belgium
30 CH Switzerland
31 SW Sweden
32 PL Poland
33 TW Taiwan
34 NL Netherlands
35 TK Turkey
36 IR Iran
37 RU Russia
38 AU Australia
39 BR Brazil
40 KR South Korea
41 ES Spain
42 CA Canada
43 IT Italy
44 FR France
45 IN India
46 DE Germany
47 US USA
48 UK UK
49 JP Japan
50 CN China
Global Country Information 2023
zenodo.org
data.niaid.nih.gov
csv
Updated Jun 15, 2024
Cite
Nidula Elgiriyewithana (2024). Global Country Information 2023 [Dataset]. http://doi.org/10.5281/zenodo.8165229
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8165229
Dataset updated
Jun 15, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Nidula Elgiriyewithana
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

Key Features

Country: Name of the country.

Density (P/Km2): Population density measured in persons per square kilometer.

Abbreviation: Abbreviation or code representing the country.

Agricultural Land (%): Percentage of land area used for agricultural purposes.

Land Area (Km2): Total land area of the country in square kilometers.

Armed Forces Size: Size of the armed forces in the country.

Birth Rate: Number of births per 1,000 population per year.

Calling Code: International calling code for the country.

Capital/Major City: Name of the capital or major city.

CO2 Emissions: Carbon dioxide emissions in tons.

CPI: Consumer Price Index, a measure of inflation and purchasing power.

CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.

Currency_Code: Currency code used in the country.

Fertility Rate: Average number of children born to a woman during her lifetime.

Forested Area (%): Percentage of land area covered by forests.

Gasoline_Price: Price of gasoline per liter in local currency.

GDP: Gross Domestic Product, the total value of goods and services produced in the country.

Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.

Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.

Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.

Largest City: Name of the country's largest city.

Life Expectancy: Average number of years a newborn is expected to live.

Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.

Minimum Wage: Minimum wage level in local currency.

Official Language: Official language(s) spoken in the country.

Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.

Physicians per Thousand: Number of physicians per thousand people.

Population: Total population of the country.

Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.

Tax Revenue (%): Tax revenue as a percentage of GDP.

Total Tax Rate: Overall tax burden as a percentage of commercial profits.

Unemployment Rate: Percentage of the labor force that is unemployed.

Urban Population: Percentage of the population living in urban areas.

Latitude: Latitude coordinate of the country's location.

Longitude: Longitude coordinate of the country's location.

Potential Use Cases

Analyze population density and land area to study spatial distribution patterns.

Investigate the relationship between agricultural land and food security.

Examine carbon dioxide emissions and their impact on climate change.

Explore correlations between economic indicators such as GDP and various socio-economic factors.

Investigate educational enrollment rates and their implications for human capital development.

Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.

Study labor market dynamics through indicators such as labor force participation and unemployment rates.

Investigate the role of taxation and its impact on economic development.

Explore urbanization trends and their social and environmental consequences.
World Population Statistics - 2023
kaggle.com
Updated Jan 9, 2024
Cite
Bhavik Jikadara (2024). World Population Statistics - 2023 [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/world-population-statistics-2023
Explore at:
Croissant
Dataset updated
Jan 9, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Bhavik Jikadara
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
The US Census Bureau's world population estimate for June 2019 puts the global population at 7,577,130,400 people, which far exceeds the world population of 7.2 billion in 2015. Our estimate based on UN data shows the world's population surpassing 7.7 billion.

China is the most populous country in the world with a population exceeding 1.4 billion. It is one of just two countries with a population of more than 1 billion, with India being the second. As of 2018, India has a population of over 1.355 billion people, and its population growth is expected to continue through at least 2050. By the year 2030, India is expected to become the most populous country in the world. This is because India’s population will grow, while China is projected to see a loss in population.

The next 11 most populous countries each have populations exceeding 100 million: the United States, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Japan, Ethiopia, and the Philippines. Of these nations, all are expected to continue to grow except Russia and Japan, whose populations are expected to drop by 2030 and fall significantly again by 2050.

Many other nations have populations of at least one million, while there are also countries that have just thousands. The smallest population in the world can be found in Vatican City, where only 801 people reside.

In 2018, the world’s population growth rate was 1.12%. Every five years since the 1970s, the population growth rate has continued to fall. The world’s population is expected to continue to grow larger but at a much slower pace. By 2030, the population will exceed 8 billion. In 2040, this number will grow to more than 9 billion. In 2055, the number will rise to over 10 billion, and another billion people won’t be added until near the end of the century. The current annual population growth estimates from the United Nations are in the millions - estimating that over 80 million new lives are added yearly.

This population growth will be significantly affected by nine specific countries that are poised to contribute to it more quickly than other nations: the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America. Of particular interest, India is on track to overtake China's position as the most populous country by 2030. Additionally, multiple nations within Africa are expected to double their populations before fertility rates begin to slow entirely.

Content

In this dataset, we have historical population data for every country/territory in the world by different parameters such as area size of the country/territory, name of the continent, name of the capital, density, population growth rate, ranking based on population, world population percentage, etc.

Dataset Glossary (Column-Wise):

Rank: Rank by Population.

CCA3: 3-letter Country/Territories Code.

Country/Territories: Name of the Country/Territories.

Capital: Name of the Capital.

Continent: Name of the Continent.

2022 Population: Population of the Country/Territories in the year 2022.

2020 Population: Population of the Country/Territories in the year 2020.

2015 Population: Population of the Country/Territories in the year 2015.

2010 Population: Population of the Country/Territories in the year 2010.

2000 Population: Population of the Country/Territories in the year 2000.

1990 Population: Population of the Country/Territories in the year 1990.

1980 Population: Population of the Country/Territories in the year 1980.

1970 Population: Population of the Country/Territories in the year 1970.

Area (km²): Area size of the Country/Territories in square kilometers.

Density (per km²): Population Density per square kilometer.

Growth Rate: Population Growth Rate by Country/Territories.

World Population Percentage: The population percentage by each Country/Territories.
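
As a rough illustration of how the derived glossary columns relate to the raw ones, the sketch below recomputes "Density (per km²)" and "World Population Percentage" from population and area. The two rows are invented, and the choice of the 2022 population as the basis for both columns is an assumption, not something the dataset description states.

```python
# Hypothetical rows using the column names from the glossary (not real data).
rows = [
    {"Country/Territories": "A", "2022 Population": 1_000_000, "Area (km²)": 50_000},
    {"Country/Territories": "B", "2022 Population": 3_000_000, "Area (km²)": 10_000},
]

# Assumption: the "world" total is the sum over all rows in the table.
world_total = sum(row["2022 Population"] for row in rows)

for row in rows:
    # Density: people per square kilometer.
    row["Density (per km²)"] = row["2022 Population"] / row["Area (km²)"]
    # Share of the world total, as a percentage.
    row["World Population Percentage"] = 100 * row["2022 Population"] / world_total

for row in rows:
    print(row["Country/Territories"], row["Density (per km²)"],
          row["World Population Percentage"])
```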
countries of the world
kaggle.com
Updated Jan 24, 2023
Cite
Rob Cobb (2023). countries of the world [Dataset]. https://www.kaggle.com/datasets/robbcobb/countries
Explore at:
Croissant
Dataset updated
Jan 24, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Rob Cobb
Area covered
World
Description
Copy of https://www.kaggle.com/datasets/kisoibo/countries-databasesqlite

Updated the name of the table from 'countries of the world' to 'countries', for ease of writing queries.

Info about the dataset:

Content

Table: countries of the world

Total Rows: 0

Total Columns: 20

Columns: Country, Region, Population, Area (sq. mi.), Pop. Density (per sq. mi.), Coastline (coast/area ratio), Net migration, Infant mortality (per 1000 births), GDP ($ per capita), Literacy (%), Phones (per 1000), Arable (%), Crops (%), Other (%), Climate, Birthrate, Deathrate, Agriculture, Industry, Service

Acknowledgements

Source: All these data sets are made up of data from the US government. Generally they are free to use if you use the data in the US. If you are outside of the US, you may need to contact the US Govt to ask. Data from the World Factbook is public domain. The website says "The World Factbook is in the public domain and may be used freely by anyone at anytime without seeking permission." https://www.cia.gov/library/publications/the-world-factbook/docs/faqs.html

Inspiration

When making visualisations related to countries, sometimes it is interesting to group them by attributes such as region, or weigh their importance by population, GDP or other variables.
Worldwide Soundscapes project meta-data
zenodo.org
Updated Dec 9, 2022
Cite
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger (2022). Worldwide Soundscapes project meta-data [Dataset]. http://doi.org/10.5281/zenodo.7415473
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7415473
Dataset updated
Dec 9, 2022
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.

The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.

The audio recording criteria justifying inclusion into the meta-database are:

Stationary (no transects, towed sensors or microphones mounted on cars)

Passive (unattended, no human disturbance by the recordist)

Ambient (no spatial or temporal focus on a particular species or direction)

Spatially and/or temporally replicated (multiple sites sampled at least at one common daytime or multiple days sampled at least in one common site)

The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database.

datasets

dataset_id: incremental integer, primary key

name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.

subset: incremental integer that can be used to distinguish datasets with identical names

collaborators: full names of people deemed responsible for the dataset, separated by commas

contributors: full names of people who are not the main collaborators but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses, separated by commas.

date_added: when the dataset was added (DD/MM/YYYY)

URL_open_recordings: if recordings (even only some) from this dataset are openly available, indicate the internet link where they can be found.

URL_project: internet link for further information about the corresponding project

DOI_publication: DOI of corresponding publications, separated by comma

core_realm_IUCN: The core realm of the dataset. Datasets may have multiple realms, but the main one should be listed. Datasets may contain sampling sites from different realms in the "sites" sheet. IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

medium: the physical medium the microphone is situated in

protected_area: Whether the sampling sites were situated in protected areas or not, or only some.

GADM0: For datasets on land or in territorial waters, Global Administrative Database level0 (https://gadm.org/)

GADM1: For datasets on land or in territorial waters, Global Administrative Database level1 (https://gadm.org/)

GADM2: For datasets on land or in territorial waters, Global Administrative Database level2 (https://gadm.org/)

IHO: For marine locations, the sea area that encompasses all the sampling locations according to the International Hydrographic Organisation. Map here: https://www.arcgis.com/home/item.html?id=44e04407fbaf4d93afcb63018fbca9e2

locality: optional free text about the locality

latitude_numeric_region: study region approximate centroid latitude in WGS84 decimal degrees

longitude_numeric_region: study region approximate centroid longitude in WGS84 decimal degrees

sites_number: number of sites sampled

year_start: starting year of the sampling

year_end: ending year of the sampling

deployment_schedule: description of the sampling schedule, provisional

temporal_recording_selection: list environmental exclusion criteria that were used to determine which recording days or times to discard

high_pass_filter_Hz: frequency of the high-pass filter of the recorder, in Hz

variable_sampling_frequency: Does the sampling frequency vary? If it does, write "NA" in the sampling_frequency_kHz column and indicate it in the sampling_frequency_kHz column inside the deployments sheet

sampling_frequency_kHz: frequency the microphone was sampled at (sounds up to half that frequency will be recorded)

variable_recorder:

recorder: recorder model used

microphone: microphone used

freshwater_recordist_position: position of the recordist relative to the microphone during sampling (only for freshwater)

collaborator_comments: free-text field for comments by the collaborators

validated: This cell is checked if the contents of all sheets are complete and have been found to be coherent and consistent with our requirements.

validator_name: name of person doing the validation

validation_comments: validators: please insert the date when someone was contacted

cross-check: this cell is checked if the collaborators confirm the spatial and temporal data after checking the corresponding site maps, deployment and operation time graphs found at https://drive.google.com/drive/folders/1qfwXH_7dpFCqyls-c6b8RZ_fbcn9kXbp?usp=share_link

datasets-sites

dataset_ID: primary key of datasets table

dataset_name: lookup field

site_ID: primary key of sites table

site_name: lookup field

sites

site_ID: unique site IDs, larger than 1000 for compatibility with ecoSound-web

site_name: name or code of sampling site as used in respective projects

latitude_numeric: exact numeric degrees coordinates of latitude

longitude_numeric: exact numeric degrees coordinates of longitude

topography_m: elevation for sites on land, depth (negative) for marine sites, in meters

freshwater_depth_m

realm: Ecosystem type according to IUCN GET https://global-ecosystems.org/

biome: Ecosystem type according to IUCN GET https://global-ecosystems.org/

functional_group: Ecosystem type according to IUCN GET https://global-ecosystems.org/

comments
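
The statement that the tables are linked through primary keys can be illustrated with a small join across the three tables described so far. This is only a sketch: the rows are invented, the link table is named datasets_sites here because a hyphen is awkward in SQL identifiers, and only a couple of columns per table are included. Key columns are spelled as in each table's description (dataset_id in datasets, dataset_ID in the link table); SQLite matches column names case-insensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal versions of the three tables, with hypothetical example rows.
conn.executescript("""
CREATE TABLE datasets (dataset_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sites (site_ID INTEGER PRIMARY KEY, site_name TEXT);
CREATE TABLE datasets_sites (dataset_ID INTEGER, site_ID INTEGER);

INSERT INTO datasets VALUES (1, 'example dataset');
INSERT INTO sites VALUES (1001, 'site A'), (1002, 'site B');
INSERT INTO datasets_sites VALUES (1, 1001), (1, 1002);
""")

# Follow the link table from each dataset to its sampling sites.
rows = conn.execute("""
SELECT d.name, s.site_name
FROM datasets AS d
JOIN datasets_sites AS ds ON ds.dataset_ID = d.dataset_id
JOIN sites AS s ON s.site_ID = ds.site_ID
ORDER BY s.site_ID
""").fetchall()

print(rows)
conn.close()
```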

deployments

dataset_ID: primary key of datasets table

dataset_name: lookup field

deployment: use identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

start_date_min: earliest date of deployment start, double-click cell to get date-picker

start_date_max: latest date of deployment start, if applicable (only used when recorders were deployed over several days), double-click cell to get date-picker

start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

permanent: is the deployment permanent (in which case it would be ongoing and the end date or duration would be unknown)?

variable_duration_days: is the duration of the deployment variable? in days

duration_days: deployment duration per recorder (use the minimum if variable)

end_date_min: earliest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_date_max: latest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

recording_time: does the recording last from the deployment start time to the end time (continuous) or at scheduled daily intervals (scheduled)? Note: we consider recordings with duty cycles to be continuous.

operation_start_time_mixed: scheduled recording start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

operation_duration_minutes: duration of operation in minutes, if constant

operation_end_time_mixed: scheduled recording end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes.

sampling_frequency_kHz: only indicate the sampling frequency if it is variable within a particular dataset so that we need to code different frequencies for different deployments

recorder

subset_sites: If the deployment was not done in all the sites of the
COVID Impact Survey - Public Data
data.world
csv, zip
Updated Oct 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2024). COVID Impact Survey - Public Data [Dataset]. https://data.world/associatedpress/covid-impact-survey-public-data
Explore at:
Available download formats: csv, zip
Dataset updated
Oct 16, 2024
Authors
The Associated Press
Description
Overview

The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.

Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).

The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.

The survey is focused on three core areas of research:

Physical Health: Symptoms related to COVID-19, relevant existing conditions and health insurance coverage.

Economic and Financial Health: Employment, food security, and government cash assistance.

Social and Mental Health: Communication with friends and family, anxiety and volunteerism. (Questions based on those used on the U.S. Census Bureau’s Current Population Survey.)

Using this Data - IMPORTANT

This is survey data and must be properly weighted during analysis: DO NOT REPORT THIS DATA AS RAW OR AGGREGATE NUMBERS!!

Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
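
To show why weighting matters, here is a minimal sketch of a weighted share next to a raw one. It is not the AP's or NORC's procedure; the answers, the weights, and the function name weighted_share are all hypothetical, and a real analysis would use the weight variable documented in the survey codebook.

```python
def weighted_share(responses, weights, target):
    """Return the weighted share of respondents whose answer equals target."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    matching_weight = sum(w for r, w in zip(responses, weights) if r == target)
    return matching_weight / total_weight


# Hypothetical answers to one survey question and hypothetical survey weights.
answers = ["none", "some", "none", "often"]
weights = [1.5, 0.5, 1.0, 1.0]

raw = answers.count("none") / len(answers)
weighted = weighted_share(answers, weights, "none")
print(raw, weighted)
```

In this made-up example the raw share and the weighted share differ, which is the reason raw counts should not be reported.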

Queries

If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".

Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.

Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.

The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."

Margin of Error

The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:

At least twice the margin of error, you can report there is a clear difference.

At least as large as the margin of error, you can report there is a slight or apparent difference.

Less than or equal to the margin of error, you can report that the respondents are divided or there is no difference.

A Note on Timing

Survey results will generally be posted under embargo on Tuesday evenings. The data is available for release at 1 p.m. ET Thursdays.
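
The margin-of-error guidance above can be sketched as a small helper. The function name and the margin values are hypothetical; the real margins are in the methods statement. The listed rules overlap when the difference exactly equals the margin of error, so this sketch assumes the rules are checked in the order given and the first match wins.

```python
def describe_difference(estimate_a, estimate_b, margin_of_error):
    """Classify the gap between two estimates (all in percentage points)."""
    difference = abs(estimate_a - estimate_b)
    if difference >= 2 * margin_of_error:
        return "clear difference"
    # Assumption: a difference exactly equal to the margin counts as slight.
    if difference >= margin_of_error:
        return "slight or apparent difference"
    return "divided or no difference"


# 66% vs. 52% are the figures quoted in the text; the margins are made up.
print(describe_difference(66, 52, 5))
print(describe_difference(66, 52, 8))
print(describe_difference(66, 52, 20))
```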

About the Data

The survey data will be provided under embargo in both comma-delimited and statistical formats.

Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey.

The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)

Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.

Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.

Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey.

Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.

Attribution

Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.

AP Data Distributions

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
World Population & Health Data 2014 - 2024
kaggle.com
Updated Jan 21, 2025
Cite
Faizal Rosyid (2025). World Population & Health Data 2014 - 2024 [Dataset]. https://www.kaggle.com/datasets/faizalrosyid/world-population-and-health-data-2014-2024
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 21, 2025
Dataset provided by
Kaggle
Authors
Faizal Rosyid
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
World
Description
This dataset provides an extensive view of global population statistics and health metrics across various countries from 2014 to 2024. It combines population data with vital health-related indicators, making it a valuable resource for understanding trends in population growth and health outcomes worldwide. Researchers, data scientists, and policymakers can utilize this dataset to analyze correlations between population dynamics and health performance at a global scale.

Key Features:
- Country: Name of the country.
- Year: Year of the data (2014–2024).
- Population: Total population for the respective year and country.
- Country Code: ISO 3-letter country codes for easy identification.
- Health Expenditure (health_exp): Percentage of GDP spent on healthcare.
- Life Expectancy (life_expect): Average life expectancy at birth in years.
- Maternal Mortality (maternal_mortality): Maternal deaths per 100,000 live births.
- Infant Mortality (infant_mortality): Deaths of infants under 1 year per 1,000 live births.
- Neonatal Mortality (neonatal_mortality): Deaths of newborns (0–28 days) per 1,000 live births.
- Under-5 Mortality (under_5_mortality): Deaths of children under 5 years per 1,000 live births.
- HIV Prevalence (prev_hiv): Percentage of the population living with HIV.
- Tuberculosis Incidence (inci_tuberc): Estimated new and relapse TB cases per 100,000 people.
- Undernourishment Prevalence (prev_undernourishment): Percentage of the population that is undernourished.

Use Cases:
- Health Policy Analysis: Understand trends in healthcare expenditure and its relationship to health outcomes.
- Global Health Research: Investigate global or regional disparities in health and nutrition.
- Population Studies: Analyze population growth trends alongside health indicators.
- Data Visualization: Build visual dashboards for storytelling and impactful data representation.
GBIF Backbone Taxonomy
gbif.org
smng.net
+2more
Updated Nov 17, 2023
Cite
GBIF Secretariat (2023). GBIF Backbone Taxonomy [Dataset]. http://doi.org/10.15468/39omei
Explore at:
Unique identifier
https://doi.org/10.15468/39omei
Dataset updated
Nov 17, 2023
Dataset provided by
Global Biodiversity Information Facility https://www.gbif.org/
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name-based information from different resources, whether these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and provides a means to crosswalk names from one source to another.

It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.

International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
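
The BIN naming rule described above can be sketched roughly as follows. This is one illustrative reading of the text, not GBIF's actual code: the function name `consensus_name`, the rank list, the record layout, and the choice to compute the share among records that carry a name at the given rank are all assumptions made for the example.

```python
from collections import Counter

# Major Linnaean ranks from most to least specific (assumed order for this sketch).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def consensus_name(records, threshold=0.8):
    """Return (rank, name) for the lowest rank where one name reaches the threshold."""
    for rank in RANKS:
        # Assumption: only records that have a name at this rank are counted.
        names = [r[rank] for r in records if r.get(rank)]
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        if count / len(names) >= threshold:
            return rank, name
    return None  # no rank reached consensus


# Invented example: names applied to one hypothetical BIN.
bin_records = [
    {"species": "Apis mellifera", "genus": "Apis"},
    {"species": "Apis mellifera", "genus": "Apis"},
    {"species": "Apis mellifera", "genus": "Apis"},
    {"species": "Apis cerana", "genus": "Apis"},
    {"species": "Apis cerana", "genus": "Apis"},
]
print(consensus_name(bin_records))
```

In the example, no species name reaches 80% of the applied names, so the sketch moves up one rank and settles on the genus.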

UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.

The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.

The following 105 sources have been used to assemble the GBIF backbone, with the number of names from each source given after its title:
Catalogue of Life Checklist - 4766428 names
International Barcode of Life project (iBOL) Barcode Index Numbers (BINs) - 635951 names
UNITE - Unified system for the DNA based fungal species linked to the classification - 611208 names
The Paleobiology Database - 212054 names
World Register of Marine Species - 188857 names
The Interim Register of Marine and Nonmarine Genera - 183894 names
The World Checklist of Vascular Plants (WCVP) - 131891 names
GBIF Backbone Taxonomy - 114350 names
TAXREF - 109374 names
The Leipzig catalogue of vascular plants - 75380 names
ZooBank - 73549 names
Integrated Taxonomic Information System (ITIS) - 68377 names
Plazi.org taxonomic treatments database - 61346 names
Genome Taxonomy Database r207 - 60545 names
International Plant Names Index - 52329 names
Fauna Europaea - 45077 names
The National Checklist of Taiwan (Catalogue of Life in Taiwan, TaiCoL) - 36193 names
Dyntaxa. Svensk taxonomisk databas - 35892 names
The Plant List with literature - 32692 names
United Kingdom Species Inventory (UKSI) - 29643 names
Artsnavnebasen - 29208 names
The IUCN Red List of Threatened Species - 21221 names
Afromoths, online database of Afrotropical moth species (Lepidoptera) - 13961 names
Brazilian Flora 2020 project - Projeto Flora do Brasil 2020 - 13829 names
Prokaryotic Nomenclature Up-to-Date (PNU) - 10079 names
Checklist Dutch Species Register - Nederlands Soortenregister - 8814 names
ICTV Master Species List (MSL) - 7852 names
Cockroach Species File - 6020 names
GRIN Taxonomy - 5882 names
Taxon list of fungi and fungal-like organisms from Germany compiled by the DGfM - 4570 names
Catalogue of Afrotropical Bees - 3623 names
Catalogue of Tenebrionidae (Coleoptera) of North America - 3327 names
Checklist of Beetles (Coleoptera) of Canada and Alaska. Second Edition. - 3312 names
Systema Dipterorum - 2850 names
Catalogue of the Pterophoroidea of the World - 2807 names
The Clements Checklist - 2675 names
Taxon list of Hymenoptera from Germany compiled in the context of the GBOL project - 2496 names
IOC World Bird List, v13.2 - 2366 names
Official Lists and Indexes of Names in Zoology - 2310 names
National checklist of all species occurring in Denmark - 1922 names
Myriatrix - 1876 names
Database of Vascular Plants of Canada (VASCAN) - 1822 names
Taxon list of vascular plants from Bavaria, Germany compiled in the context of the BFL project - 1771 names
Orthoptera Species File - 1742 names
A list of the terrestrial fungi, flora and fauna of Madeira and Selvagens archipelagos - 1602 names
Aphid Species File - 1565 names
World Spider Catalog - 1561 names
Taxon list of Jurassic Pisces of the Tethys Palaeo-Environment compiled at the SNSB-JME - 1270 names
Backbone Family Classification Patch - 1143 names
GBIF Algae Classification - 1100 names
International Cichorieae Network (ICN): Cichorieae Portal - 975 names
Psocodea Species File - 803 names
New Zealand Marine Macroalgae Species Checklist - 787 names
Annotated checklist of endemic species from the Western Balkans - 754 names
Taxon list of animals with German names (worldwide) compiled at the SMNS - 503 names
Catalogue of the Alucitoidea of the World - 472 names
Lygaeoidea Species File - 462 names
Catálogo de Plantas y Líquenes de Colombia - 422 names
GBIF Backbone Patch - 317 names
Phasmida Species File - 259 names
Cortinariaceae fetched from the Index Fungorum API - 234 names
Coreoidea Species File - 233 names
GTDB supplement - 139 names
Mantodea Species File - 119 names
Endemic species in Taiwan - 93 names
Taxon list of Araneae from Germany compiled in the context of the GBOL project - 88 names
Species of Hominidae - 78 names
Taxon list of Sternorrhyncha from Germany compiled in the context of the GBOL project - 77 names
Taxon list of mosses from Germany compiled in the context of the GBOL project - 75 names
Mammal Species of the World - 73 names
Plecoptera Species File - 71 names
Species Fungorum Plus - 64 names
Catalogue of the type specimens of Cosmopterigidae (Lepidoptera: Gelechioidea) from research collections of the Zoological Institute, Russian Academy of Sciences - 47 names
Species named after famous people - 41 names
Dermaptera Species File - 36 names
Taxon list of Trichoptera from Germany compiled in the context of the GBOL project - 34 names
True Fruit Flies (Diptera, Tephritidae) of the Afrotropical Region - 33 names
Range and Regularities in the Distribution of Earthworms of the Earthworms of the USSR Fauna. Perel, 1979 - 32 names
Taxon list of Diplura from Germany compiled in the context of the GBOL project - 30 names
Lista de referencia de especies de aves de Colombia - 2022 - 24 names
Taxon list of Auchenorrhyncha from Germany compiled in the context of the GBOL project - 20 names
Catalogue of the type specimens of Polycestinae (Coleoptera: Buprestidae) from research collections of the Zoological Institute, Russian Academy of Sciences - 19 names
Taxon list of Thysanoptera from Germany compiled in the context of the GBOL project - 19 names
Lista de especies de vertebrados registrados en jurisdicción del Departamento del Huila - 18 names
Taxon list of Microcoryphia (Archaeognatha) from Germany compiled in the context of the GBOL project - 15 names
Catalogue of the type specimens of Bufonidae and Megophryidae (Amphibia: Anura) from research collections of the Zoological Institute, Russian Academy of Sciences - 12 names
Grylloblattodea Species File - 11 names
Coleorrhyncha Species File - 9 names
Taxon list of liverworts from Germany compiled in the context of the GBOL project - 9 names
Embioptera Species File - 7 names
Taxon list of Pisces and Cyclostoma from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Pteridophyta from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Siphonaptera from Germany compiled in the context of the GBOL project - 5 names
The Earthworms of the Fauna of Russia. Perel, 1997 - 5 names
Taxon list of Zygentoma from Germany compiled in the context of the GBOL project - 4 names
Asiloid Flies: new taxa of Diptera: Apioceridae, Asilidae, and Mydidae - 3 names
Taxon list of Protura from Germany compiled in the context of the GBOL project - 3 names
Taxon list of hornworts from Germany compiled in the context of the GBOL project - 2 names
Chrysididae Species File - 1 name
Taxon list of Dermaptera from Germany compiled in the context of the GBOL project - 1 name
Taxon list of Diplopoda from Germany in the context of the GBOL project - 1 name
Taxon list of Orthoptera (Grashoppers) from Germany compiled at the SNSB - 1 name
Taxon list of Pscoptera from Germany compiled in the context of the GBOL project - 1 name
Taxon list of Pseudoscorpiones from Germany compiled in the context of the GBOL project - 1 name
Taxon list of Raphidioptera from Germany compiled in the context of the GBOL project - 1 name
Worldwide Soundscapes project metadata and analysis scripts
data.niaid.nih.gov
zenodo.org
Updated Mar 25, 2025
Cite
Amandine Gasc (2025). Worldwide Soundscapes project metadata and analysis scripts [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6486835
Dataset updated
Mar 25, 2025
Dataset provided by
Thomas Cherico Wanger
Youfang Chen
Amandine Gasc
Dong, Lijun
Li, Songhai
Steven Van Wilgenburg
Rodney Rountree
Kevin F.A. Darras
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated passive acoustic monitoring meta-datasets (i.e. meta-data collections). This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description. Additionally, R scripts are provided to replicate the analysis published in [placeholder].

The overview of all sampling sites and timelines can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. The recordings of this collection were annotated and analysed to explore macro-ecological trends.

The audio recording criteria justifying inclusion into the meta-database are:

Stationary (no transects, towed sensors or microphones mounted on cars)

Passive (unattended, no human disturbance by the recordist)

Ambient (no directional microphone or triggered recordings, non-experimental conditions)

Spatially and/or temporally replicated (i.e. multiple sites sampled at the same time and/or multiple days - covering the same daytime - sampled at the same site)

The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database. The data shared here only includes validated collections.

Changes from version 3.0.1

Added files needed to reproduce the metadata and the acoustic analyses found in the publication.

Dropped underused fields: spatial_selection, temporal_exclusion, freshwater_recordist_position from collections table; secondary realm, biome, and functional group from sites table.

Meta-database CSV files

collections

collection_id: unique integer, primary key

name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.

ecoSound-web_link: link of validated meta-collection on ecoSound-web

primary_contributors: full names of people deemed corresponding contributors who are responsible for the dataset

secondary_contributors: full names of people who are not primary contributors but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses

date_added: when the dataset was added (YYYY-MM-DD)

URL_open_recordings: internet link for openly-available recordings from this collection

URL_project: internet link for further information about the corresponding project

DOI_publication: Digital Object Identifiers of corresponding publications

core_realm_IUCN: The main, core realm of the dataset according to IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

medium: the physical medium the microphone is situated in

locality: optional free text about the locality

contributor_comments: free-text field for comments by the primary contributors

collections-sites

dataset_ID: primary key of collections table

site_ID: primary key of sites table

sites

site_ID: unique integer, primary key

site_name: internal name or code of sampling site as used in respective projects

latitude_numeric: site's numeric degrees of latitude

longitude_numeric: site's numeric degrees of longitude

blurred_coordinates: whether latitude and longitude coordinates are inaccurate, boolean. Coordinates may be blurred with random offsets, rounding, snapping, etc. Indicate the blurring method inside the comments field

topography_m: vertical position of the microphone relative to sea level, in meters. For sites on land: elevation. For marine sites: depth (negative). Only indicate if the values were measured by the collaborator.

freshwater_depth_m: microphone depth, only used for sites inside freshwater bodies that also have an elevation value above the sea level

realm: Ecosystem type: main realm according to IUCN GET https://global-ecosystems.org/

biome: Ecosystem type: main biome according to IUCN GET https://global-ecosystems.org/

functional_group: Ecosystem type: main functional group according to IUCN GET https://global-ecosystems.org/

contributor_comments: free text field for contributor comments

GADM_0: Global ADMinistrative Database level 0 classification of terrestrial site or marine site that is within territorial waters. Source: https://gadm.org/download_world.html

IHO: International Hydrographic Organization classification of marine site. Source: https://marineregions.org/downloads.php

WDPA: World Database on Protected Areas classification of the site. Source: https://www.protectedplanet.net/en/thematic-areas/wdpa?tab=WDPA

deployments

dataset_ID: primary key of datasets table

deployment: identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

subset_site_ID: If the deployment was not done in all the sites of the corresponding collection, site IDs where the deployment was conducted

start_date: date of deployment start

start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

permanent: whether the deployment is permanent, boolean

end_date: date of deployment end (date when last scheduled operation starts)

end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

operation_mode: one of the following. continuous: recording takes place from the deployment start date-time to deployment end date-time. periodical: recording takes place periodically (i.e., with duty cycle) from the deployment start date-time to deployment end date-time. scheduled: recording takes place during scheduled daily time intervals (optionally with duty cycle)

duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". empty if no duty cycle is used. For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes

operation_start_time_mixed: only for scheduled recordings: start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

operation_duration_minutes: only for scheduled recordings: duration of operation in minutes, if constant

operation_end_time_mixed: only for scheduled recordings: end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Only required if durations are variable. Do not use when end times are ambiguous (for instance, if a recording could be 1 hour or 25 hours long because the end is on the next day). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

high_pass_filter_Hz: frequency of the high-pass filter of the recorder if applied, in Hz. Otherwise, write "none". This may be called a "low-cut" filter too.

bit_depth: sampling bit depth of the recordings. Often constant for a particular recorder

channels: number of recorded audio channels

sampling_frequency_kHz: frequency at which the microphone signal was sampled by the recorder (sounds up to half that frequency can be recorded)

recorder: recorder used for deployment

microphone: microphone used for deployment

target_taxa: main IUCN animal taxa that were studied with this deployment, using the exact IUCN Red list names (http://www.iucnredlist.org/), separated by commas. Only genera, families, orders, and classes are accepted. Empty if there was no taxonomic focus (i.e., general soundscapes were the study focus).

contributor_comments: free text field for contributor comments

exact_recordings: whether the deployment data here have been superseded by inserting more exact recording date-time ranges into the meta-collection on ecoSound-web
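
The text formats described for duty_cycle_minutes and the *_time_mixed fields above can be parsed along these lines. This is a minimal sketch based only on the field descriptions; the function names, the returned dictionary layout, and the assumption that offsets are written directly after the solar event (as in "sunrise-60") are illustrative and not part of the project's scripts.

```python
import re


def parse_duty_cycle(value):
    """Parse "recording/period" in minutes, e.g. "1/6"; empty means no duty cycle."""
    if not value:
        return None
    recording, period = (float(part) for part in value.split("/"))
    return {
        "recording_min": recording,
        "period_min": period,
        "fraction": recording / period,
    }


SOLAR_EVENTS = ("sunrise", "sunset", "noon", "midnight")
# Either HH:MM, or a solar event name with an optional signed offset in minutes.
_MIXED = re.compile(r"^(?:(\d{1,2}):(\d{2})|([a-z]+)([+-]\d+)?)$")


def parse_time_mixed(value):
    """Parse a *_time_mixed value into a clock time or a solar event with offset."""
    match = _MIXED.match(value.strip().lower())
    if not match:
        raise ValueError(f"unrecognized time value: {value!r}")
    hours, minutes, event, offset = match.groups()
    if hours is not None:
        return {"kind": "clock", "hour": int(hours), "minute": int(minutes)}
    if event not in SOLAR_EVENTS:
        raise ValueError(f"unknown solar event: {event!r}")
    return {"kind": "solar", "event": event, "offset_min": int(offset or 0)}


print(parse_duty_cycle("1/6"))
print(parse_time_mixed("sunrise-60"))
print(parse_time_mixed("06:30"))
```

Converting a solar event to a local clock time would additionally need the site coordinates and date, which this sketch leaves out.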

recordings (partial download from ecoSound-web)

recording_id: primary key of the recordings table

collection_id: ID of the collection the recording belongs to

name: name of the recording

site_id: ID of the site the recording belongs to

recorder_id: ID of the recorder used for the recording (internal ecoSound-web code)

microphone_id: ID of the microphone used for the recording (internal ecoSound-web code)

recording_gain: recording gain applied for amplifying the audio signal, in decibels

duty_cycle_recording: fraction of the recording period when the recorder is actively recording audio

duty_cycle_period: period of the duty cycle, i.e., time between the starts of two subsequent recordings

note: comments (contains the target taxon)

file_date: date of the recording start

file_time: local time of the recording start

sampling_rate: audio sampling rate in Hz

bitdepth: depth in bits for each audio sample

channel_num: number of channels

duration: duration of the recording in seconds. Note: duty-cycled recordings cover only a proportion of this duration

affiliations

affiliation_id: primary key of affiliations table

lab_research_group: Laboratory or research group name

department_school_institute: department, school, or institute name

university_institution: University or institution name

street_address: street address

region_state_province_city: region, state, province, or city name

postal_code: postal code

country: country
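
Since the tables above are linked through primary keys, joining them can be sketched as follows. This is an illustrative example using Python's built-in sqlite3 module with the key names listed in the field descriptions (collection_id, dataset_ID, site_ID); the rows are invented placeholders, the table name collections_sites stands in for the "collections-sites" table, and loading the actual CSV files is left out because their file names are not given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE collections (collection_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sites (
        site_ID INTEGER PRIMARY KEY,
        site_name TEXT,
        latitude_numeric REAL,
        longitude_numeric REAL
    );
    -- Link table: dataset_ID refers to collections, site_ID refers to sites.
    CREATE TABLE collections_sites (dataset_ID INTEGER, site_ID INTEGER);
    """
)

# Placeholder rows (assumption): not taken from the real meta-database.
conn.executemany(
    "INSERT INTO collections VALUES (?, ?)",
    [(1, "Example collection A"), (2, "Example collection B")],
)
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?, ?)",
    [(1, "site-1", 10.0, 20.0), (2, "site-2", -5.5, 30.25), (3, "site-3", 0.0, 0.0)],
)
conn.executemany(
    "INSERT INTO collections_sites VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3)],
)

# One row per (collection, site) pair, resolved through the link table.
rows = conn.execute(
    """
    SELECT c.name, s.site_name, s.latitude_numeric, s.longitude_numeric
    FROM collections AS c
    JOIN collections_sites AS cs ON cs.dataset_ID = c.collection_id
    JOIN sites AS s ON s.site_ID = cs.site_ID
    ORDER BY c.collection_id, s.site_ID
    """
).fetchall()

for row in rows:
    print(row)
```

The same pattern would extend to the deployments and recordings tables through their collection and site keys.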
‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 4, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-vehicle-miles-traveled-during-covid-19-lock-downs-636d/latest
Dataset updated
Jan 4, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/vehicle-miles-travelede on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

This data set was last updated 3:30 PM ET Monday, January 4, 2021. The last date of data in this dataset is December 31, 2020.

Overview

Data shows that mobility declined nationally since states and localities began shelter-in-place strategies to stem the spread of COVID-19. The numbers began climbing as more people ventured out and traveled further from their homes, but in parallel with the rise of COVID-19 cases in July, travel declined again.

This distribution contains county level data for vehicle miles traveled (VMT) from StreetLight Data, Inc, updated three times a week. This data offers a detailed look at estimates of how much people are moving around in each county.

Data available has a two day lag - the most recent data is from two days prior to the update date. Going forward, this dataset will be updated by AP at 3:30pm ET on Monday, Wednesday and Friday each week.

This data has been made available to members of AP’s Data Distribution Program. To inquire about access for your organization - publishers, researchers, corporations, etc. - please click Request Access in the upper right corner of the page or email kromano@ap.org. Be sure to include your contact information and use case.

Findings

Nationally, data shows that vehicle travel in the US has doubled compared to the seven-day period ending April 13, which was the lowest VMT since the COVID-19 crisis began. In early December, travel reached a low not seen since May, with a small rise leading up to the Christmas holiday.

Average vehicle miles traveled continues to be below what would be expected without a pandemic - down 38% compared to January 2020. September 4 reported the largest single day estimate of vehicle miles traveled since March 14.

New Jersey, Michigan and New York are among the states with the largest relative uptick in travel at this point of the pandemic - they report almost two times the miles traveled compared to their lowest seven-day period. However, travel in New Jersey and New York is still much lower than expected without a pandemic. Other states such as New Mexico, Vermont and West Virginia have rebounded the least.

About This Data

The county level data is provided by StreetLight Data, Inc, a transportation analysis firm that measures travel patterns across the U.S. The data is from their Vehicle Miles Traveled (VMT) Monitor, which uses anonymized and aggregated data from smartphones and other GPS-enabled devices to provide county-by-county VMT metrics for more than 3,100 counties. The VMT Monitor provides an estimate of total vehicle miles traveled by residents of each county, each day since the COVID-19 crisis began (March 1, 2020), as well as a change from the baseline average daily VMT calculated for January 2020. Additional columns are calculations by AP.

Included Data

01_vmt_nation.csv - Data summarized to provide a nationwide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

02_vmt_state.csv - Data summarized to provide a statewide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

03_vmt_county.csv - Data providing a county level look at vehicle miles traveled. Includes VMT estimate, percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
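
The two derived measures mentioned in the file descriptions above, percent change compared to the January baseline and seven-day rolling averages, can be computed along these lines. This is a sketch under stated assumptions: the function names and the sample numbers are invented, and a trailing (rather than centered) window is assumed because the description does not say which one AP used.

```python
def percent_change(vmt, baseline):
    """Percent change of a daily VMT estimate relative to the January baseline."""
    return (vmt - baseline) / baseline * 100.0


def rolling_mean(values, window=7):
    """Trailing rolling mean; positions without a full window yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


# Invented sample values (assumption): a baseline and eight days of VMT estimates.
baseline_jan_vmt = 100.0
daily_vmt = [60.0, 62.0, 58.0, 65.0, 70.0, 72.0, 68.0, 75.0]

daily_change = [percent_change(v, baseline_jan_vmt) for v in daily_vmt]
smoothed = rolling_mean(daily_vmt)
print(daily_change)
print(smoothed)
```

Averaging over seven days removes the weekday and weekend pattern from the daily series, which is why the smoothed column is easier to read as a trend line.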

Additional Data Queries

* Filter for specific state - filters 02_vmt_state.csv daily data for specific state.

* Filter counties by state - filters 03_vmt_county.csv daily data for counties in specific state.

* Filter for specific county - filters 03_vmt_county.csv daily data for specific county.

Interactive

The AP has designed an interactive map to show percent change in vehicle miles traveled by county since each county's lowest point during the pandemic.

This dataset was created by Angeliki Kastanis and contains around 0 samples along with Date At Low, Mean7 County Vmt At Low, technical information and other features such as: - County Name - County Fips - and more.

How to use this dataset

Analyze State Name in relation to Baseline Jan Vmt

Study the influence of Date At Low on Mean7 County Vmt At Low

Acknowledgements

If you use this dataset in your research, please credit Angeliki Kastanis

--- Original source retains full ownership of the source dataset ---

Synthetic population for USA_ALABAMA

zenodo.org
explore.openaire.eu

bin, pdf, zip

Updated Jul 16, 2024

Cite

Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie (2024). Synthetic population for USA_ALABAMA [Dataset]. http://doi.org/10.5281/zenodo.6505866

Explore at:

pdf, zip, bin (available download formats)

Unique identifier

https://doi.org/10.5281/zenodo.6505866

Dataset updated

Jul 16, 2024

Dataset provided by

Zenodo http://zenodo.org/

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

United States, Alabama

Description

Synthetic populations for regions of the World (SPW) | Alabama

Dataset information

A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where the activities take place and thus where interactions among population members happen (e.g., the spread of epidemics).

License

CC-BY-4.0

Acknowledgment

This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).

Contact information

Henning.Mortveit@virginia.edu

Identifiers


Region name	Alabama
Region ID	usa_140002904
Model	coarse
Version	0_9_0

Statistics

Name	Value
Population	4768478
Average age	37.8
Households	1933164
Average household size	2.5
Residence locations	1933164
Activity locations	398709
Average number of activities	5.7
Average travel distance	65.0

Sources

Description	Name	Version	Url
Activity template data	World Bank	2021	https://data.worldbank.org
Administrative boundaries	ADCW	7.6	https://www.adci.com/adc-worldmap
Curated POIs based on OSM	SLIPO/OSM POIs		http://slipo.eu/?p=1551 https://www.openstreetmap.org/
Household data	IPUMS		https://international.ipums.org/international
Population count with demographic attributes	GPW	v4.11	https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11

Files description

Base data files (usa_140002904_data_v_0_9.zip)

Filename	Description
`usa_140002904_person_v_0_9.csv`	Data for each person including attributes such as age, gender, and household ID.
`usa_140002904_household_v_0_9.csv`	Data at household level.
`usa_140002904_residence_locations_v_0_9.csv`	Data about residence locations
`usa_140002904_activity_locations_v_0_9.csv`	Data about activity locations, including what activity types are supported at these locations
`usa_140002904_activity_location_assignment_v_0_9.csv`	For each person and for each of their activities, this file specifies the location where the activity takes place

Derived data files

Filename	Description
`usa_140002904_contact_matrix_v_0_9.csv`	A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model.

Validation and measures files

Filename	Description
`usa_140002904_household_grouping_validation_v_0_9.pdf`	Validation plots for household construction
`usa_140002904_activity_durations_{adult,child}_v_0_9.pdf`	Comparison of time spent on generated activities with survey data
`usa_140002904_activity_patterns_{adult,child}_v_0_9.pdf`	Comparison of generated activity patterns by the time of day with survey data
`usa_140002904_location_construction_0_9.pdf`	Validation plots for location construction
`usa_140002904_location_assignement_0_9.pdf`	Validation plots for location assignment, including travel distribution plots
`usa_140002904_usa_140002904_ver_0_9_0_avg_travel_distance.pdf`	Choropleth map visualizing average travel distance
`usa_140002904_usa_140002904_ver_0_9_0_travel_distr_combined.pdf`	Travel distance distribution
`usa_140002904_usa_140002904_ver_0_9_0_num_activity_loc.pdf`	Choropleth map visualizing number of activity locations
`usa_140002904_usa_140002904_ver_0_9_0_avg_age.pdf`	Choropleth map visualizing average age
`usa_140002904_usa_140002904_ver_0_9_0_pop_density_per_sqkm.pdf`	Choropleth map visualizing population density
`usa_140002904_usa_140002904_ver_0_9_0_pop_size.pdf`	Choropleth map visualizing population size

Global News Popularity Insights Datset
opendatabay.com
Updated Jul 4, 2025
Cite
Datasimple (2025). Global News Popularity Insights Datset [Dataset]. https://www.opendatabay.com/data/ai-ml/b036c2ea-2b40-4afe-8dc2-1c56302ffdbc
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Social Media and Networking
Description
This dataset captures the popularity of news articles across various social media platforms, providing valuable insights into how news content performs online [1, 2]. It is a subset of a larger dataset, specifically designed for analysing engagement and reach of news items [1, 2]. The data includes key details about news articles and their final popularity scores on Facebook, Google+, and LinkedIn [1-3]. It serves as an excellent resource for understanding social media trends and the dissemination of news [2].

Columns

The dataset features the following columns:
* IDLink: A unique identifier for each news item [1, 2].
* Title: The title of the news item as it appeared from the official media sources [1, 2].
* Headline: The headline of the news item, also from official media sources [1, 2].
* Source: The original news outlet that published the news item [1, 2].
* Topic: The query topic used to obtain the news items from official media sources [1, 2].
* PublishDate: The date and time when the news item was published [1, 2].
* Facebook: The final popularity score of the news item on Facebook [2, 3].
* GooglePlus: The final popularity score of the news item on Google+ [2, 3].
* LinkedIn: The final popularity score of the news item on LinkedIn [2, 3].

This subset of the data is specifically noted to be missing the 'SentimentTitle' and 'SentimentHeadline' columns that are present in the full dataset [1].

Distribution

This dataset comprises approximately 37,000 news articles [1]. While exact row counts for files are not specified beyond this total, the dataset format is typically CSV [4].
* Unique Values:
  * IDLink: 37,288 unique values [3].
  * Title: 32,366 unique values [3].
  * Headline: 34,634 unique values [3].
* Source Distribution:
  * Bloomberg: 2% [3].
  * Reuters: 1% [3].
  * Other: 97% (from 35,990 sources) [3].
* Topic Distribution:
  * Economy: 36% [3].
  * Obama: 31% [3].
  * Other: 33% (from 12,165 topics) [3].
* Time Range Sample (2016):
  * 03/29 - 04/03: 2,239 items [5].
  * 04/03 - 04/08: 2,020 items [5].
  * 06/17 - 06/22: 1,650 items [5].
  * 06/27 - 07/02: 2,024 items [5].

The data spans from 2016-03-29 to 2016-07-07 [6].

Usage

This dataset is ideal for:
* Analysing news popularity trends across different social media platforms [2].
* Studying the impact of news content on online engagement [2].
* Exploratory data analysis of news consumption patterns [7].
* Understanding the spread of information in digital environments.
* Developing models to predict social media reach for news articles.
* Insights into media outlets' influence and topic relevance [1, 3].

Coverage

The dataset covers an approximate 8-month period, between November 2015 and July 2016 [2]. The specific subset provided covers 29 March 2016 to 07 July 2016 [6]. It includes news items on four primary topics: economy, Microsoft, Obama, and Palestine [2], with distribution details for 'economy' and 'obama' [3]. The region of coverage is global [8].

License

CC0

Who Can Use It

Data Scientists and Analysts: For exploratory data analysis, feature engineering, and model building related to news popularity and social media engagement [7].

Researchers: Studying media studies, social network analysis, and public opinion.

Marketing Professionals: To understand content virality and optimise news dissemination strategies.

Journalists and Media Organisations: For insights into their content performance and audience engagement on social platforms.

Dataset Name Suggestions

Social Media News Popularity

Online News Engagement Metrics

Digital News Dissemination Data

News Virality on Social Platforms

Global News Popularity Insights

Attributes

Original Data Source: News Popularity in Multiple Social Media Platforms
Dataset for the Article "A Predictive Method to Improve the Effectiveness of...
data.niaid.nih.gov
zenodo.org
Updated May 24, 2021
Cite
Riccardo Martoglia (2021). Dataset for the Article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4782983
Dataset updated
May 24, 2021
Dataset provided by
Federica Mandreoli
Marco Furini
Riccardo Martoglia
Manuela Montangero
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset for the article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario".

Abstract:

Museums are embracing social technologies in the attempt to broaden their audience and to engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts are competing every day to get visibility in terms of likes and shares and very little research focused on museums communication to identify best practices. In this paper, we focus on Twitter and we propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet will likely be appreciated by Twitter users or not; (b) present simple suggestions that will help enhancing the message and increasing the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world famous museums, we show that our proposed method allows identifying tweet features that are more likely to influence the tweet success.

Code to run a selection of experiments is available at https://github.com/rmartoglia/predict-twitter-ch

Dataset structure

This record contains the data used in the experiments of the above research paper. Only the extracted features for the museum tweet threads are provided (not the full message text), since these are all that the analyses require.

We selected 23 well-known art museums from around the world and grouped them into five groups: G1 (museums with at least three million followers); G2 (museums with more than one million followers); G3 (museums with more than 400,000 followers); G4 (museums with more than 200,000 followers); G5 (Italian museums). From these museums, we analyzed ca. 40,000 tweets, with a number varying from ca. 5k to ca. 11k for each museum group, depending on the number of museums in each group.

Content features: these are the features that can be drawn from the content of the tweet itself. We further divide such features into the following two categories:

– Countable: these features have a value ranging into different intervals. We take into consideration: the number of hashtags (i.e., words preceded by #) in the tweet, the number of URLs (i.e., links to external resources), the number of images (e.g., photos and graphical emoticons), the number of mentions (i.e., twitter accounts preceded by @), the length of the tweet;

– On-Off : these features have binary values in {0, 1}. We observe whether the tweet has exclamation marks, question marks, person names, place names, organization names, other names. Moreover, we also take into consideration the tweet topic density: assuming that the involved topics correspond to the hashtags mentioned in the text, we define a tweet as dense of topics if the number of hashtags it contains is greater than a given threshold, set to 5. Finally, we observe the tweet sentiment that might be present (positive or negative) or not (neutral).

Context features: these features are not drawn from the content of the tweet itself and might give a larger picture of the context in which the tweet was sent. Namely, we take into consideration the part of the day in which the tweet was sent (morning, afternoon, evening and night, respectively from 5:00am to 11:59am, from 12:00pm to 5:59pm, from 6:00pm to 10:59pm and from 11:00pm to 4:59am), and a boolean feature indicating whether the tweet is a retweet or not.

User features: these features are specific to the user that sent the tweet, and are the same for all the tweets of this user. Namely, we consider the name of the museum and the number of followers of the user.
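
To make the feature definitions above concrete, the following is a minimal sketch of how some of the countable, on-off, and context features could be derived from raw tweet text and a timestamp. It is not the authors' code (the dataset ships only the extracted features). The regular expressions, function names, and the sample tweet are assumptions for illustration. Only the topic-density threshold of 5 and the part-of-day boundaries come from the description. Features that need tweet metadata or NLP (images, named entities, sentiment) are left out.

```python
import re
from datetime import time

TOPIC_DENSITY_THRESHOLD = 5  # threshold stated in the dataset description

def countable_features(text):
    """Count hashtags, URLs, and mentions, and measure the tweet length."""
    return {
        "hashtags": len(re.findall(r"#\w+", text)),      # words preceded by #
        "urls": len(re.findall(r"https?://\S+", text)),  # links to external resources
        "mentions": len(re.findall(r"@\w+", text)),      # accounts preceded by @
        "length": len(text),
    }

def on_off_features(text):
    """Binary features in {0, 1}; a tweet is topic-dense if it has more than 5 hashtags."""
    n_hashtags = len(re.findall(r"#\w+", text))
    return {
        "has_exclamation": int("!" in text),
        "has_question": int("?" in text),  # note: a '?' inside a URL would also count here
        "topic_dense": int(n_hashtags > TOPIC_DENSITY_THRESHOLD),
    }

def part_of_day(t):
    """Map a time of day to the four intervals given in the description."""
    if time(5, 0) <= t < time(12, 0):
        return "morning"    # 5:00am to 11:59am
    if time(12, 0) <= t < time(18, 0):
        return "afternoon"  # 12:00pm to 5:59pm
    if time(18, 0) <= t < time(23, 0):
        return "evening"    # 6:00pm to 10:59pm
    return "night"          # 11:00pm to 4:59am

# Hypothetical tweet text, made up for illustration only.
sample = "Visit us! #art #museum https://example.org @moma"
print(countable_features(sample))
print(on_off_features(sample))
print(part_of_day(time(23, 30)))
```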
Global B2B people Data | 720M+ LinkedIn Profiles | Verified & Bi-Weekly...
opendatabay.com
Updated Jun 5, 2025
Cite
Forager (2025). Global B2B people Data | 720M+ LinkedIn Profiles | Verified & Bi-Weekly Updates [Dataset]. https://www.opendatabay.com/data/premium/5ff38f72-201c-469b-aa7c-5cba9ddb2ac3
Dataset updated
Jun 5, 2025
Dataset authored and provided by
Forager
Area covered
Synthetic Data Generation
Description
🌍 Global B2B Person Dataset | 755M+ LinkedIn Profiles | Verified & Bi-Weekly Updated

Access the world’s most comprehensive professional dataset, enriched with over 755 million LinkedIn profiles. The Forager.ai Global B2B Person Dataset delivers work-verified professional contacts with 95%+ accuracy, refreshed every two weeks. Ideal for recruitment, sales, research, and talent mapping, it provides direct access to decision-makers, specialists, and executives across industries and geographies.

Dataset Features

Full Name & Job Title: Up-to-date first/last name with current professional role.

Emails & Phone Numbers: AI-validated work and personal email addresses, plus mobile numbers.

Company Info: Current employer name, industry, and company size (employee count).

Career History: Detailed work history with job titles, durations, and role progressions.

Skills & Endorsements: Extracted from public LinkedIn profiles.

Education & Certifications: Universities, degrees, and professional certifications.

Location & LinkedIn URL: City, country, and direct link to public LinkedIn profile.

Distribution

Data Volume: 755M+ total profiles, with 270M+ containing full contact information.

Formats Available: CSV, JSON via S3 or Snowflake; API for real-time access.

Access Methods: REST API, Enrichment API (lookup), full dataset delivery, or custom solutions.

Usage

This dataset is ideal for a variety of applications:

Executive Recruitment: Source passive talent, build role-based maps, and assess mobility.

Sales Intelligence: Find decision-makers, personalize outreach, and trigger campaigns on job changes.

Market Research: Understand talent concentration by company, geography, and skill set.

Partnership Development: Identify key stakeholders in target firms for business development.

Talent Mapping & Strategic Hiring: Build full organizational charts and skill distribution heatmaps.

Coverage

Geographic Coverage: Global, including North America, EMEA, LATAM, and APAC.

Time Range: Continuously updated; profiles refreshed bi-weekly.

Demographics: Cross-industry coverage of seniority levels from entry-level to C-suite, across all sectors.

License

CUSTOM

Who Can Use It

Recruiters & Staffing Firms: For building target lists and sourcing niche talent.

Sales & RevOps Teams: For targeting by department, title, or decision-making authority.

VCs & PE Firms: To assess leadership teams and monitor executive movement.

Data Scientists & Analysts: To train models for job mobility, hiring trends, or org structure prediction.

B2B Platforms: For enriching internal databases and powering account-based marketing (ABM).
ORBITAAL: cOmpRehensive BItcoin daTaset for temporAl grAph anaLysis
data.niaid.nih.gov
Updated Nov 27, 2024
Cite
Cazabet, Remy (2024). ORBITAAL: cOmpRehensive BItcoin daTaset for temporAl grAph anaLysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10844224
Dataset updated
Nov 27, 2024
Dataset provided by
Coquidé, Célestin
Cazabet, Remy
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Construction

This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution, expressed as UNIX timestamps. It is built from the blockchain for the period from 3 January 2009 to 25 January 2021. The blockchain was extracted using the bitcoin-etl Python package (https://github.com/blockchain-etl/bitcoin-etl). The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] together with the addresses of popular Bitcoin users provided by https://www.walletexplorer.com/

[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.

Dataset Description

Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021

Overview:

This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a long time span, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs.

All dates are derived from the block UNIX timestamps, in the GMT timezone.

Contents:

The dataset is distributed across three compressed archives:

All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries. The files can be read with the pyspark Python package.

orbitaal-stream_graph.tar.gz:

The root directory is STREAM_GRAPH/

Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes).

The stream graph is divided into 13 files, one for each year

Files format is parquet

Name format is orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year

These files are in the subdirectory STREAM_GRAPH/EDGES/

orbitaal-snapshot-all.tar.gz:

The root directory is SNAPSHOT/

Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021).

Files format is parquet

Name format is orbitaal-snapshot-all.snappy.parquet.

These files are in the subdirectory SNAPSHOT/EDGES/ALL/

orbitaal-snapshot-year.tar.gz:

The root directory is SNAPSHOT/

Contains the yearly resolution of snapshot networks

Files format is parquet

Name format is orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year

These files are in the subdirectory SNAPSHOT/EDGES/year/

orbitaal-snapshot-month.tar.gz:

The root directory is SNAPSHOT/

Contains the monthly-resolution snapshot networks

Files format is parquet

Name format is orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where

[YYYY] and [MM] stand for the corresponding year and month, and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year and month

These files are in the subdirectory SNAPSHOT/EDGES/month/

orbitaal-snapshot-day.tar.gz:

The root directory is SNAPSHOT/

Contains the daily-resolution snapshot networks

Files format is parquet

Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where

[YYYY], [MM], and [DD] stand for the corresponding year, month, and day, and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day

These files are in the subdirectory SNAPSHOT/EDGES/day/

orbitaal-snapshot-hour.tar.gz:

The root directory is SNAPSHOT/

Contains the hourly-resolution snapshot networks

Files format is parquet

Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where

[YYYY], [MM], [DD], and [hh] stand for the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour

These files are in the subdirectory SNAPSHOT/EDGES/hour/

orbitaal-nodetable.tar.gz:

The root directory is NODE_TABLE/

Contains two files in Parquet format. The first gives information about the nodes present in the stream graphs and snapshots, such as period of activity and associated global Bitcoin balance; the second contains the list of all associated Bitcoin addresses.
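
The file-name formats described above can be parsed with a regular expression, which also makes it easy to order files by their numeric [ID] rather than lexicographically (a plain string sort would place ID 10 before ID 2). The sketch below is an assumption-laden illustration: the pattern is derived only from the documented name formats, the helper `parse_name` is hypothetical, and the example names and their IDs are made up to fit the pattern rather than taken from the archives.

```python
import re

# Matches the documented formats; month, day, and hour parts are optional.
NAME_RE = re.compile(
    r"^orbitaal-(?P<kind>stream_graph|snapshot)-date-"
    r"(?P<date>\d{4}(?:-\d{2}){0,3})-file-id-(?P<id>\d+)\.snappy\.parquet$"
)

def parse_name(name):
    """Return kind, date parts, and numeric file ID, or None if the name does not match."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    date_parts = tuple(int(part) for part in match.group("date").split("-"))
    return {
        "kind": match.group("kind"),
        "date": date_parts,  # (YYYY,), (YYYY, MM), (YYYY, MM, DD) or (YYYY, MM, DD, hh)
        "file_id": int(match.group("id")),
    }

# Illustrative names only; the IDs are assumptions consistent with one file per year.
names = [
    "orbitaal-stream_graph-date-2018-file-id-10.snappy.parquet",
    "orbitaal-stream_graph-date-2010-file-id-2.snappy.parquet",
    "orbitaal-stream_graph-date-2009-file-id-1.snappy.parquet",
]

# Sort by numeric ID so that the order follows the chronological order.
ordered = sorted(names, key=lambda n: parse_name(n)["file_id"])
for name in ordered:
    print(parse_name(name)["date"][0], name)
```

The aggregated file orbitaal-snapshot-all.snappy.parquet carries no date or ID, so `parse_name` returns `None` for it and it should be handled separately.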

Small samples in CSV format

orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv

These two CSV files relate to stream graph representations of the halvening that occurred in 2016.

orbitaal-snapshot-2016_07_08.csv and orbitaal-snapshot-2016_07_09.csv

These two CSV files relate to daily snapshot representations of the halvening that occurred in 2016.
‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-list-of-top-data-breaches-2004-2021-e7ac/746cf4e2/?iid=002-608&v=presentation
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.

--- Dataset description provided by original source is as follows ---

This is a dataset containing all the major data breaches in the world from 2004 to 2021

As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.

This data contains 5 columns:
1. Entity: The name of the company, organization or institute
2. Year: The year in which the data breach took place
3. Records: How many records were compromised (can include information like email, passwords etc.)
4. Organization type: Which sector the organization belongs to
5. Method: Was it hacked? Were the files lost? Was it an inside job?

Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning

--- Original source retains full ownership of the source dataset ---
Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction...
zenodo.org
data.niaid.nih.gov
zip
Updated Nov 29, 2023
Cite
Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li (2023). Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction Models [Dataset]. http://doi.org/10.5281/zenodo.7909511
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7909511
Dataset updated
Nov 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Freebase is amongst the largest public cross-domain knowledge graphs. It possesses three main data modeling idiosyncrasies. It has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important in modeling the real-world. But they also pose nontrivial challenges in research of embedding models for knowledge graph completion, especially when models are developed and evaluated agnostically of these idiosyncrasies. We make available several variants of the Freebase dataset by inclusion and exclusion of these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.

Dataset Details
The dataset consists of the four variants of the Freebase dataset as well as related mapping/support files. For each variant, we made three kinds of files available:
Subject matter triples file
fb+/-CVT+/-REV: One folder for each variant. Each folder contains 5 files: train.txt, valid.txt, test.txt, entity2id.txt, and relation2id.txt. Subject matter triples are the triples that belong to subject matter domains, that is, domains describing real-world facts.
Example of a row in train.txt, valid.txt, and test.txt:
2, 192, 0
Example of a row in entity2id.txt:
/g/112yfy2xr, 2
Example of a row in relation2id.txt:
/music/album/release_type, 192
Explanation
"/g/112yfy2xr" and "/m/02lx2r" are the MIDs of the subject entity and object entity, respectively. "/music/album/release_type" is the relationship between the two entities. 2, 192, and 0 are the IDs assigned by the authors to these objects.
Type system file
freebase_endtypes: Each row maps an edge type to its required subject type and object type.
Example
92, 47178872, 90
Explanation
"92" and "90" are the type id of the subject and object which has the relationship id "47178872".
Metadata files
object_types: Each row maps the MID of a Freebase object to a type it belongs to.
Example
/g/11b41c22g, /type/object/type, /people/person
Explanation
The entity with MID "/g/11b41c22g" has a type "/people/person"
object_names: Each row maps the MID of a Freebase object to its textual label.
Example
/g/11b78qtr5m, /type/object/name, "Viroliano Tries Jazz"@en
Explanation
The entity with MID "/g/11b78qtr5m" has name "Viroliano Tries Jazz" in English.
object_ids: Each row maps the MID of a Freebase object to its user-friendly identifier.
Example
/m/05v3y9r, /type/object/id, "/music/live_album/concert"
Explanation
The entity with MID "/m/05v3y9r" can be interpreted by humans as a music concert live album.
domains_id_label: Each row maps the MID of a Freebase domain to its label.
Example
/m/05v4pmy, geology, 77
Explanation
The object with MID "/m/05v4pmy" in Freebase is the domain "geology", and has id "77" in our dataset.
types_id_label: Each row maps the MID of a Freebase type to its label.
Example
/m/01xljxh, /government/political_party, 147
Explanation
The object with MID "/m/01xljxh" in Freebase is the type "/government/political_party", and has id "147" in our dataset.
entities_id_label: Each row maps the MID of a Freebase entity to its label.
Example
/g/11b78qtr5m, Viroliano Tries Jazz, 2234
Explanation
The entity with MID "/g/11b78qtr5m" in Freebase is "Viroliano Tries Jazz", and has id "2234" in our dataset.
properties_id_label: Each row maps the MID of a Freebase property to its label.
Example
/m/010h8tp2, /comedy/comedy_group/members, 47178867
Explanation
The object with MID "/m/010h8tp2" in Freebase is a property (relation/edge); it has label "/comedy/comedy_group/members" and id "47178867" in our dataset.
uri_original2simplified and uri_simplified2original: The mapping from original URI to simplified URI and the mapping from simplified URI to original URI, respectively.
Example
uri_original2simplified
"http://rdf.freebase.com/ns/type.property.unique": "/type/property/unique"
uri_simplified2original
"/type/property/unique": "http://rdf.freebase.com/ns/type.property.unique"
Explanation
The URI "http://rdf.freebase.com/ns/type.property.unique" in the original Freebase RDF dataset is simplified into "/type/property/unique" in our dataset.
The identifier "/type/property/unique" in our dataset has URI http://rdf.freebase.com/ns/type.property.unique in the original Freebase RDF dataset.
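
The example rows above suggest how the files fit together: ID triples from train.txt, valid.txt, and test.txt can be decoded through entity2id.txt and relation2id.txt, and simplified identifiers appear to be the original URI with its namespace prefix removed and dots replaced by slashes. The sketch below illustrates both steps under stated assumptions: rows are taken to be comma-separated as in the examples (the actual files may use another delimiter), the row "/m/02lx2r, 0" is inferred from the explanation rather than quoted from the files, and the URI rule is inferred from the single example shown, so the shipped mapping files remain authoritative. All function names are hypothetical.

```python
FREEBASE_NS = "http://rdf.freebase.com/ns/"

def load_id_map(rows):
    """Parse 'label, id' rows (entity2id.txt / relation2id.txt style) into {id: label}."""
    id_map = {}
    for row in rows:
        label, idx = row.rsplit(",", 1)  # split on the last comma only
        id_map[int(idx)] = label.strip()
    return id_map

def decode_triple(row, entities, relations):
    """Turn an ID triple such as '2, 192, 0' into (subject, relation, object) labels."""
    subj, rel, obj = (int(part) for part in row.split(","))
    return entities[subj], relations[rel], entities[obj]

def simplify_uri(uri):
    """Assumed rule: drop the namespace prefix and replace '.' with '/'."""
    if not uri.startswith(FREEBASE_NS):
        raise ValueError(f"not a Freebase namespace URI: {uri}")
    return "/" + uri[len(FREEBASE_NS):].replace(".", "/")

def restore_uri(simplified):
    """Inverse of simplify_uri under the same assumed rule."""
    return FREEBASE_NS + simplified.strip("/").replace("/", ".")

# Rows based on the examples above; the second entity row is inferred, not quoted.
entities = load_id_map(["/g/112yfy2xr, 2", "/m/02lx2r, 0"])
relations = load_id_map(["/music/album/release_type, 192"])

triple = decode_triple("2, 192, 0", entities, relations)
print(triple)

simplified = simplify_uri("http://rdf.freebase.com/ns/type.property.unique")
print(simplified)
print(restore_uri(simplified))
```

For real use, looking identifiers up in uri_original2simplified and uri_simplified2original is safer than applying the string rule, since the rule may not hold for every URI.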
A dataset related to the Batwa’s Right to Recognition as a Minority and...
figshare.com
xlsx
Updated Nov 22, 2023
Cite
Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles (2023). A dataset related to the Batwa’s Right to Recognition as a Minority and Indigenous People in Rwanda [Dataset]. http://doi.org/10.6084/m9.figshare.24612147.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.24612147.v1
Dataset updated
Nov 22, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Rwanda
Description
This codebook of data is related to the study conducted on the Batwa’s Rights to Recognition as a Minority and Indigenous People in Rwanda through the Lens of a Human Rights Based Approach. The dataset displays information in 7 columns. The first column is called Code level 1 and consists of the main codes extracted from the findings, the second column is Code level 2 and consists of sub-codes extracted from Code level 1, and the third column is Code level 3, which is extracted from Code level 2. The 4th column provides a short definition of the content of the codes. The 5th column concerns what the codes should include and the 6th column concerns what the codes should not include. The 7th column concerns the types of questions asked to respondents, based on which the codes were generated. These codes were generated from data extracted from the questionnaire summarized in the 7th column. For example, the first column (Code level 1) is made up of 4 rows. The first two rows concern findings from the literature review and the last two rows concern empirical data from the fieldwork. Data from the literature review and empirical data from the fieldwork were combined to produce the findings on which an interpretation was based. These codes allowed the researchers to give meaningful findings, which in turn enabled the researchers to provide a consolidated interpretation. The data generated align with epistemological interpretivism and concern respondents' views on socio-cultural narratives and emotional experiences that they have endured in their lives. The data collection was conducted in three rural districts, Nyaruguru (Southern Province), Rubavu and Rutsiro (Western Province), and in three urban districts, Nyarugenge, Kicukiro and Gasabo (Kigali City). The justification for the three rural and three urban districts was to find out if there were divergent socio-cultural realities within each and across the diverse settings.
The selected rural sites were those near protected areas from which the Batwa were evicted following the legislation of protected areas in 1930 by colonial authorities. The urban districts were the sites in which some Batwa had lived after the imposition of a new lifestyle which differs from their hunting and gathering tradition following their eviction from forests. The study sites were purposively selected through the facilitation of gatekeepers, namely local entities. Authorization was sent to the district level, which subsequently allowed a team of researchers to approach the sector, the cell and the village levels of administration. At the village level, which is the lowest entity where households of HMP live, respondents were again identified through the help of the Chief of the Village (umudugudu), who served as a gatekeeper.

Focus Group Discussions (FGDs) along with direct observation were administered to the members of HMP (formerly referred to as Batwa). The groups comprised individuals who were above the age of 18 years and were deemed to have experienced hardship as a result of socio-economic vulnerability resulting from forest eviction. In-depth interviews were also carried out with officials of selected public institutions, including officials from the National Commission of Unity and Reconciliation and the National Commission of Human Rights. Key informants’ interviews (KIIs) were administered to leaders from NGOs and cooperative societies working towards the promotion of the rights of HMPs. These included one top manager and another who used to be among the top managers of Cooperative des Potiers au Rwanda (COPORWA), a local NGO advocating for the rights of Batwa in Rwanda, as well as one person who used to be among the leaders of CAURWA (Communauté des Autochtones au Rwanda, translated as Community of Autochthonies in Rwanda).
The latter was also one of the founding pioneers of a local NGO advocating for the rights of the Batwa in Rwanda. A former representative of HMP in Rwanda’s Senate was also contacted for an in-depth interview.

All respondents were purposively selected due to their expertise or lived experience on the subject of self-identity and non-discrimination. Key informants from COPORWA, a representative of the HMP in the Rwandan Senate, and authorities from the government were to provide information on convergences or divergences on the phenomenon under investigation.

In total, 226 respondents divided into four categories were approached for feedback. These were 220 heads of households from HMP for FGDs and direct observation; 3 leaders from COPORWA for in-depth interviews; 1 ex-Senator representing HMP in Rwanda's Senate for an in-depth interview; and 2 authorities from governmental institutions. The aim of using different tools for different respondents was not only to get a wide range of perceptions on the subject matter of self-identity and non-discrimination under investigation, but also to enable the triangulation of information. FGDs along with direct observation facilitated the exploration of opinions and observation of the behaviour and body language of the respondents when a sensitive issue, such as discrimination, was mentioned. As an ethical consideration, all respondents were asked for their consent prior to data collection. All interviews were guided by the principle of ‘theoretical saturation’, which consists of administering inquiry until respondents start to repeat themselves.

To ensure the reliability and validity of the data, several measures were taken. Meetings were held every morning to plan for the day and every evening to evaluate the day spent in the field.
For each day of data collection, the data collectors submitted a daily report highlighting the progress made and any special information relating to the subject matter under investigation that was observed in the field. The study used thematic analysis embedded in a deductive approach guided by the human rights-based approach, in which the two variables of self-identity and non-discrimination were the focus of study. The human rights-based approach facilitated generating data around themes related to self-identity and non-discrimination.

In short, findings on the Batwa’s rights to self-identity and to non-discrimination differed across the two variables. On self-identity, findings indicated that the identity of the Batwa has been shifting because of socio-cultural dynamics affecting the contexts in which they find themselves and live. For example, the name “HMP”, which conflates all vulnerable groups in Rwanda, elicited divergent views among respondents. For ordinary Batwa respondents, the name conveys a negative profile, while for Batwa elites the name obscures their problems since it disconnects them from other indigenous peoples across Africa and the world. For respondents from the GoR, the name means upholding unity and reconciliation. The data also indicated that the Batwa identity has been characterised by the negative profile of someone who is the poorest, dirty and indigent because of their lowest social status resulting from a non-dominant context. This reality corroborates other recent studies finding that the identity of the Batwa does not have a fixed boundary.

On the variable of non-discrimination, findings indicated that the negative profiles mentioned above are forms of indirect discrimination resulting from microaggressions and stereotypes. For further information on how to use the dataset, kindly contact the corresponding author at: ndikubwimana.genbattista@gmail.com, tel: (+250)788 751 225

Geonames - All Cities with a population > 1000

15 scholarly articles cite this dataset (View in Google Scholar)
