70 datasets found

Dataset of books called People and education in the Third World
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called People and education in the Third World [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=People+and+education+in+the+Third+World
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
This dataset is about books. It has 1 row and is filtered where the book is People and education in the Third World. It features 7 columns including author, publication date, language, and book publisher.
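The Work With Data citation URLs appear to encode the filter that the description mentions: `fcol0` looks like the filtered column, `fop0` the operator (`%3D` is a URL-encoded `=`) and `fval0` the value. These meanings are inferred from the example URLs on this page, not from any documented API, and `book_filter_url` is a hypothetical helper. The sketch rebuilds such a URL with Python's standard `urllib.parse.urlencode`, which reproduces the encoding seen above (spaces become `+`, `:` becomes `%3A`).

```python
from urllib.parse import urlencode

BASE_URL = "https://www.workwithdata.com/datasets/books"

def book_filter_url(title: str) -> str:
    """Build a single-filter books URL (parameter meanings inferred, not documented)."""
    params = {
        "f": 1,           # copied from the example URLs; meaning not documented here
        "fcol0": "book",  # assumed: column being filtered
        "fop0": "=",      # assumed: comparison operator; encoded as %3D
        "fval0": title,   # assumed: value to match; spaces -> "+", ":" -> %3A
    }
    return BASE_URL + "?" + urlencode(params)

print(book_filter_url("People and education in the Third World"))
```

Because `urlencode` keeps the dictionary's insertion order, the printed URL matches the citation URL for this entry character for character.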
Famous Celebrity Name Misspellings
kaggle.com
Updated Jan 22, 2023
Cite
The Devastator (2023). Famous Celebrity Name Misspellings [Dataset]. https://www.kaggle.com/datasets/thedevastator/famous-celebrity-name-misspellings
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 22, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
The Devastator
Description
Famous Celebrity Name Misspellings

Aggregated data from The Gyllenhaal Experiment

By data.world's Admin [source]

About this dataset

This dataset contains aggregated spellings and misspellings of the names of 15 famous celebrities. Ever wonder whether people can actually spell someone's name correctly? Now you can see for yourself with this data compiled from The Pudding's interactive spelling experiment, The Gyllenhaal Experiment. Some names get misspelled more than others: some are easy to guess, others are surprising. With the data provided here, you can start uncovering trends in name-spelling habits. Visualize the data and analyze how unique or common each celebrity's name is with respect to spelling: who stands out, and who blends in? Check it out and explore a side of celebrity life that hasn't been seen before!



How to use the dataset

This dataset contains misspellings of the names of 15 famous celebrities. It can be used for a variety of research and analysis purposes, including exploring human language, understanding how names are misspelled, or generating data visualizations.

To get the most out of this dataset, familiarize yourself with its columns. The dataset consists of two columns: “data” and “updated”. The “data” column contains the misspellings associated with each celebrity name. The “updated” column is automatically updated with the date on which the data was last changed or modified.

When using this dataset for your own research and analysis, you may find it useful to filter out certain types of responses or patterns so you can focus on particular trends or topics of interest. For example, you might group similar responses together regardless of whether they exactly match one celebrity name or another (e.g., categorizing all spellings that follow a certain phonetic pattern). You can also separate responses into groups to explore aspects such as popularity (e.g., which misspellings occurred most frequently).
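The grouping and frequency ideas above can be sketched in Python. This is a minimal sketch under two assumptions: each row's `data` field holds a single spelling (if the file actually stores serialized aggregates, parse them first), and phonetic grouping uses American Soundex, a standard phonetic code chosen here for illustration rather than anything the dataset prescribes. The sample rows and the `group_by_sound` helper are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

def soundex(name: str) -> str:
    """American Soundex code (first letter + 3 digits) for an ASCII name."""
    digit_for = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            digit_for[ch] = digit
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = digit_for.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":                  # h and w are skipped and do not separate equal digits
            continue
        digit = digit_for.get(ch, "")   # vowels (and y) map to "" and reset prev
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit
    return code.ljust(4, "0")

def group_by_sound(rows):
    """Count exact spellings, then bucket them by Soundex code."""
    counts = Counter(row["data"] for row in rows)
    groups = defaultdict(list)
    for spelling, n in counts.most_common():
        groups[soundex(spelling)].append((spelling, n))
    return counts, dict(groups)

# Hypothetical sample shaped like data-all.csv (one spelling per row).
SAMPLE = """data,updated
Gyllenhaal,2019-05-01
Gylenhal,2019-05-01
Gyllenhall,2019-05-02
Jillenhall,2019-05-02
Gyllenhaal,2019-05-03
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts, groups = group_by_sound(rows)
for code, members in groups.items():
    print(code, members)
```

Soundex keeps the first letter unchanged, so "Jillenhall" ends up in a different bucket from the G-spellings; another phonetic algorithm may group such variants better.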

Research Ideas

Creating an interactive quiz for users to test their spelling ability by challenging them to spell names correctly from the celebrity dataset.

Building a dictionary database of the misspellings, fans’ nicknames and phonetic spellings of each celebrity so that people can find more information about them more easily and accurately.

Measuring the popularity of individual celebrities by tracking how frequently their names are misspelled.

Acknowledgements

If you use this dataset in your research, please credit the original authors (Data Source).

License

See the dataset description for more information.

Columns

File: data-all.csv

| Column name | Description |
|:--------------|:---------------------------------------------------|
| data | Misspellings of celebrity names. (String) |
| updated | Date when the misspelling was last updated. (Date) |

Acknowledgements

If you use this dataset in your research, please credit data.world's Admin.
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+2 more
csv, excel, geojson +1
Updated Mar 10, 2024
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).

Sources and Contributions

Sources: GeoNames aggregates over a hundred different data sources.

Ambassadors: GeoNames Ambassadors help in many countries.

Wiki: A wiki allows users to view the data, quickly fix errors and add missing places.

Donations and Sponsoring: The costs of running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
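The inclusion rule and the country-name enrichment can be sketched as a filter over rows. This is a sketch under assumptions: the set of GeoNames feature codes treated as administrative seats (`PPLC`, `PPLA` to `PPLA4`) is an assumption about how "seats of adm div" is encoded, and the field names and sample rows are hypothetical, not the export's real headers.

```python
# Assumption: these GeoNames feature codes mark capitals (PPLC) and seats of
# first- to fourth-order administrative divisions (PPLA ... PPLA4).
ADMIN_SEAT_CODES = {"PPLC", "PPLA", "PPLA2", "PPLA3", "PPLA4"}

def included(population: int, feature_code: str, threshold: int = 1000) -> bool:
    """Inclusion rule from the description: population > threshold, or an administrative seat."""
    return population > threshold or feature_code in ADMIN_SEAT_CODES

def enrich(rows, country_names):
    """Keep rows that pass the rule and add a country name looked up from the country code."""
    kept = []
    for row in rows:
        if included(row["population"], row["feature_code"]):
            kept.append({**row, "country_name": country_names.get(row["country_code"], "")})
    return kept

# Hypothetical rows; field names are illustrative only.
SAMPLE_ROWS = [
    {"name": "Smalltown", "population": 500, "feature_code": "PPL", "country_code": "FR"},
    {"name": "Bigtown", "population": 5000, "feature_code": "PPL", "country_code": "DE"},
    {"name": "Seat village", "population": 300, "feature_code": "PPLA3", "country_code": "FR"},
]
COUNTRY_NAMES = {"FR": "France", "DE": "Germany"}

for row in enrich(SAMPLE_ROWS, COUNTRY_NAMES):
    print(row["name"], row["country_name"])
```

Note that the rule uses a strict `>`, so a place with exactly 1000 inhabitants is kept only if it is an administrative seat.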
Dataset of books called Between heaven and earth : the religious worlds...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Between heaven and earth : the religious worlds people make and the scholars who study them [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Between+heaven+and+earth+%3A+the+religious+worlds+people+make+and+the+scholars+who+study+them
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Earth
Description
This dataset is about books. It has 2 rows and is filtered where the book is Between heaven and earth : the religious worlds people make and the scholars who study them. It features 7 columns including author, publication date, language, and book publisher.
COVID Impact Survey - Public Data
data.world
csv, zip
Updated Oct 16, 2024
Cite
The Associated Press (2024). COVID Impact Survey - Public Data [Dataset]. https://data.world/associatedpress/covid-impact-survey-public-data
Available download formats: csv, zip
Dataset updated
Oct 16, 2024
Authors
The Associated Press
Description
Overview

The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.

Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).

The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.

The survey is focused on three core areas of research:

Physical Health: Symptoms related to COVID-19, relevant existing conditions and health insurance coverage.

Economic and Financial Health: Employment, food security, and government cash assistance.

Social and Mental Health: Communication with friends and family, anxiety and volunteerism. (Questions based on those used on the U.S. Census Bureau’s Current Population Survey.)

Using this Data: IMPORTANT

This is survey data and must be properly weighted during analysis: DO NOT REPORT THIS DATA AS RAW OR AGGREGATE NUMBERS!

Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
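What "weighting" means for a single question can be shown with a weighted share: each answer's share is the sum of the weights of respondents who gave it, divided by the sum of all weights. This is a minimal sketch; the answer labels and weight values are hypothetical (check the codebook for the real response codes and weight variable), and it produces point estimates only, not the margins of error that R or SPSS survey routines would give.

```python
from collections import defaultdict

def weighted_shares(responses, weights):
    """Weighted share of each answer: sum of weights giving that answer / sum of all weights."""
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    totals = defaultdict(float)
    for answer, weight in zip(responses, weights):
        totals[answer] += weight
    grand_total = sum(totals.values())
    return {answer: total / grand_total for answer, total in totals.items()}

# Hypothetical answers to soc5c and hypothetical survey weights for five respondents.
soc5c = ["none", "some", "none", "most", "none"]
weights = [1.2, 0.8, 0.5, 1.0, 1.5]
shares = weighted_shares(soc5c, weights)
print(shares)  # "none" is 3 of 5 raw answers (60%) but about 64% once weighted
```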

Queries

If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".

Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right-hand corner.

Local or State: To find figures for that response in a specific state, go to this query, type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right-hand corner.

The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."

Margin of Error

The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine whether the comparisons are statistically significant. If the difference is:

At least twice the margin of error, you can report there is a clear difference.

At least as large as the margin of error but less than twice it, you can report there is a slight or apparent difference.

Less than the margin of error, you can report that the respondents are divided or there is no difference.

A Note on Timing

Survey results will generally be posted under embargo on Tuesday evenings. The data is available for release at 1 p.m. ET on Thursdays.
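The margin-of-error reporting rules listed above can be written as a small helper, which makes the thresholds explicit (a difference exactly equal to the margin counts as "slight or apparent"). This is a sketch: `classify_difference` is a hypothetical name, and the 5-point margin in the example is made up, since the real margins are in the methods statement.

```python
def classify_difference(estimate_a: float, estimate_b: float, margin_of_error: float) -> str:
    """Apply the reporting rules to two estimates (all in percentage points)."""
    difference = abs(estimate_a - estimate_b)
    if difference >= 2 * margin_of_error:
        return "clear difference"
    if difference >= margin_of_error:
        return "slight or apparent difference"
    return "respondents divided or no difference"

# Louisiana vs. California from the example sentence, with a hypothetical 5-point margin of error.
print(classify_difference(66, 52, 5))  # -> clear difference
```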

About the Data

The survey data will be provided under embargo in both comma-delimited and statistical formats.

Each set of survey data will be numbered and prefixed with the date the embargo lifts, in the format 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit, non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
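If you download several releases, the naming scheme can be parsed to sort them. This sketch assumes every file name starts exactly like the example (`01_April_30_covid_impact_survey`, optionally followed by an extension); `parse_release_name` is a hypothetical helper. The year is not part of the name, so only the wave number, month and day are recovered.

```python
import calendar
import re

# Wave number, month name and day, as in "01_April_30_covid_impact_survey".
RELEASE_NAME = re.compile(r"^(\d+)_([A-Za-z]+)_(\d{1,2})_covid_impact_survey")

def parse_release_name(name: str) -> tuple:
    """Return (wave number, month number, day); the year is not part of the name."""
    match = RELEASE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    wave, month_name, day = match.groups()
    month = list(calendar.month_name).index(month_name.capitalize())  # ValueError if unknown
    return int(wave), month, int(day)

print(parse_release_name("01_April_30_covid_impact_survey"))  # -> (1, 4, 30)
```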

Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.

Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
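Raking (iterative proportional fitting) repeatedly rescales the weights so that the weighted sample matches the population share of each category of one raking variable, cycling through the variables until all margins agree. The sketch below is a textbook illustration with two hypothetical variables and made-up target shares; it is not NORC's actual procedure, which uses the raking variables listed above and may include further steps not described here.

```python
def rake(rows, targets, max_iter=100, tol=1e-9):
    """Iterative proportional fitting (raking) starting from unit weights.

    rows:    list of dicts mapping each raking variable to the respondent's category
    targets: dict mapping each variable to {category: population share}, shares summing to 1
    Every target category must occur in the sample, otherwise its margin cannot be matched.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            current = {}  # current weighted total per category of this variable
            for w, row in zip(weights, rows):
                current[row[var]] = current.get(row[var], 0.0) + w
            for i, row in enumerate(rows):
                factor = shares[row[var]] * total / current[row[var]]
                largest_change = max(largest_change, abs(factor - 1.0))
                weights[i] *= factor
        if largest_change < tol:  # every margin already matched
            break
    return weights

def weighted_share(rows, weights, var, category):
    return sum(w for w, r in zip(weights, rows) if r[var] == category) / sum(weights)

# Hypothetical respondents and population targets.
ROWS = [
    {"gender": "female", "age": "18-44"},
    {"gender": "female", "age": "45+"},
    {"gender": "male", "age": "18-44"},
    {"gender": "male", "age": "45+"},
    {"gender": "male", "age": "18-44"},
]
TARGETS = {"gender": {"female": 0.52, "male": 0.48}, "age": {"18-44": 0.45, "45+": 0.55}}
weights = rake(ROWS, TARGETS)
print([round(w, 3) for w in weights])
```

The weights keep summing to the sample size; multiply them by a constant if you need them to sum to a population total instead.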

Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.

Attribution

Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.

AP Data Distributions

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Dataset of books called Denying democracy : how the IMF and World Bank take...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Denying democracy : how the IMF and World Bank take power from people [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Denying+democracy+%3A+how+the+IMF+and+World+Bank+take+power+from+people
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is Denying democracy : how the IMF and World Bank take power from people. It features 7 columns including author, publication date, language, and book publisher.
Dataset of Burkhardt 2022 Encyclopaedia of Eponymic Plant Names
zenodo.org
Updated Apr 29, 2025
Cite
Heather Lynn Lindon; Sabine von Mering; Siobhan Leachman; Carmen Ulloa Ulloa (2025). Dataset of Burkhardt 2022 Encyclopaedia of Eponymic Plant Names [Dataset]. http://doi.org/10.5281/zenodo.14551490
Unique identifier
https://doi.org/10.5281/zenodo.14551490
Dataset updated
Apr 29, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Heather Lynn Lindon; Sabine von Mering; Siobhan Leachman; Carmen Ulloa Ulloa
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In 2022, Lotte Burkhardt published a free PDF entitled Encyclopedia of Eponymic Plant Names. It consists of two volumes: one lists all plant, algal, lichen, fossil plant, and fungal genera together with the person each was named after; the other takes the list of people honored and lists the genera named after them. It can be found online here.

This dataset was created by Carmen Ulloa Ulloa by scraping the A-Z list of people honored from the PDF and converting it into a Google Sheet. These data were normalized so that each row represents a person, with the eponymic genera and their associated families split into multiple columns to make analysis easier. The data were then cleaned, because the conversion from PDF was not 100% accurate: some names were split across multiple lines, some characters were misread, and so on. The genders of the authors were annotated by the Women Plant Genera working group as part of our follow-up work to a previous paper.
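For counting genera, the one-row-per-person layout with genera spread over several columns is usually easier to analyze in long form (one row per genus). This is a sketch assuming hypothetical column names `person`, `gender`, `genus_1`/`family_1`, `genus_2`/`family_2`, and so on; the actual headers in the files may differ, and the sample names are invented placeholders.

```python
from collections import Counter

def wide_to_long(rows):
    """Expand one-row-per-person records with genus_1/family_1, genus_2/family_2, ...
    into (person, gender, genus, family) tuples, skipping empty slots."""
    long_rows = []
    for row in rows:
        i = 1
        while f"genus_{i}" in row:
            genus = row[f"genus_{i}"]
            if genus:
                long_rows.append((row["person"], row["gender"], genus, row.get(f"family_{i}", "")))
            i += 1
    return long_rows

# Hypothetical, invented records in a wide layout.
ROWS = [
    {"person": "A. Example", "gender": "female",
     "genus_1": "Examplea", "family_1": "Asteraceae",
     "genus_2": "Examplella", "family_2": "Orchidaceae"},
    {"person": "B. Sample", "gender": "male",
     "genus_1": "Samplea", "family_1": "Rubiaceae",
     "genus_2": "", "family_2": ""},
]
long_rows = wide_to_long(ROWS)
print(Counter(gender for _, gender, _, _ in long_rows))  # genera per annotated gender
```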

We have split the resulting table into three files. The first contains the entire list of people honoured and the genera named for them. The other two are subsets of the first: one contains only the flowering plant genera, and the other excludes plant genera.

Most of the women in the plants-only tab have been marked up as part of this project. More information could be added about the women for whom non-plant genera were named. We strongly encourage anyone interested in doing their own analysis based on these data to do so, and to get in touch with us with any questions. We anticipate that work on additional groups will deepen our understanding of the impact of the contributions women have made to botany. Our hope is that by making this dataset publicly available, others will explore the world of genera and eponymy, uncovering interesting stories of the people for whom genera were named.

The team would be grateful for any updates or corrections to this data, and we plan to publish updated versions of this dataset accordingly.
ERA5 hourly data on pressure levels from 1940 to present
cds.climate.copernicus.eu
cds-test-cci2.copernicus-climate.eu
grib
Updated Jul 14, 2025
Cite
ECMWF (2025). ERA5 hourly data on pressure levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.bd0915c6
Available download formats: grib
Unique identifier
https://doi.org/10.24381/cds.bd0915c6
Dataset updated
Jul 14, 2025
Dataset provided by
European Centre for Medium-Range Weather Forecasts http://ecmwf.int/
Authors
ECMWF
License
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
Time period covered
Jan 1, 1940 - Jul 8, 2025
Description
ERA5 is the fifth-generation ECMWF reanalysis of the global climate and weather for the past eight decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to allow for the ingestion of improved versions of the original observations, all of which benefit the quality of the reanalysis product.

ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread.

ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; users are notified if this occurs.

The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper-air fields) and single levels (atmospheric, ocean-wave and land-surface quantities). The present entry is "ERA5 hourly data on pressure levels from 1940 to present".
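The idea of combining a previous forecast with a new observation "in an optimal way" can be illustrated for a single scalar quantity by inverse-variance weighting, the textbook building block of data assimilation. This is a toy sketch assuming unbiased, independent Gaussian errors with known variances; ERA5 itself uses a far more elaborate four-dimensional variational (4D-Var) system, and `analysis_update` and the numbers are hypothetical.

```python
def analysis_update(background, background_var, observation, observation_var):
    """Combine a forecast (background) and an observation by inverse-variance weighting.

    Returns (analysis value, analysis error variance).
    """
    gain = background_var / (background_var + observation_var)  # weight given to the observation
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var                  # never larger than either input variance
    return analysis, analysis_var

# Hypothetical numbers: forecast 10.0 (variance 4.0), observation 12.0 (variance 4.0).
print(analysis_update(10.0, 4.0, 12.0, 4.0))  # -> (11.0, 2.0)
```

With equal variances the analysis lands halfway between forecast and observation, and its error variance is halved; a more accurate observation (smaller variance) pulls the analysis closer to it.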
Data from: DOO-RE: A dataset of ambient sensors in a meeting room for...
figshare.com
zip
Updated Feb 23, 2024
Cite
Hyunju Kim (2024). DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition [Dataset]. http://doi.org/10.6084/m9.figshare.24558619.v3
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24558619.v3
Dataset updated
Feb 23, 2024
Dataset provided by
Figshare http://figshare.com/
Authors
Hyunju Kim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We release the DOO-RE dataset, which consists of data streams from 11 types of ambient sensors collected 24/7 in a real-world meeting room. Four types of ambient sensors, called environment-driven sensors, measure continuous state changes in the environment (e.g., sound), and four types, called user-driven sensors, capture user state changes (e.g., motion). The remaining three types, called actuator-driven sensors, check whether the attached actuators are active (e.g., projector on/off). The values of each sensor are automatically collected by the IoT agent responsible for that sensor in our IoT system. Each segment of the collected sensor data stream that represents a user activity is extracted as an activity episode in the DOO-RE dataset. Each episode's activity labels are annotated and validated through cross-checking and agreement among multiple annotators. A total of 9 activity types appear in the space: 3 performed by single users and 6 performed by groups (i.e., 2 or more people). As a result, DOO-RE contains 696 labeled episodes of single and group activities from the meeting room. DOO-RE is a novel dataset created in a public space that captures the properties of a real-world environment and has the potential to be useful for developing powerful activity recognition approaches.
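Extracting an activity episode amounts to cutting a time window out of the sensor streams. The sketch below assumes a hypothetical representation of readings as `(timestamp in seconds, sensor type, value)` tuples and a hypothetical mapping from sensor type to the three categories described above; DOO-RE's actual file layout and sensor names may differ.

```python
from collections import defaultdict

# Hypothetical sensor-type names mapped to the three categories described above.
SENSOR_CATEGORY = {
    "sound": "environment-driven",
    "motion": "user-driven",
    "projector": "actuator-driven",
}

def extract_episode(readings, start, end):
    """Return the readings with start <= timestamp < end, grouped by sensor category."""
    episode = defaultdict(list)
    for timestamp, sensor_type, value in readings:
        if start <= timestamp < end:
            category = SENSOR_CATEGORY.get(sensor_type, "unknown")
            episode[category].append((timestamp, sensor_type, value))
    return dict(episode)

# Hypothetical stream of (seconds, sensor type, value) tuples.
READINGS = [(0, "sound", 41.0), (5, "motion", 1), (12, "projector", "on"), (30, "sound", 38.5)]
print(extract_episode(READINGS, 5, 30))
```

The half-open window `[start, end)` avoids assigning a reading that falls exactly on a boundary to two consecutive episodes.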
Worldwide Soundscapes project meta-data
zenodo.org
Updated Dec 9, 2022
Cite
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger (2022). Worldwide Soundscapes project meta-data [Dataset]. http://doi.org/10.5281/zenodo.7415473
Unique identifier
https://doi.org/10.5281/zenodo.7415473
Dataset updated
Dec 9, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.

The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.

The audio recording criteria justifying inclusion into the meta-database are:

Stationary (no transects, towed sensors or microphones mounted on cars)

Passive (unattended, no human disturbance by the recordist)

Ambient (no spatial or temporal focus on a particular species or direction)

Spatially and/or temporally replicated (multiple sites sampled at least at one common daytime or multiple days sampled at least in one common site)

The columns of the provided data tables are described below. The tables are linked through primary keys; joining them yields a relational database.

datasets

dataset_id: incremental integer, primary key

name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.

subset: incremental integer that can be used to distinguish datasets with identical names

collaborators: full names of people deemed responsible for the dataset, separated by commas

contributors: full names of people who are not the main collaborators but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses, separated by commas.

date_added: when the dataset was added (DD/MM/YYYY)

URL_open_recordings: if recordings (even only some) from this dataset are openly available, indicate the internet link where they can be found.

URL_project: internet link for further information about the corresponding project

DOI_publication: DOI of corresponding publications, separated by comma

core_realm_IUCN: The core realm of the dataset. Datasets may have multiple realms, but the main one should be listed. Datasets may contain sampling sites from different realms in the "sites" sheet. IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

medium: the physical medium the microphone is situated in

protected_area: Whether the sampling sites were situated in protected areas or not, or only some.

GADM0: For datasets on land or in territorial waters, Global Administrative Database level0
https://gadm.org/

GADM1: For datasets on land or in territorial waters, Global Administrative Database level1
https://gadm.org/

GADM2: For datasets on land or in territorial waters, Global Administrative Database level2
https://gadm.org/

IHO: For marine locations, the sea area that encompasses all the sampling locations according to the International Hydrographic Organisation. Map here: https://www.arcgis.com/home/item.html?id=44e04407fbaf4d93afcb63018fbca9e2

locality: optional free text about the locality

latitude_numeric_region: study region approximate centroid latitude in WGS84 decimal degrees

longitude_numeric_region: study region approximate centroid longitude in WGS84 decimal degrees

sites_number: number of sites sampled

year_start: starting year of the sampling

year_end: ending year of the sampling

deployment_schedule: description of the sampling schedule, provisional

temporal_recording_selection: list environmental exclusion criteria that were used to determine which recording days or times to discard

high_pass_filter_Hz: frequency of the high-pass filter of the recorder, in Hz

variable_sampling_frequency: Does the sampling frequency vary? If it does, write "NA" in the sampling_frequency_kHz column and indicate it in the sampling_frequency_kHz column inside the deployments sheet

sampling_frequency_kHz: frequency the microphone was sampled at (sounds up to half that frequency, the Nyquist frequency, can be recorded)

variable_recorder:

recorder: recorder model used

microphone: microphone used

freshwater_recordist_position: position of the recordist relative to the microphone during sampling (only for freshwater)

collaborator_comments: free-text field for comments by the collaborators

validated: This cell is checked if the contents of all sheets are complete and have been found to be coherent and consistent with our requirements.

validator_name: name of person doing the validation

validation_comments: validators: please insert the date when someone was contacted

cross-check: this cell is checked if the collaborators confirm the spatial and temporal data after checking the corresponding site maps, deployment and operation time graphs found at https://drive.google.com/drive/folders/1qfwXH_7dpFCqyls-c6b8RZ_fbcn9kXbp?usp=share_link

datasets-sites

dataset_ID: primary key of datasets table

dataset_name: lookup field

site_ID: primary key of sites table

site_name: lookup field

sites

site_ID: unique site IDs, larger than 1000 for compatibility with ecoSound-web

site_name: name or code of sampling site as used in respective projects

latitude_numeric: exact numeric degrees coordinates of latitude

longitude_numeric: exact numeric degrees coordinates of longitude

topography_m: for sites on land, elevation; for marine sites, depth (negative); in meters

freshwater_depth_m

realm: Ecosystem type according to IUCN GET https://global-ecosystems.org/

biome: Ecosystem type according to IUCN GET https://global-ecosystems.org/

functional_group: Ecosystem type according to IUCN GET https://global-ecosystems.org/

comments
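The datasets, datasets-sites and sites tables described above can be joined through their keys, for example to count sites per dataset. This sketch loads a few invented rows into an in-memory SQLite database with Python's standard `sqlite3` module; only a subset of the documented columns is used, and the dataset names and coordinates are hypothetical. Note that the key is spelled `dataset_id` in the datasets table and `dataset_ID` in datasets-sites; SQLite column names are case-insensitive, so the join still works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (dataset_id INTEGER PRIMARY KEY, name TEXT, subset INTEGER);
CREATE TABLE "datasets-sites" (dataset_ID INTEGER, site_ID INTEGER);
CREATE TABLE sites (site_ID INTEGER PRIMARY KEY, site_name TEXT,
                    latitude_numeric REAL, longitude_numeric REAL);
""")
# Hypothetical rows; site IDs are above 1000 as the sites table requires.
conn.executemany("INSERT INTO datasets VALUES (?, ?, ?)",
                 [(1, "Example forest", 1), (2, "Example reef", 1)])
conn.executemany('INSERT INTO "datasets-sites" VALUES (?, ?)',
                 [(1, 1001), (1, 1002), (2, 1003)])
conn.executemany("INSERT INTO sites VALUES (?, ?, ?, ?)",
                 [(1001, "F-01", 1.5, 103.2), (1002, "F-02", 1.6, 103.3),
                  (1003, "R-01", -8.1, 115.0)])

def sites_per_dataset(conn):
    """Join datasets -> datasets-sites -> sites and count sites per dataset."""
    query = """
    SELECT d.name, COUNT(s.site_ID) AS n_sites
    FROM datasets AS d
    JOIN "datasets-sites" AS ds ON ds.dataset_ID = d.dataset_id
    JOIN sites AS s ON s.site_ID = ds.site_ID
    GROUP BY d.dataset_id
    ORDER BY d.dataset_id
    """
    return conn.execute(query).fetchall()

print(sites_per_dataset(conn))
```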

deployments

dataset_ID: primary key of datasets table

dataset_name: lookup field

deployment: use identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

start_date_min: earliest date of deployment start, double-click cell to get date-picker

start_date_max: latest date of deployment start, if applicable (only used when recorders were deployed over several days), double-click cell to get date-picker

start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

permanent: is the deployment permanent (in which case it would be ongoing and the end date or duration would be unknown)?

variable_duration_days: is the duration of the deployment variable? in days

duration_days: deployment duration per recorder (use the minimum if variable)

end_date_min: earliest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_date_max: latest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

recording_time: does the recording last from the deployment start time to the end time (continuous) or at scheduled daily intervals (scheduled)? Note: we consider recordings with duty cycles to be continuous.

operation_start_time_mixed: scheduled recording start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

operation_duration_minutes: duration of operation in minutes, if constant

operation_end_time_mixed: scheduled recording end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes.

sampling_frequency_kHz: indicate the sampling frequency only if it varies within a dataset, so that different frequencies need to be coded for different deployments
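The mixed time fields and the duty-cycle notation described above lend themselves to simple parsing. The following Python sketch is a set of hypothetical helpers (parse_mixed_time, scheduled_duration_minutes and parse_duty_cycle are illustrative names, not taken from the project's own scripts). It assumes that an end time at or before the start time falls on the next day; resolving solar times such as "sunrise-60" to clock times would additionally need the site coordinates and date, which is out of scope here.

```python
import re
from typing import Tuple, Union

# Accepted forms, per the field descriptions: "HH:MM", or a solar event with an
# optional signed offset in minutes, e.g. "sunrise", "sunset+30", "sunrise-60".
_CLOCK = re.compile(r"^([01]\d|2[0-3]):([0-5]\d)$")
_SOLAR = re.compile(r"^(sunrise|sunset|noon|midnight)([+-]\d+)?$")
MINUTES_PER_DAY = 24 * 60


def parse_mixed_time(value: str) -> Tuple[str, Union[int, str], int]:
    """Return ("clock", minutes_after_midnight, 0) or ("solar", event, offset_minutes)."""
    text = value.strip().lower()
    clock = _CLOCK.match(text)
    if clock:
        return ("clock", int(clock.group(1)) * 60 + int(clock.group(2)), 0)
    solar = _SOLAR.match(text)
    if solar:
        offset = int(solar.group(2)) if solar.group(2) else 0
        return ("solar", solar.group(1), offset)
    raise ValueError(f"unrecognised time value: {value!r}")


def scheduled_duration_minutes(start: str, end: str) -> int:
    """Minutes between two HH:MM clock times. An end at or before the start is
    assumed to fall on the next day (schedules crossing midnight)."""
    kind_start, start_min, _ = parse_mixed_time(start)
    kind_end, end_min, _ = parse_mixed_time(end)
    if kind_start != "clock" or kind_end != "clock":
        raise ValueError("solar times must be resolved to clock times first")
    # Modulo wraps past midnight; identical times are read as a full day.
    return (end_min - start_min) % MINUTES_PER_DAY or MINUTES_PER_DAY


def parse_duty_cycle(value: str) -> Tuple[float, float, float]:
    """Split "recording/period" (both in minutes) into (recording, standby, fraction)."""
    recording_text, period_text = value.split("/")
    recording, period = float(recording_text), float(period_text)
    if not 0 < recording <= period:
        raise ValueError(f"invalid duty cycle: {value!r}")
    return recording, period - recording, recording / period
```

For the duty-cycle example in the field description, parse_duty_cycle("1/6") returns a recording time of 1 minute, a standby time of 5 minutes and a recording fraction of 1/6.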

recorder

subset_sites: If the deployment was not done in all the sites of the
‘Austin's data portal activity metrics’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Austin's data portal activity metrics’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-austin-s-data-portal-activity-metrics-1ce3/latest
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Austin's data portal activity metrics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/data-portal-activity-metricse on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Austin's open data portal provides lots of public data about the City of Austin. It also provides portal administrators with behind-the-scenes information about how the portal is used... but that data is mysterious, hard to handle in a spreadsheet, and not located all in one place.

Until now! Authorized city staff used admin credentials to grab this usage data and share it with the public. The City of Austin wants to use this data to inform the development of its open data initiative and manage the open data portal more effectively.

This project contains related datasets for anyone to explore, including site-level metrics, dataset-level metrics, and department information for context. A detailed description of how the files were prepared (along with code) can be found on GitHub here.

Example questions to answer about the data portal

What parts of the open data portal do people seem to value most?

What can we tell about who our users are?

How are our data publishers doing?

How much data is published programmatically vs manually?

How much data is super fresh? Super stale?

Whatever you think we should know...

About the files

all_views_20161003.csv

There is a resource available to portal administrators called "Dataset of datasets". This is the export of that resource, and it was captured on Oct 3, 2016. It contains a summary of the assets available on the data portal. While this file contains over 1400 resources (such as views, charts, and binary files), only 363 are actual tabular datasets.

table_metrics_ytd.csv

This file contains information about the 363 tabular datasets on the portal. Activity metrics for an individual dataset can be accessed by calling Socrata's views/metrics API and passing along the dataset's unique ID, a time frame, and admin credentials. The process of obtaining the 363 identifiers, calling the API, and staging the information can be reviewed in the Python notebook here.

site_metrics.csv

This file is the export of site-level stats that Socrata generates using a given time frame and grouping preference. It contains records about site usage for each month from Nov 2011 through Sept 2016. By the way, it contains 285 columns... and we don't know what many of them mean. But we are determined to find out!! For a preliminary exploration of the columns and the portal-related business processes they might relate to, check out the notes in the Python notebook here.

city_departments_in_current_budget.csv

This file contains a list of all City of Austin departments according to how they're identified in the most recently approved budget documents. Could be helpful for getting to know more about who the publishers are.

crosswalk_to_budget_dept.csv

The City is in the process of standardizing how departments identify themselves on the data portal. In the meantime, here's a crosswalk from the department values observed in all_views_20161003.csv to the department names that appear in the City's budget.
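Applying the crosswalk amounts to a dictionary lookup from portal department values to budget names. A minimal sketch with Python's csv module; the column names "department" and "budget_department" and the "UNMATCHED" marker are hypothetical assumptions, so check the actual headers of crosswalk_to_budget_dept.csv and all_views_20161003.csv before use.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical column names: verify against the real CSV headers.
PORTAL_DEPT_COLUMN = "department"
BUDGET_DEPT_COLUMN = "budget_department"


def read_rows(path: str) -> List[dict]:
    """Load a CSV file into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def build_crosswalk(rows: Iterable[dict]) -> Dict[str, str]:
    """Map each department value seen on the portal to its budget name."""
    return {row[PORTAL_DEPT_COLUMN].strip(): row[BUDGET_DEPT_COLUMN].strip() for row in rows}


def apply_crosswalk(views: Iterable[dict], crosswalk: Dict[str, str]) -> List[dict]:
    """Add a budget department to every view; unmatched values are flagged, not dropped."""
    return [
        {
            **view,
            BUDGET_DEPT_COLUMN: crosswalk.get(
                (view.get(PORTAL_DEPT_COLUMN) or "").strip(), "UNMATCHED"
            ),
        }
        for view in views
    ]
```

Usage would look like apply_crosswalk(read_rows("all_views_20161003.csv"), build_crosswalk(read_rows("crosswalk_to_budget_dept.csv"))). Flagging unmatched values keeps departments that the crosswalk does not yet cover visible.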

This dataset was created by Hailey Pate and contains around 100 samples, with features such as Di Sync Success, Di Sync Failed, Browser Firefox 19, Browser Firefox 33, and more.

How to use this dataset

Analyze Sf Query Error User in relation to Js Page View Admin

Study the influence of Browser Firefox 37 on Datasets Created

Acknowledgements

If you use this dataset in your research, please credit Hailey Pate

--- Original source retains full ownership of the source dataset ---
Global Employer Dataset (Wikidata)
opendatabay.com
Updated Jul 5, 2025
Datasimple (2025). Global Employer Dataset (Wikidata) [Dataset]. https://www.opendatabay.com/data/ai-ml/e31ecab8-d78b-4108-89df-7ea2d5d3e09e
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
Area covered
E-commerce & Online Transactions
Description
This dataset provides a curated and labeled subset of employer entries derived from Wikidata, with the goal of improving the quality and usability of employer data. While Wikidata is an invaluable open resource, direct use often necessitates cleaning. This dataset addresses that need by offering metadata, statistics, and labels to help users identify and utilise valid employer information. An employer is generally defined here as a company or entity that provides employment paying wages or a salary. The dataset specifically screens out entries that do not represent true employers, such as individuals or plurals. It is particularly useful for tasks involving data cleaning, entity recognition, and understanding employment nomenclature.

Columns

item_id: The unique Wikidata item identifier (QCode without the 'Q' prefix).

employer_count: The number of Wikidata entries associated with this specific employer reference.

employer: The text label of the employer's name, sourced from Kensho's English labels.

description: The accompanying description of the Wikidata employer entry, also from Kensho.

in_google_news: A binary indicator (0 for no, 1 for yes) showing if the occupation exists within the GoogleNews embedding.

language_detected: A three-digit language code, identified using FastText language detection.

source: Indicates the origin of the information, such as Wikidata or Wikipedia.

label: A binary label (0 for invalid employer, 1 for valid employer) indicating the data's quality.

labeled_by: Specifies the method used for labeling, including human, classifier_gnew, classifier_bert, or cleanlab.

label_error_reason: Provides the specific reason if a label is deemed an error, such as 'domain' or 'plural'.
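A minimal sketch of separating valid employers from rejected entries using the label and label_error_reason columns above. It assumes the CSV stores label as the text "1" or "0" (as csv.DictReader would read it); load_employers and split_by_label are hypothetical helper names.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def load_employers(path: str = "employers.wikidata.all.labeled.csv") -> List[dict]:
    """Read the labelled CSV into a list of row dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def split_by_label(rows: Iterable[dict]) -> Tuple[List[dict], Counter]:
    """Return rows labelled valid (label == "1") and a tally of
    label_error_reason values among the remaining rows."""
    valid: List[dict] = []
    reasons: Counter = Counter()
    for row in rows:
        if str(row.get("label", "")).strip() == "1":
            valid.append(row)
        else:
            reasons[row.get("label_error_reason") or "unspecified"] += 1
    return valid, reasons
```

A typical call is valid, reasons = split_by_label(load_employers()); reasons.most_common() then shows which rejection reasons (for example 'plural') dominate.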

Distribution

This dataset is provided as a single CSV file, named employers.wikidata.all.labeled.csv. Its current version is 1.0, with a file size of approximately 5.98 MB. The dataset contains a substantial number of entries, with item_id having 60656 values, employer having 60456 values, and description having 60640 values.

Usage

This dataset is ideal for various applications, including:
* Detecting new trends in employers, occupations, and employment terminology.
* Automatic error correction of employer entries.
* Converting plural forms of entities to singular forms.
* Training Named Entity Recognition (NER) models to identify employer names.
* Building Question/Answer models that can understand and respond to queries about employers.
* Improving the accuracy of FastText language detection models.
* Assessing FastText accuracy with limited data.

Coverage

The dataset's coverage is global, drawing data from a Wikidata dump dated 2 February 2020. It includes employer entries from various linguistic contexts, as indicated by the language_detected column, showcasing multilingual employer names and descriptions. The content primarily focuses on entities and organisations that meet the definition of an employer, rather than specific demographic groups.

License

CC BY-SA

Who Can Use It

This dataset is suitable for:
* Data scientists and machine learning engineers working on natural language processing tasks.
* Researchers interested in data quality, entity resolution, and knowledge graph analysis.
* Developers building applications that require accurate employer information.
* Anyone needing to clean and validate employer data for various analytical or operational purposes.

Dataset Name Suggestions

Wikidata Labeled Employers

ML-Ready Wikidata Employer Data

Cleaned Wikidata Employer References

Global Employer Dataset (Wikidata)

Validated Employer Entities

Attributes

Original Data Source: ML-You-Can-Use Wikidata Employers labeled
Dataset for the Article "A Predictive Method to Improve the Effectiveness of...
data.niaid.nih.gov
zenodo.org
Updated May 24, 2021
Riccardo Martoglia (2021). Dataset for the Article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4782983
Dataset updated
May 24, 2021
Dataset provided by
Marco Furini
Riccardo Martoglia
Manuela Montangero
Federica Mandreoli
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset for the article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario".

Abstract:

Museums are embracing social technologies in an attempt to broaden their audience and engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts compete every day for visibility in terms of likes and shares, and very little research has focused on museum communication to identify best practices. In this paper, we focus on Twitter and propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet is likely to be appreciated by Twitter users; (b) present simple suggestions that help enhance the message and increase the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world-famous museums, we show that our proposed method identifies the tweet features most likely to influence a tweet's success.

Code to run a selection of experiments is available at https://github.com/rmartoglia/predict-twitter-ch

Dataset structure

This entry contains the data used in the experiments of the above research paper. Only the features extracted for the museum tweet threads (not the full message text) are provided, since these are all the analyses need.

We selected 23 well-known art museums from around the world and grouped them into five groups: G1 (museums with at least three million followers); G2 (museums with more than one million followers); G3 (museums with more than 400,000 followers); G4 (museums with more than 200,000 followers); G5 (Italian museums). From these museums, we analyzed about 40,000 tweets, ranging from about 5,000 to about 11,000 per museum group, depending on the number of museums in each group.

Content features: these are the features that can be drawn from the content of the tweet itself. We further divide them into the following two categories:

– Countable: these features take values in different ranges. We consider the number of hashtags (i.e., words preceded by #) in the tweet, the number of URLs (i.e., links to external resources), the number of images (e.g., photos and graphical emoticons), the number of mentions (i.e., Twitter accounts preceded by @), and the length of the tweet.

– On-Off: these features have binary values in {0, 1}. We observe whether the tweet contains exclamation marks, question marks, person names, place names, organization names, or other names. We also consider the tweet's topic density: assuming that the topics involved correspond to the hashtags mentioned in the text, we define a tweet as topic-dense if the number of hashtags it contains is greater than a given threshold, set to 5. Finally, we observe whether the tweet carries a sentiment (positive or negative) or not (neutral).
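A rough Python sketch of some of these content features. The regular expressions are simplifying assumptions, and image counts, named-entity flags and sentiment are omitted because they need extra models; the authors' own feature extraction (see the linked repository) may differ. Only the topic-density threshold of 5 comes from the description, and content_features is a hypothetical name.

```python
import re

URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
TOPIC_DENSITY_THRESHOLD = 5  # from the description: topic-dense if hashtags > 5


def content_features(text: str) -> dict:
    """Approximate the countable and simple on-off content features of a tweet."""
    urls = URL.findall(text)
    # Remove URLs first so "#" fragments and "?" query strings are not miscounted.
    body = URL.sub(" ", text)
    hashtags = HASHTAG.findall(body)
    return {
        "n_hashtags": len(hashtags),
        "n_urls": len(urls),
        "n_mentions": len(MENTION.findall(body)),
        "length": len(text),
        "has_exclamation": int("!" in body),
        "has_question": int("?" in body),
        "topic_dense": int(len(hashtags) > TOPIC_DENSITY_THRESHOLD),
    }
```

Stripping URLs before counting is a design choice of this sketch: otherwise a link such as ".../x?y=1#top" would add a spurious question mark and hashtag.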

Context features: these features are not drawn from the content of the tweet itself and may give a broader picture of the context in which the tweet was sent. Namely, we consider the part of the day in which the tweet was sent (morning, afternoon, evening, and night, respectively from 5:00am to 11:59am, from 12:00pm to 5:59pm, from 6:00pm to 10:59pm, and from 11:00pm to 4:59am), and a Boolean feature indicating whether the tweet is a retweet.
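The part-of-day feature is a simple bucketing of the send time. A minimal Python sketch using the boundaries listed above, assuming timestamps are already in the relevant local time (the description does not say which time zone was used); part_of_day is a hypothetical name.

```python
from datetime import time


def part_of_day(t: time) -> str:
    """Map a send time to the four buckets in the description."""
    if time(5, 0) <= t < time(12, 0):
        return "morning"    # 5:00am to 11:59am
    if time(12, 0) <= t < time(18, 0):
        return "afternoon"  # 12:00pm to 5:59pm
    if time(18, 0) <= t < time(23, 0):
        return "evening"    # 6:00pm to 10:59pm
    return "night"          # 11:00pm to 4:59am, wrapping midnight
```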

User features: these features are specific to the user who sent the tweet and are the same for all of that user's tweets. Namely, we consider the name of the museum and the user's number of followers.
GBIF Backbone Taxonomy
gbif.org
smng.net
Updated Nov 17, 2023
GBIF Secretariat (2023). GBIF Backbone Taxonomy [Dataset]. http://doi.org/10.15468/39omei
Unique identifier
https://doi.org/10.15468/39omei
Dataset updated
Nov 17, 2023
Dataset provided by
Global Biodiversity Information Facility, https://www.gbif.org/
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It is the taxonomic backbone that allows GBIF to integrate name-based information from different resources, whether these are occurrence datasets, species pages, names from nomenclators, or external sources such as EOL, GenBank or IUCN. The backbone enables taxonomic search, browse and reporting operations across all those resources in a consistent way, and provides a means to crosswalk names from one source to another.

It is updated regularly through an automated process in which the Catalogue of Life acts as the starting point, also providing the complete higher classification above families. Additional scientific names found only in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.

International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.

UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.
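Both consensus rules above amount to walking up the ranks until one name is shared widely enough. A minimal Python sketch, not GBIF's implementation, assuming each classification is a dict keyed by rank, that the consensus share is computed among the records carrying a name at that rank, and a simplified list of major Linnaean ranks; consensus_name is a hypothetical name. threshold=0.8 mirrors the BIN rule and threshold=1.0 the 100% UNITE fallback.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Simplified list of major Linnaean ranks, from lowest to highest.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def consensus_name(
    classifications: List[Dict[str, str]], threshold: float = 0.8
) -> Optional[Tuple[str, str]]:
    """Return (rank, name) at the lowest rank where one name reaches `threshold`."""
    for rank in RANKS:
        names = [c[rank] for c in classifications if c.get(rank)]
        if not names:
            continue  # no record names this rank; move up
        name, count = Counter(names).most_common(1)[0]
        if count / len(names) >= threshold:
            return rank, name
    return None  # no consensus at any rank
```

For example, if four of five records in a BIN name the same species, that species reaches exactly 80% and is chosen; under a 100% threshold the walk moves up to the shared genus instead.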

The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.

The following 105 sources were used to assemble the GBIF backbone, with the number of names contributed by each:
Catalogue of Life Checklist - 4766428 names
International Barcode of Life project (iBOL) Barcode Index Numbers (BINs) - 635951 names
UNITE - Unified system for the DNA based fungal species linked to the classification - 611208 names
The Paleobiology Database - 212054 names
World Register of Marine Species - 188857 names
The Interim Register of Marine and Nonmarine Genera - 183894 names
The World Checklist of Vascular Plants (WCVP) - 131891 names
GBIF Backbone Taxonomy - 114350 names
TAXREF - 109374 names
The Leipzig catalogue of vascular plants - 75380 names
ZooBank - 73549 names
Integrated Taxonomic Information System (ITIS) - 68377 names
Plazi.org taxonomic treatments database - 61346 names
Genome Taxonomy Database r207 - 60545 names
International Plant Names Index - 52329 names
Fauna Europaea - 45077 names
The National Checklist of Taiwan (Catalogue of Life in Taiwan, TaiCoL) - 36193 names
Dyntaxa. Svensk taxonomisk databas - 35892 names
The Plant List with literature - 32692 names
United Kingdom Species Inventory (UKSI) - 29643 names
Artsnavnebasen - 29208 names
The IUCN Red List of Threatened Species - 21221 names
Afromoths, online database of Afrotropical moth species (Lepidoptera) - 13961 names
Brazilian Flora 2020 project - Projeto Flora do Brasil 2020 - 13829 names
Prokaryotic Nomenclature Up-to-Date (PNU) - 10079 names
Checklist Dutch Species Register - Nederlands Soortenregister - 8814 names
ICTV Master Species List (MSL) - 7852 names
Cockroach Species File - 6020 names
GRIN Taxonomy - 5882 names
Taxon list of fungi and fungal-like organisms from Germany compiled by the DGfM - 4570 names
Catalogue of Afrotropical Bees - 3623 names
Catalogue of Tenebrionidae (Coleoptera) of North America - 3327 names
Checklist of Beetles (Coleoptera) of Canada and Alaska. Second Edition. - 3312 names
Systema Dipterorum - 2850 names
Catalogue of the Pterophoroidea of the World - 2807 names
The Clements Checklist - 2675 names
Taxon list of Hymenoptera from Germany compiled in the context of the GBOL project - 2496 names
IOC World Bird List, v13.2 - 2366 names
Official Lists and Indexes of Names in Zoology - 2310 names
National checklist of all species occurring in Denmark - 1922 names
Myriatrix - 1876 names
Database of Vascular Plants of Canada (VASCAN) - 1822 names
Taxon list of vascular plants from Bavaria, Germany compiled in the context of the BFL project - 1771 names
Orthoptera Species File - 1742 names
A list of the terrestrial fungi, flora and fauna of Madeira and Selvagens archipelagos - 1602 names
Aphid Species File - 1565 names
World Spider Catalog - 1561 names
Taxon list of Jurassic Pisces of the Tethys Palaeo-Environment compiled at the SNSB-JME - 1270 names
Backbone Family Classification Patch - 1143 names
GBIF Algae Classification - 1100 names
International Cichorieae Network (ICN): Cichorieae Portal - 975 names
Psocodea Species File - 803 names
New Zealand Marine Macroalgae Species Checklist - 787 names
Annotated checklist of endemic species from the Western Balkans - 754 names
Taxon list of animals with German names (worldwide) compiled at the SMNS - 503 names
Catalogue of the Alucitoidea of the World - 472 names
Lygaeoidea Species File - 462 names
Catálogo de Plantas y Líquenes de Colombia - 422 names
GBIF Backbone Patch - 317 names
Phasmida Species File - 259 names
Cortinariaceae fetched from the Index Fungorum API - 234 names
Coreoidea Species File - 233 names
GTDB supplement - 139 names
Mantodea Species File - 119 names
Endemic species in Taiwan - 93 names
Taxon list of Araneae from Germany compiled in the context of the GBOL project - 88 names
Species of Hominidae - 78 names
Taxon list of Sternorrhyncha from Germany compiled in the context of the GBOL project - 77 names
Taxon list of mosses from Germany compiled in the context of the GBOL project - 75 names
Mammal Species of the World - 73 names
Plecoptera Species File - 71 names
Species Fungorum Plus - 64 names
Catalogue of the type specimens of Cosmopterigidae (Lepidoptera: Gelechioidea) from research collections of the Zoological Institute, Russian Academy of Sciences - 47 names
Species named after famous people - 41 names
Dermaptera Species File - 36 names
Taxon list of Trichoptera from Germany compiled in the context of the GBOL project - 34 names
True Fruit Flies (Diptera, Tephritidae) of the Afrotropical Region - 33 names
Range and Regularities in the Distribution of Earthworms of the Earthworms of the USSR Fauna. Perel, 1979 - 32 names
Taxon list of Diplura from Germany compiled in the context of the GBOL project - 30 names
Lista de referencia de especies de aves de Colombia - 2022 - 24 names
Taxon list of Auchenorrhyncha from Germany compiled in the context of the GBOL project - 20 names
Catalogue of the type specimens of Polycestinae (Coleoptera: Buprestidae) from research collections of the Zoological Institute, Russian Academy of Sciences - 19 names
Taxon list of Thysanoptera from Germany compiled in the context of the GBOL project - 19 names
Lista de especies de vertebrados registrados en jurisdicción del Departamento del Huila - 18 names
Taxon list of Microcoryphia (Archaeognatha) from Germany compiled in the context of the GBOL project - 15 names
Catalogue of the type specimens of Bufonidae and Megophryidae (Amphibia: Anura) from research collections of the Zoological Institute, Russian Academy of Sciences - 12 names
Grylloblattodea Species File - 11 names
Coleorrhyncha Species File - 9 names
Taxon list of liverworts from Germany compiled in the context of the GBOL project - 9 names
Embioptera Species File - 7 names
Taxon list of Pisces and Cyclostoma from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Pteridophyta from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Siphonaptera from Germany compiled in the context of the GBOL project - 5 names
The Earthworms of the Fauna of Russia. Perel, 1997 - 5 names
Taxon list of Zygentoma from Germany compiled in the context of the GBOL project - 4 names
Asiloid Flies: new taxa of Diptera: Apioceridae, Asilidae, and Mydidae - 3 names
Taxon list of Protura from Germany compiled in the context of the GBOL project - 3 names
Taxon list of hornworts from Germany compiled in the context of the GBOL project - 2 names
Chrysididae Species File - 1 name
Taxon list of Dermaptera from Germany compiled in the context of the GBOL project - 1 name
Taxon list of Diplopoda from Germany in the context of the GBOL project - 1 name
Taxon list of Orthoptera (Grashoppers) from Germany compiled at the SNSB - 1 name
Taxon list of Pscoptera from Germany compiled in the context of the GBOL project - 1 name
Taxon list of Pseudoscorpiones from Germany compiled in the context of the GBOL project - 1 name
Taxon list of Raphidioptera from Germany compiled in the context of the GBOL project - 1 name
World Population Statistics - 2023
kaggle.com
Updated Jan 9, 2024
Bhavik Jikadara (2024). World Population Statistics - 2023 [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/world-population-statistics-2023
Dataset updated
Jan 9, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Bhavik Jikadara
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
As of June 2019, the US Census Bureau estimated the world population at 7,577,130,400 people, well above the 7.2 billion of 2015. Our estimate based on UN data puts the world's population above 7.7 billion.

China is the most populous country in the world with a population exceeding 1.4 billion. It is one of just two countries with a population of more than 1 billion, with India being the second. As of 2018, India has a population of over 1.355 billion people, and its population growth is expected to continue through at least 2050. By the year 2030, India is expected to become the most populous country in the world. This is because India’s population will grow, while China is projected to see a loss in population.

The next 11 most populous countries each have populations exceeding 100 million: the United States, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Japan, Ethiopia, and the Philippines. All of these are expected to keep growing except Russia and Japan, whose populations are expected to drop by 2030 and then fall significantly again by 2050.

Many other nations have populations of at least one million, while some countries have only a few thousand. The smallest population in the world is that of Vatican City, where only 801 people reside.

In 2018, the world’s population growth rate was 1.12%. The growth rate has fallen in every five-year period since the 1970s, so the world’s population is expected to keep growing, but at a much slower pace. By 2030, the population will exceed 8 billion; by 2040, it will exceed 9 billion; and by 2055, it will exceed 10 billion, after which another billion people won’t be added until near the end of the century. Current United Nations estimates put population growth at over 80 million people per year.
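As a rough consistency check of these figures (an illustrative calculation, not from the source, which mixes the 2018 growth rate r with the June 2019 Census Bureau estimate P), the implied annual increase is about 85 million, in line with the "over 80 million" figure:

```latex
\Delta P \approx r \, P = 0.0112 \times 7{,}577{,}130{,}400 \approx 8.49 \times 10^{7}
```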

Nine countries are expected to contribute to population growth more quickly than other nations: the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America. Of particular interest, India is on track to overtake China as the most populous country by 2030. In addition, several African nations are expected to double their populations before fertility rates begin to slow.

Content

In this dataset, we have historical population data for every country/territory in the world, with parameters such as area, continent, capital, density, population growth rate, population rank, and world population percentage.

Dataset Glossary (Column-Wise):

Rank: Rank by Population.

CCA3: Three-letter country/territory code (ISO 3166-1 alpha-3).

Country/Territories: Name of the Country/Territories.

Capital: Name of the Capital.

Continent: Name of the Continent.

2022 Population: Population of the Country/Territories in the year 2022.

2020 Population: Population of the Country/Territories in the year 2020.

2015 Population: Population of the Country/Territories in the year 2015.

2010 Population: Population of the Country/Territories in the year 2010.

2000 Population: Population of the Country/Territories in the year 2000.

1990 Population: Population of the Country/Territories in the year 1990.

1980 Population: Population of the Country/Territories in the year 1980.

1970 Population: Population of the Country/Territories in the year 1970.

Area (km²): Area size of the Country/Territories in square kilometers.

Density (per km²): Population Density per square kilometer.

Growth Rate: Population Growth Rate by Country/Territories.

World Population Percentage: The population percentage by each Country/Territories.
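Two of these columns follow from others by simple arithmetic: density is population divided by area, and the world percentage is a country's share of the total. A minimal Python sketch, assuming each row is a dict keyed by the column names above with numeric values, and that the rows together cover the whole world (so their sum stands in for the world total); the published values may be computed or rounded differently, and add_derived_columns is a hypothetical name.

```python
from typing import Dict, List


def add_derived_columns(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Recompute density and world share from the 2022 population and area columns."""
    world_total = sum(row["2022 Population"] for row in rows)
    derived = []
    for row in rows:
        population, area = row["2022 Population"], row["Area (km²)"]
        derived.append({
            **row,
            # Guard against a zero or missing area rather than dividing by zero.
            "Density (per km²)": population / area if area else float("nan"),
            "World Population Percentage": 100.0 * population / world_total,
        })
    return derived
```

Comparing these recomputed values with the columns shipped in the file is a quick way to spot rows whose area or population figures are inconsistent.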
People Data Labs Company Dataset
datarade.ai
.json, .csv
Updated Oct 18, 2021
People Data Labs (2021). People Data Labs Company Dataset [Dataset]. https://datarade.ai/data-products/people-data-labs-company-dataset-people-data-labs
Available download formats: .json, .csv
Dataset updated
Oct 18, 2021
Dataset provided by
People Data Labs Inc.
Authors
People Data Labs
Area covered
Tokelau, South Sudan, Martinique, Dominican Republic, Antarctica, Paraguay, Christmas Island, Romania, Slovenia, Barbados
Description
People Data Labs is an aggregator of B2B person and company data. We source our globally compliant person dataset via our "Data Union".

The "Data Union" is our proprietary data-sharing co-op. Customers opt in to sharing their data and warrant that it is fully compliant with global data privacy regulations. Some data sources are provided as a one-time dump; others are refreshed every time we do a new data build. Our data sources come from a variety of verticals, including HR Tech, Real Estate Tech, Identity/Anti-Fraud, Martech, and others. People Data Labs works with customers on compliance-based topics. If a customer wishes to ensure anonymity, we work with them to anonymize the data.

Our company data has identifying information (name, website, social profiles), company attributes (industry, size, founded date), and tags + free text that is useful for segmentation.
Global Sanctions Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jan 15, 2025
Bright Data (2025). Global Sanctions Dataset [Dataset]. https://brightdata.com/products/datasets/global-sanctions
Available download formats: .json, .csv, .xlsx
Dataset updated
Jan 15, 2025
Dataset authored and provided by
Bright Data, https://brightdata.com/
License
https://brightdata.com/license
Area covered
Worldwide
Description
This dataset provides in-depth information on individuals who are included in international sanctions lists and currently face economic sanctions from various countries and international organizations. Key data attributes include first name, last name, citizenship, passport details, address, date of proscription, and reason for listing. This information helps organizations ensure compliance with sanctions regulations and avoid the risks associated with doing business with sanctioned entities.

Popular attributes:

✔ Financial Intelligence

✔ Credit Risk Analysis

✔ Compliance

✔ Bank Data Enrichment

✔ Account Profiling
Dataset of books called Baptists through the centuries : a history of a...
workwithdata.com
Updated Apr 17, 2025
Work With Data (2025). Dataset of books called Baptists through the centuries : a history of a global people [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Baptists+through+the+centuries+%3A+a+history+of+a+global+people
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 2 rows and is filtered where the book is Baptists through the centuries : a history of a global people. It features 7 columns including author, publication date, language, and book publisher.
Worldwide Soundscapes project metadata and analysis scripts
zenodo.org
data.niaid.nih.gov
csv, zip
Updated May 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kevin F.A. Darras; Kevin F.A. Darras; Rodney Rountree; Rodney Rountree; Steven Van Wilgenburg; Steven Van Wilgenburg; Amandine Gasc; Amandine Gasc; Songhai Li; Songhai Li; Lijun Dong; Lijun Dong; Youfang Chen; Youfang Chen; Thomas Cherico Wanger; Thomas Cherico Wanger (2025). Worldwide Soundscapes project metadata and analysis scripts [Dataset]. http://doi.org/10.5281/zenodo.14216871
Available download formats: csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.14216871
Dataset updated
May 7, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; Songhai Li; Lijun Dong; Youfang Chen; Thomas Cherico Wanger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated passive acoustic monitoring meta-datasets (i.e. meta-data collections). This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description. Additionally, R scripts are provided to replicate the analysis published in [placeholder].

An overview of all sampling sites and timelines is available in the corresponding project on ecoSound-web, along with a demonstration collection of selected recordings. The recordings in this collection were annotated and analysed to explore macro-ecological trends.

The audio recording criteria for inclusion in the meta-database are:

Stationary (no transects, towed sensors or microphones mounted on cars)

Passive (unattended, no human disturbance by the recordist)

Ambient (no directional microphone or triggered recordings, non-experimental conditions)

Spatially and/or temporally replicated (i.e., multiple sites sampled at the same time and/or the same site sampled on multiple days covering the same time of day)
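The four inclusion criteria above combine as a single predicate: all of the first three must hold, plus at least one form of replication. The Python sketch below assumes a hypothetical CandidateDeployment record whose fields are illustrative and are not columns of the meta-database.

```python
from dataclasses import dataclass


@dataclass
class CandidateDeployment:
    # Hypothetical description of a candidate recording setup (not database columns).
    stationary: bool          # no transects, towed sensors or car-mounted microphones
    passive: bool             # unattended, no disturbance by the recordist
    ambient: bool             # no directional microphone, no triggers, non-experimental
    simultaneous_sites: int   # number of sites sampled at the same time
    same_daytime_days: int    # days sampled at one site covering the same time of day


def meets_inclusion_criteria(d: CandidateDeployment) -> bool:
    """True if the setup is stationary, passive, ambient and replicated in space or time."""
    spatially_replicated = d.simultaneous_sites >= 2
    temporally_replicated = d.same_daytime_days >= 2
    return (d.stationary and d.passive and d.ambient
            and (spatially_replicated or temporally_replicated))


print(meets_inclusion_criteria(CandidateDeployment(True, True, True, 1, 3)))  # True
```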

The columns of each data table are described below. The tables are linked through primary keys; joining them reconstructs the meta-database. The data shared here include only validated collections.
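Joining the tables through their keys can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the CSV contents are made up, only a few columns are shown, and the key mapping (collections.collection_id to collections-sites.dataset_ID, then collections-sites.site_ID to sites.site_ID) follows the column descriptions in this entry.

```python
import csv
import io

# Made-up miniature versions of three tables; column names follow the descriptions.
COLLECTIONS_CSV = """collection_id,name,medium
1,Example forest collection,air
2,Example reef collection,water
"""
COLLECTIONS_SITES_CSV = """dataset_ID,site_ID
1,10
1,11
2,20
"""
SITES_CSV = """site_ID,site_name,latitude_numeric,longitude_numeric
10,F-01,-1.50,101.20
11,F-02,-1.51,101.22
20,R-01,12.10,-68.90
"""


def read_rows(text: str) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


def join_collections_to_sites(collections, links, sites):
    """Inner join on collection_id = dataset_ID and links.site_ID = sites.site_ID."""
    collections_by_id = {row["collection_id"]: row for row in collections}
    sites_by_id = {row["site_ID"]: row for row in sites}
    joined = []
    for link in links:
        collection = collections_by_id.get(link["dataset_ID"])
        site = sites_by_id.get(link["site_ID"])
        if collection is not None and site is not None:
            joined.append({**collection, **site})  # no overlapping column names here
    return joined


rows = join_collections_to_sites(
    read_rows(COLLECTIONS_CSV), read_rows(COLLECTIONS_SITES_CSV), read_rows(SITES_CSV)
)
for row in rows:
    print(row["name"], row["site_name"], row["latitude_numeric"], row["longitude_numeric"])
```

Dictionaries keyed by ID keep the join linear in the number of link rows; with real files, csv.DictReader can read them directly via open().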

Changes from version 4.0.0

Added link to the published synthesis.

Meta-database CSV files

collections

collection_id: unique integer, primary key

name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.

ecoSound-web_link: link of validated meta-collection on ecoSound-web

primary_contributors: full names of people deemed corresponding contributors who are responsible for the dataset

secondary_contributors: full names of people who are not primary contributors but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses

date_added: when the dataset was added (YYYY-MM-DD)

URL_open_recordings: internet link for openly-available recordings from this collection

URL_project: internet link for further information about the corresponding project

DOI_publication: Digital Object Identifiers of corresponding publications

core_realm_IUCN: The main, core realm of the dataset according to IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

medium: the physical medium the microphone is situated in

locality: optional free text about the locality

contributor_comments: free-text field for comments by the primary contributors

collections-sites

dataset_ID: primary key of collections table

site_ID: primary key of sites table

sites

site_ID: unique integer, primary key

site_name: internal name or code of sampling site as used in respective projects

latitude_numeric: site's numeric degrees of latitude

longitude_numeric: site's numeric degrees of longitude

blurred_coordinates: whether the latitude and longitude coordinates are inaccurate, boolean. Coordinates may be blurred with random offsets, rounding, snapping, etc. Indicate the blurring method in the contributor_comments field.

topography_m: vertical position of the microphone relative to sea level, in meters. For sites on land: elevation. For marine sites: depth (negative). Only indicate if the values were measured by the collaborator.

freshwater_depth_m: microphone depth, only used for sites inside freshwater bodies that also have an elevation value above the sea level
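One way to combine topography_m and freshwater_depth_m into a single vertical position is sketched below in Python. The interpretation is an assumption, not part of the column definitions: for freshwater sites, topography_m is read as the elevation of the water surface and freshwater_depth_m as a positive depth of the microphone below it. The function name microphone_height_m is hypothetical.

```python
from typing import Optional


def microphone_height_m(topography_m: Optional[float],
                        freshwater_depth_m: Optional[float]) -> Optional[float]:
    """Hypothetical: microphone position relative to sea level, in meters.

    Assumes freshwater_depth_m is a positive depth below a water surface whose
    elevation is topography_m. Returns None when topography_m was not measured.
    """
    if topography_m is None:
        return None
    if freshwater_depth_m is None:
        return topography_m  # land: elevation; marine: already a negative depth
    return topography_m - freshwater_depth_m


print(microphone_height_m(350.0, 2.5))  # 347.5
```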

realm: Ecosystem type: main realm according to IUCN GET https://global-ecosystems.org/

biome: Ecosystem type: main biome according to IUCN GET https://global-ecosystems.org/

functional_group: Ecosystem type: main functional group according to IUCN GET https://global-ecosystems.org/

contributor_comments: free text field for contributor comments

GADM_0: level 0 (country-level) classification from GADM, the Database of Global Administrative Areas, of a terrestrial site or of a marine site within territorial waters. Source: https://gadm.org/download_world.html

IHO: International Hydrographic Organization classification of marine site. Source: https://marineregions.org/downloads.php

WDPA: World Database on Protected Areas classification of the site. Source: https://www.protectedplanet.net/en/thematic-areas/wdpa?tab=WDPA

deployments

dataset_ID: primary key of datasets table

deployment: identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

subset_site_ID: site IDs where the deployment was conducted, if the deployment did not cover all sites of the corresponding collection

start_date: date of deployment start

start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
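The *_mixed time fields accept either a clock time or a solar event with an optional signed offset in minutes. A minimal Python parser for that format is sketched below; the MixedTime type and parse_mixed_time function are hypothetical, and the sketch accepts all four solar events (sunrise, sunset, noon, midnight) even though start_time_mixed only lists sunrise and sunset.

```python
import re
from dataclasses import dataclass
from typing import Optional

_CLOCK = re.compile(r"^([01]\d|2[0-3]):([0-5]\d)$")
_SOLAR = re.compile(r"^(sunrise|sunset|noon|midnight)([+-]\d+)?$")


@dataclass(frozen=True)
class MixedTime:
    clock_minutes: Optional[int] = None  # minutes after local midnight, for HH:MM values
    solar_event: Optional[str] = None    # e.g. "sunrise", for solar values
    offset_minutes: int = 0              # signed offset from the solar event


def parse_mixed_time(value: str) -> MixedTime:
    """Parse values such as "05:30", "sunset" or "sunrise-60"."""
    text = value.strip().lower()
    clock = _CLOCK.match(text)
    if clock:
        return MixedTime(clock_minutes=int(clock.group(1)) * 60 + int(clock.group(2)))
    solar = _SOLAR.match(text)
    if solar:
        offset = int(solar.group(2)) if solar.group(2) else 0
        return MixedTime(solar_event=solar.group(1), offset_minutes=offset)
    raise ValueError(f"unrecognized mixed time value: {value!r}")


print(parse_mixed_time("sunrise-60"))
```

Resolving a solar event to a clock time additionally needs the site's coordinates and date, which this sketch leaves out.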

permanent: whether the deployment is permanent, boolean

end_date: date of deployment end (date when last scheduled operation starts)

end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

operation_mode: one of the following values:
  continuous: recording takes place from the deployment start date-time to the deployment end date-time.
  periodical: recording takes place periodically (i.e., with a duty cycle) from the deployment start date-time to the deployment end date-time.
  scheduled: recording takes place during scheduled daily time intervals (optionally with a duty cycle).

duty_cycle_minutes: duty cycle of the recording (i.e., the fraction of minutes during which it is recording), written as "recording(minutes)/period(minutes)". Empty if no duty cycle is used. For example: "1/6" if the recorder is active for 1 minute and on standby for 5 minutes.
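Because the value is written as recording/period, it doubles as the recorded fraction of time. The Python sketch below parses it and estimates recorded minutes over an active window; the helper names are hypothetical, and the estimate assumes the window spans a whole number of periods.

```python
from fractions import Fraction


def parse_duty_cycle(value: str) -> tuple[Fraction, Fraction]:
    """Split a duty_cycle_minutes value such as "1/6" into (recording, period) minutes."""
    recording_text, period_text = value.split("/")
    recording, period = Fraction(recording_text.strip()), Fraction(period_text.strip())
    if not 0 < recording <= period:
        raise ValueError(f"recording time must be positive and at most the period: {value!r}")
    return recording, period


def recorded_minutes(value: str, active_minutes: int) -> Fraction:
    """Recorded minutes in an active window, assuming it spans whole periods."""
    recording, period = parse_duty_cycle(value)
    return recording / period * active_minutes


recording, period = parse_duty_cycle("1/6")
print(recording, period - recording)     # 1 5 (minutes recording, minutes on standby)
print(recorded_minutes("1/6", 24 * 60))  # 240
```

Fraction avoids rounding errors and also accepts decimal minutes such as "0.5/5".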

operation_start_time_mixed: only for scheduled recordings: start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

operation_duration_minutes: only for scheduled recordings: duration of operation in minutes, if constant

operation_end_time_mixed: only for scheduled recordings: end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Only required if durations are variable. Do not use when end times are ambiguous (for instance, if a recording could be 1 hour or 25 hours long because the end is on the next day). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

high_pass_filter_Hz: frequency of the high-pass filter of the recorder if applied, in Hz. Otherwise, write "none". This may be called a "low-cut" filter too.

bit_depth: sampling bit depth of the recordings. Often constant for a particular recorder

channels: number of recorded audio channels

sampling_frequency_kHz: frequency at which the recorder sampled the microphone signal, in kHz (sounds up to half that frequency, the Nyquist frequency, can be recorded)
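The half-the-sampling-rate limit is the Nyquist frequency. A small Python check of whether a sound of a given frequency can be captured at a given sampling_frequency_kHz is sketched below; the function names are hypothetical, and in practice anti-aliasing filters may attenuate content somewhat below the Nyquist frequency as well.

```python
def nyquist_khz(sampling_frequency_khz: float) -> float:
    """Highest frequency representable at a given sampling rate: half the rate."""
    return sampling_frequency_khz / 2


def can_capture(sampling_frequency_khz: float, signal_khz: float) -> bool:
    """True if the signal lies strictly below the Nyquist frequency."""
    return signal_khz < nyquist_khz(sampling_frequency_khz)


print(nyquist_khz(48))      # 24.0
print(can_capture(48, 40))  # False: a 40 kHz sound needs a sampling rate above 80 kHz
```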

recorder: recorder used for deployment

microphone: microphone used for deployment

target_taxa: main IUCN animal taxa that were studied with this deployment, using the exact IUCN Red list names (http://www.iucnredlist.org/), separated by commas. Only genera, families, orders, and classes are accepted. Empty if there was no taxonomic focus (i.e., general soundscapes were the study focus).
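Since target_taxa is a comma-separated list that is empty for general soundscape studies, a parser should map the empty value to an empty list rather than a list holding one empty string. A Python sketch with a hypothetical helper name and illustrative taxon names (exact Red List spelling and casing not checked) follows.

```python
def parse_target_taxa(value: str) -> list[str]:
    """Split a comma-separated target_taxa value; [] means no taxonomic focus."""
    return [name.strip() for name in value.split(",") if name.strip()]


print(parse_target_taxa("Aves, Chiroptera"))  # ['Aves', 'Chiroptera']
print(parse_target_taxa(""))                  # []
```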

contributor_comments: free text field for contributor comments

exact_recordings: whether the deployment data here have been superseded by inserting more exact recording date-time ranges into the meta-collection on ecoSound-web

recordings (partial download from ecoSound-web)

recording_id: primary key of the recordings table

collection_id: ID of the collection the recording belongs to

name: name of the recording

site_id: ID of the site the recording belongs to

recorder_id: ID of the recorder used for the recording (internal ecoSound-web code)

microphone_id: ID of the microphone used for the recording (internal ecoSound-web code)

recording_gain: recording gain applied to amplify the audio signal, in decibels

duty_cycle_recording: fraction of the recording period during which the recorder is actively recording audio

duty_cycle_period: period of the duty cycle, i.e., time between the starts of two subsequent recordings

note: comments (contains the target taxon)

file_date: date of the recording
TrajNet Dataset
paperswithcode.com
Updated Aug 23, 2021
Cite
Stefan Becker; Ronny Hug; Wolfgang Hübner; Michael Arens (2021). TrajNet Dataset [Dataset]. https://paperswithcode.com/dataset/trajnet-1
Dataset updated
Aug 23, 2021
Authors
Stefan Becker; Ronny Hug; Wolfgang Hübner; Michael Arens
Description
The TrajNet Challenge is a large multi-scenario forecasting benchmark. The challenge consists of predicting 3161 human trajectories: for each trajectory, 8 consecutive ground-truth values (3.2 seconds), i.e., t−7, t−6, …, t, are observed in world plane coordinates (the so-called world plane Human-Human protocol), and the following 12 values (4.8 seconds), i.e., t+1, …, t+12, are forecast. This 8-12-value protocol is consistent with most trajectory forecasting approaches, which usually focus on the 5-dataset scenario ETH-univ + ETH-hotel + UCY-zara01 + UCY-zara02 + UCY-univ. TrajNet substantially extends the 5-dataset scenario by diversifying the training data, thus stressing the flexibility and generalization an approach must exhibit in unseen scenery and situations. TrajNet is a superset of diverse datasets that requires training on four families of trajectories, namely 1) BIWI Hotel (orthogonal bird’s eye flight view, moving people), 2) Crowds UCY (3 datasets, tilted bird’s eye view, camera mounted on buildings or utility poles, moving people), 3) MOT PETS (multisensor, different human activities) and 4) Stanford Drone Dataset (8 scenes, high orthogonal bird’s eye flight view, different agents such as people and cars), for a total of 11448 trajectories. Testing is requested on diverse partitions of BIWI Hotel, Crowds UCY, and Stanford Drone Dataset, and is evaluated by a dedicated server (ground-truth test data is not available to participants).
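The 8-12 protocol amounts to cutting each track into an observed window and a target window. The Python sketch below assumes a track is a list of world-plane (x, y) positions sampled at a fixed interval; the 0.4 s step is implied by 8 values spanning 3.2 seconds, and the helper split_trajectory is hypothetical rather than part of any TrajNet tooling.

```python
OBS_LEN = 8               # observed positions t-7 .. t (3.2 s)
PRED_LEN = 12             # forecast positions t+1 .. t+12 (4.8 s)
STEP_S = 3.2 / OBS_LEN    # 0.4 s between samples, implied by the figures above

Point = tuple[float, float]


def split_trajectory(track: list[Point]) -> tuple[list[Point], list[Point]]:
    """Split a track of (x, y) positions into observed and target windows."""
    needed = OBS_LEN + PRED_LEN
    if len(track) < needed:
        raise ValueError(f"need at least {needed} positions, got {len(track)}")
    return track[:OBS_LEN], track[OBS_LEN:needed]


# Made-up straight-line track with 20 positions.
track = [(float(i), 0.0) for i in range(20)]
observed, target = split_trajectory(track)
print(len(observed), len(target), PRED_LEN * STEP_S)
```

A forecasting model receives only the observed window; the target window is what its 12 predicted positions are compared against.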
