100+ datasets found

Baby Names by Year
kaggle.com
Updated Sep 20, 2022
Cite
The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/discussion
Dataset updated
Sep 20, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
About this dataset

This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. Each row gives a name, a year of birth, a sex, and the number of babies given that name in that year. This dataset is a useful resource for anyone interested in studying baby naming trends over time.

How to use the dataset

How to use the US Baby Names by Year of Birth dataset:

This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There are also columns for the number of babies with that name born in that year.

This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes, or between different years.

Research Ideas

This dataset could be used for a number of things, including:

1. Determining baby name trends over time
2. Finding out what the most popular baby names are in the US
3. Analyzing how baby name popularity has changed over the years

Columns

index: the index of the dataframe

YearOfBirth: the year in which the baby was born

Name: the name of the baby

Sex: the sex of the baby

Number: the number of babies with that name and sex
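As a rough illustration of how these columns fit together, the sketch below finds the most common name for each year and sex. The column names come from the list above; the sample rows and the helper name `top_names` are made up for the example and are not real Social Security Administration counts.

```python
import io

import pandas as pd

# Made-up sample rows in the documented column layout (not real SSA counts).
SAMPLE_CSV = """index,YearOfBirth,Name,Sex,Number
0,1990,Jessica,F,300
1,1990,Ashley,F,250
2,1990,Michael,M,400
3,1991,Jessica,F,280
4,1991,Ashley,F,310
5,1991,Michael,M,390
"""


def top_names(df: pd.DataFrame, n: int = 1) -> pd.DataFrame:
    """Return the n most common names for each (YearOfBirth, Sex) group."""
    ranked = df.sort_values(
        ["YearOfBirth", "Sex", "Number"], ascending=[True, True, False]
    )
    # head(n) keeps the first n rows of each group in the sorted order.
    return ranked.groupby(["YearOfBirth", "Sex"]).head(n).reset_index(drop=True)


names = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="index")
top = top_names(names)
print(top)
```

With the real file, the `io.StringIO(...)` argument would be replaced by the path to the downloaded CSV.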

Acknowledgements

If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov.

Data Source
Gender by Name (Time-series)
kaggle.com
Updated Dec 5, 2022
Cite
The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
Dataset updated
Dec 5, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Description
Automated Gender Identification Using Name Probabilities

2019 US Social Security Administration Data

By Derek Howard [source]

About this dataset

This dataset provides a tool for generating gender-specific datasets from names alone. It contains the probability that a person's name belongs to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were derived from records with 5 or more people associated with each name, so people with less common names can often still have their gender predicted. With this resource, users can generate gender-aware data quickly, making gender identification in datasets easier.

How to use the dataset

This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
Good luck!

Research Ideas

Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

Generate gender neutral names - use this data to generate random names with no gender bias.

Automate record lookup - quickly assign genders to records based on the probability associated with each name.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

Unknown License - Please check the dataset description for more information.

Columns

File: name_gender.csv

| Column name | Description |
|:------------|:--------------------------------------------------------------------|
| name        | The name of the person. (String)                                    |
| gender      | The gender of the person. (String)                                  |
| probability | The probability of the gender being assigned to the person. (Float) |
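The lookup described in this entry could be sketched as follows. This is only an illustration under stated assumptions: it follows the three-column layout in the table above (`name`, `gender`, `probability`), treats `probability` as the confidence in the listed gender, and assumes one row per name. The sample rows, the `assign_gender` helper and the 0.9 threshold are hypothetical.

```python
import io

import pandas as pd

# Made-up rows in the name_gender.csv layout (not taken from the real file).
SAMPLE_CSV = """name,gender,probability
Alice,F,0.99
Robert,M,0.99
Jordan,M,0.70
"""


def assign_gender(names, table, min_probability=0.9):
    """Map each name to its listed gender, or None when unknown or uncertain."""
    lookup = table.set_index("name")  # assumes one row per name
    results = {}
    for name in names:
        known = name in lookup.index
        if known and lookup.at[name, "probability"] >= min_probability:
            results[name] = lookup.at[name, "gender"]
        else:
            # Name is missing or below the confidence threshold.
            results[name] = None
    return results


table = pd.read_csv(io.StringIO(SAMPLE_CSV))
assigned = assign_gender(["Alice", "Jordan", "Zed"], table)
print(assigned)
```

Returning `None` for low-probability names keeps ambiguous cases visible instead of silently guessing; the threshold is a choice the user would tune.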

Acknowledgements

If you use this dataset in your research, please credit Derek Howard.
Global Country Information 2023
data.niaid.nih.gov
zenodo.org
Updated Jun 15, 2024
Cite
Elgiriyewithana, Nidula (2024). Global Country Information 2023 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8165228
Dataset updated
Jun 15, 2024
Dataset authored and provided by
Elgiriyewithana, Nidula
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

Key Features

Country: Name of the country.

Density (P/Km2): Population density measured in persons per square kilometer.

Abbreviation: Abbreviation or code representing the country.

Agricultural Land (%): Percentage of land area used for agricultural purposes.

Land Area (Km2): Total land area of the country in square kilometers.

Armed Forces Size: Size of the armed forces in the country.

Birth Rate: Number of births per 1,000 population per year.

Calling Code: International calling code for the country.

Capital/Major City: Name of the capital or major city.

CO2 Emissions: Carbon dioxide emissions in tons.

CPI: Consumer Price Index, a measure of inflation and purchasing power.

CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.

Currency_Code: Currency code used in the country.

Fertility Rate: Average number of children born to a woman during her lifetime.

Forested Area (%): Percentage of land area covered by forests.

Gasoline_Price: Price of gasoline per liter in local currency.

GDP: Gross Domestic Product, the total value of goods and services produced in the country.

Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.

Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.

Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.

Largest City: Name of the country's largest city.

Life Expectancy: Average number of years a newborn is expected to live.

Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.

Minimum Wage: Minimum wage level in local currency.

Official Language: Official language(s) spoken in the country.

Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.

Physicians per Thousand: Number of physicians per thousand people.

Population: Total population of the country.

Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.

Tax Revenue (%): Tax revenue as a percentage of GDP.

Total Tax Rate: Overall tax burden as a percentage of commercial profits.

Unemployment Rate: Percentage of the labor force that is unemployed.

Urban Population: Percentage of the population living in urban areas.

Latitude: Latitude coordinate of the country's location.

Longitude: Longitude coordinate of the country's location.
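To make the relationship between some of these fields concrete, the sketch below recomputes population density from `Population` and `Land Area (Km2)`, following the definition given for `Density (P/Km2)`. The two country rows are invented for the example, and the real file may need number parsing (for instance thousands separators) before the arithmetic works.

```python
import pandas as pd

# Invented rows using the column names from the feature list above.
countries = pd.DataFrame(
    {
        "Country": ["Exampleland", "Samplestan"],
        "Population": [1_000_000, 250_000],
        "Land Area (Km2)": [50_000, 1_000],
    }
)

# Density is persons per square kilometer: population divided by land area.
countries["Density (P/Km2)"] = (
    countries["Population"] / countries["Land Area (Km2)"]
)
print(countries)
```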

Potential Use Cases

Analyze population density and land area to study spatial distribution patterns.

Investigate the relationship between agricultural land and food security.

Examine carbon dioxide emissions and their impact on climate change.

Explore correlations between economic indicators such as GDP and various socio-economic factors.

Investigate educational enrollment rates and their implications for human capital development.

Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.

Study labor market dynamics through indicators such as labor force participation and unemployment rates.

Investigate the role of taxation and its impact on economic development.

Explore urbanization trends and their social and environmental consequences.
COVID-19 High Frequency Phone Survey of Households 2020 - World Bank LSMS...
microdata.worldbank.org
catalog.ihsn.org
Updated Oct 25, 2021
Cite
Central Statistics Agency of Ethiopia (2021). COVID-19 High Frequency Phone Survey of Households 2020 - World Bank LSMS Harmonized Dataset - Ethiopia [Dataset]. https://microdata.worldbank.org/index.php/catalog/4072
Dataset updated
Oct 25, 2021
Dataset authored and provided by
Central Statistics Agency of Ethiopia
Time period covered
2018 - 2021
Area covered
Ethiopia
Description
Abstract

To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.

The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.

Two harmonized datafiles are prepared for each survey:

1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractor, and crop sales.

2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.

Geographic coverage

National coverage

Analysis unit

Households

Individuals

Universe

The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

Kind of data

Sample survey data [ssd]

Sampling procedure

See “Ethiopia - Socioeconomic Survey 2018-2019” and “Ethiopia - COVID-19 High Frequency Phone Survey of Households 2020” available in the Microdata Library for details.

Mode of data collection

Computer Assisted Personal Interview [capi]

Cleaning operations

Ethiopia Socioeconomic Survey (ESS) 2018-2019 and Ethiopia COVID-19 High Frequency Phone Survey of Households (HFPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).

The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round of the survey is noted with “_rX” in the variable name, where X represents the number of the round. For example, a variable with “_r3” indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
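The naming convention above can be decoded mechanically. The following is a minimal sketch, assuming the “_rX” marker appears as a suffix at the end of the variable name; the helper `survey_round` and the example variable names are hypothetical and not taken from the datafiles.

```python
import re

# Matches a trailing "_r" followed by the round number, e.g. "_r3".
ROUND_SUFFIX = re.compile(r"_r(\d+)$")


def survey_round(variable_name: str) -> int:
    """Return the originating round; names without a suffix are Round 0."""
    match = ROUND_SUFFIX.search(variable_name)
    return int(match.group(1)) if match else 0


# Hypothetical variable names used only to exercise the helper.
for name in ["food_insecure_r3", "hh_size", "employed_r12"]:
    print(name, "-> round", survey_round(name))
```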

Response rate

See “Ethiopia - Socioeconomic Survey 2018-2019” and “Ethiopia - COVID-19 High Frequency Phone Survey of Households 2020” available in the Microdata Library for details.
Johns Hopkins COVID-19 Case Tracker
data.world
csv, zip
Updated Sep 26, 2025
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Available download formats: zip, csv
Dataset updated
Sep 26, 2025
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Area covered
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1st, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
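The per-100,000 rates mentioned above follow a simple formula: count divided by population, scaled by 100,000. The sketch below illustrates that arithmetic only; the function name `rate_per_100k` and the input figures are made up, and this is not the AP's own code.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Scale a raw count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so whole-number results stay exact in floating point.
    return count * 100_000 / population


# Invented county: 150 cases among 50,000 residents.
print(rate_per_100k(150, 50_000))
```

A small county with few cases can still show a high rate, which is why raw counts are worth reading alongside the rate.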

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic.

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
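The hotspot queries above rely on a 7-day rolling average of new cases per capita. A minimal pandas sketch of that calculation is shown below, assuming a daily series of new case counts for one area and a single population figure; the function name, the sample numbers and the per-100,000 scaling are illustrative choices, not the saved data.world queries themselves.

```python
import pandas as pd


def rolling_new_cases_per_100k(
    new_cases: pd.Series, population: int, window: int = 7
) -> pd.Series:
    """Rolling mean of daily new cases, expressed per 100,000 residents."""
    rolling_mean = new_cases.rolling(window).mean()
    return rolling_mean * 100_000 / population


# Invented daily new-case counts for one area with 100,000 residents.
days = pd.date_range("2020-04-01", periods=8)
new_cases = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], index=days)
result = rolling_new_cases_per_100k(new_cases, population=100_000)
print(result)
```

The first six values are missing because a full 7-day window is not yet available.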

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county".

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data:

- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.

- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
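Because the timeseries holds cumulative snapshots, daily new counts come from differencing consecutive days, and a downward revision by the source shows up as a negative value. The sketch below illustrates this with invented numbers; it is an assumption about how one might inspect the data, not a description of how the AP or Johns Hopkins process it.

```python
import pandas as pd

# Invented cumulative counts; the drop on the fourth day mimics a source
# revising its historical total downward.
days = pd.date_range("2020-04-01", periods=5)
cumulative = pd.Series([100, 120, 150, 145, 170], index=days)

# Daily new counts are the day-over-day change in the cumulative total.
daily_new = cumulative.diff()

# Negative changes flag days where the cumulative count was revised down.
revised_days = daily_new[daily_new < 0]
print(daily_new)
print(revised_days)
```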

Attribution

This data should be credited to Johns Hopkins University's COVID-19 tracking project.
Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...
datarade.ai
Updated Oct 12, 2024
Cite
Success.ai (2024). Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Oct 12, 2024
Dataset provided by
Area covered
United States
Description
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

API Features:

Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.

High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.

Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.

Key Categories Served:

- B2B sales leads – Identify decision-makers in key industries
- B2B marketing data – Target professionals for your marketing campaigns
- Recruitment data – Source top talent efficiently and reduce hiring times
- CRM enrichment – Update and enhance your CRM with verified, updated data
- Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more

Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

Use Cases for Success.ai's Contact Data:

- Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
- Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
- Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
- CRM Enrichment: Keep your CRM current with verified, accurate employee data.
- Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
- Market Research: Gain insights into employment trends and company profiles for better business decisions.
- Executive Search: Source senior executives and leaders for headhunting and recruitment.
- Partnership Building: Find the right companies and key people to develop strategic partnerships.

Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...
World cities database
kaggle.com
Updated May 25, 2025
Cite
Juanma Hernández (2025). World cities database [Dataset]. http://doi.org/10.34740/kaggle/dsv/11944536
Unique identifier
https://doi.org/10.34740/kaggle/dsv/11944536
Dataset updated
May 25, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Juanma Hernández
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data is from:

https://simplemaps.com/data/world-cities

We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.

Our database is:

Up-to-date: It was last refreshed on May 11, 2025.

Comprehensive: Over 4 million unique cities and towns from every country in the world (about 48 thousand in basic database).

Accurate: Cleaned and aggregated from official sources. Includes latitude and longitude coordinates.

Simple: A single CSV file, concise field names, only one entry per city.
CompanyData.com (BoldData) - Company Dataset of 6M IT companies worldwide
datarade.ai
Updated Apr 27, 2021
Cite
CompanyData.com (BoldData) (2021). CompanyData.com (BoldData) - Company Dataset of 6M IT companies worldwide [Dataset]. https://datarade.ai/data-products/list-of-6m-it-companies-worldwide-bolddata
Available download formats: .json, .csv, .xls, .txt
Dataset updated
Apr 27, 2021
Dataset authored and provided by
CompanyData.com (BoldData)
Area covered
Libya, British Indian Ocean Territory, Maldives, Swaziland, New Zealand, Korea (Democratic People's Republic of), Algeria, Turks and Caicos Islands, Uruguay, Taiwan
Description
At CompanyData.com (BoldData), we provide verified company data sourced directly from official trade registers. Our global IT company dataset gives you access to 6 million IT businesses worldwide, including software firms, tech consultancies, system integrators, SaaS providers, and other IT service companies. Every record is sourced from authoritative local registries, ensuring unmatched accuracy, coverage, and compliance.

This dataset is built for professionals who need reliable, structured insights into the global technology sector. Each company profile includes firmographic details such as legal entity name, registration number, business structure, size, revenue range, and industry classification (NACE/SIC). In addition, you'll find direct contact information for decision-makers—emails, mobile numbers, job titles, and department roles—helping you connect with the right people instantly.

Whether you're validating suppliers for compliance, identifying high-potential leads for sales, enriching your CRM data, or building AI models with clean and segmented business intelligence, our IT dataset is designed to support a wide range of critical use cases. From global enterprises to fast-scaling startups, our data empowers businesses to move faster and smarter.

We offer multiple delivery methods tailored to your needs. Choose from custom bulk files, access data through our self-service platform, integrate it directly into your systems via real-time API, or let us enrich your existing database with missing fields and decision-maker insights.

With a database spanning 380 million companies globally, deep IT sector segmentation, and proven expertise in sourcing from local trade registers, CompanyData.com (BoldData) helps your team identify opportunities, ensure compliance, and scale efficiently—wherever your growth takes you.
ReCANVo: A Dataset of Real-World Communicative and Affective Nonverbal...
zenodo.org
data.niaid.nih.gov
zip
Updated Aug 8, 2024
Cite
Jaya Narain; Kristina Teresa Johnson (2024). ReCANVo: A Dataset of Real-World Communicative and Affective Nonverbal Vocalizations [Dataset]. http://doi.org/10.5281/zenodo.5786860
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5786860
Dataset updated
Aug 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jaya Narain; Kristina Teresa Johnson
Description
A dataset of 7077 labeled vocalizations made by non-speaking individuals. Each vocalization lasts approximately 0.5-4 seconds and is labeled with its affective or communicative meaning. Data were acquired in real-world settings (homes, schools, etc.) and were labeled in real-time by parents or caregivers who knew the non-speaking communicator well.

dataset_file_directory.csv provides the name of each vocalization file, the corresponding participant ID, and the vocalization meaning or label (delighted, frustrated, request, etc.).

If you use this dataset, please cite Johnson & Narain et al., "ReCANVo: A Database of Real-World Communicative and Affective Nonverbal Vocalizations". The authors are Jaya Narain, Kristina T. Johnson, Thomas Quatieri, Pattie Maes, and Rosalind Picard. This paper provides more information about the dataset, including data acquisition methodology, pre-processing procedures, and participant demographics.

**J.N. and K.T.J. are joint first authors on this project. Please include both names in attribution when possible (e.g., Johnson & Narain et al.).
Data from: Global Impacts Dataset of Invasive Alien Species (GIDIAS)
springernature.figshare.com
xlsx
Updated May 21, 2025
Cite
Sven Bacher; Ellen Ryan-Colton; Mario Coiro; Phillip Cassey; Bella S. Galil; Martín A. Nuñez; Michael Ansong; Katharina Dehnen-Schmutz; Georgi Fayvush; Romina Daiana Fernandez; Ankila Hiremath; Makihiko Ikegami; Angeliki F. Martinou; Shana M. McDermott; Cristina Preda; Montserrat Vilà; Olaf L. F. Weyl; Neelavara Ananthram Aravind; Katerina Athanasiou; Vidyadhar Atkore; Jacob N. Barney; Tim M. Blackburn; Eckehard G. Brockerhoff; Clinton Carbutt; Luca Carisio; Vanessa Céspedes; Diego F. Cisneros-Heredia; Meghan Cooling; Maarten de Groot; Jakovos Demetriou; James W. E. Dickey; Regan Early; Thomas G. Evans; Belinda Gallardo; Monica Gruber; Cang Hui; Jonathan Jeschke; Natalia Z. Joelson; Mohd Asgar Khan; Sabrina Kumschick; Lori Lach; Katharina Lapin; Simone Lioy; Chunlong Liu; Zoe J. MacMullen; Manuela A. Mazzitelli; G. John Measey; Agata A. Mrugała-Koese; Camille L. Musseau; Helen F. Nahrung; Alessia lucia Pepori; Luis R. Pertierra; Elizabeth F. Pienaar; Petr Pyšek; Gonzalo Rivas-Torres; Henry A. Rojas Martinez; JULISSA ROJAS-SANDOVAL; Ned Ryan-Schofield; Rocío M. Sánchez; Alberto Santini; Davide Santoro; Riccardo Scalera; Lisanna Schmidt; Tinyiko Cavin Shivambu; Sima Sohrabi; Elena Tricarico; Alejandro Trillo; Pieter G. van't Hof; Lara Volery; Tsungai A. Zengeya; Aikaterini Christopoulou; Virginia G. Duboscq-Carra; Ioanna A. Angelidou; Pilar Castro-Díez; Paola Tatiana Flores Males (2025). Global Impacts Dataset of Invasive Alien Species (GIDIAS) [Dataset]. http://doi.org/10.6084/m9.figshare.27908838.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.27908838.v1
Dataset updated
May 21, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Sven Bacher; Ellen Ryan-Colton; Mario Coiro; Phillip Cassey; Bella S. Galil; Martín A. Nuñez; Michael Ansong; Katharina Dehnen-Schmutz; Georgi Fayvush; Romina Daiana Fernandez; Ankila Hiremath; Makihiko Ikegami; Angeliki F. Martinou; Shana M. McDermott; Cristina Preda; Montserrat Vilà; Olaf L. F. Weyl; Neelavara Ananthram Aravind; Katerina Athanasiou; Vidyadhar Atkore; Jacob N. Barney; Tim M. Blackburn; Eckehard G. Brockerhoff; Clinton Carbutt; Luca Carisio; Vanessa Céspedes; Diego F. Cisneros-Heredia; Meghan Cooling; Maarten de Groot; Jakovos Demetriou; James W. E. Dickey; Regan Early; Thomas G. Evans; Belinda Gallardo; Monica Gruber; Cang Hui; Jonathan Jeschke; Natalia Z. Joelson; Mohd Asgar Khan; Sabrina Kumschick; Lori Lach; Katharina Lapin; Simone Lioy; Chunlong Liu; Zoe J. MacMullen; Manuela A. Mazzitelli; G. John Measey; Agata A. Mrugała-Koese; Camille L. Musseau; Helen F. Nahrung; Alessia lucia Pepori; Luis R. Pertierra; Elizabeth F. Pienaar; Petr Pyšek; Gonzalo Rivas-Torres; Henry A. Rojas Martinez; JULISSA ROJAS-SANDOVAL; Ned Ryan-Schofield; Rocío M. Sánchez; Alberto Santini; Davide Santoro; Riccardo Scalera; Lisanna Schmidt; Tinyiko Cavin Shivambu; Sima Sohrabi; Elena Tricarico; Alejandro Trillo; Pieter G. van't Hof; Lara Volery; Tsungai A. Zengeya; Aikaterini Christopoulou; Virginia G. Duboscq-Carra; Ioanna A. Angelidou; Pilar Castro-Díez; Paola Tatiana Flores Males
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present the Global Impacts Dataset of Invasive Alien Species (GIDIAS), a global dataset of 22865 records including impacts of invasive alien species on nature, nature’s contributions to people, and good quality of life. Records include positive and negative impacts, neutral impacts (studies were carried out, but no impacts were documented), non-directional impacts (i.e., change without detriments or benefits for native species or people), and finally, some records of alien species where no studies were found that assessed their impacts (indicating data gaps). Records cover 3353 invasive alien species from all major taxa (plants, vertebrates, invertebrates, microorganisms) and all continents and realms (terrestrial, freshwater, marine). The data were compiled to serve as robust evidence for chapter 4 “Impacts of invasive alien species on nature, nature's contributions to people, and good quality of life” of the global assessment report on invasive alien species by the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES; available on Zenodo at https://doi.org/10.5281/zenodo.7430731). The dataset is provided in a machine-readable CSV file (file name GIDIAS_20250417_machine_read.csv), with special language characters retained where used (UTF-8 format). The dataset is also provided in Excel format (file name GIDIAS_20250417_Excel.xlsx). Metadata is provided in Excel format, including descriptors for each variable (file name GIDIAS_metadata_20250417.xlsx). Additional explanations for GIDIAS are stored in Microsoft Word format (docx) and contain (1) a short description of the principles of the Environmental and Socio-Economic Impact Classification for Alien Taxa (EICAT, SEICAT), (2) a description of the variables included in GIDIAS, and (3) a compilation of the search strategies and datasets included in GIDIAS.
COVID-19 National Longitudinal Phone Survey 2020 – World Bank LSMS...
microdata.worldbank.org
catalog.ihsn.org
Updated Oct 25, 2021
Cite
National Bureau of Statistics (NBS) (2021). COVID-19 National Longitudinal Phone Survey 2020 – World Bank LSMS Harmonized Dataset - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/3856
Dataset updated
Oct 25, 2021
Dataset authored and provided by
National Bureau of Statistics (NBS)
Time period covered
2018 - 2021
Area covered
Nigeria
Description
Abstract

To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.

The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.

Two harmonized datafiles are prepared for each survey:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractors, and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.

Geographic coverage

National coverage

Analysis unit

Households

Individuals

Universe

The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

Kind of data

Sample survey data [ssd]

Sampling procedure

See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.

Mode of data collection

Computer Assisted Personal Interview [capi]

Cleaning operations

Nigeria General Household Survey, Panel (GHS-Panel) 2018-2019 and Nigeria COVID-19 National Longitudinal Phone Survey (COVID-19 NLPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).

The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the number of the round. For example, a variable name ending in “_r3” indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
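
The round-suffix convention described above can be sketched in a few lines of Python. This is only an illustration: the helper name `split_round` and the example variable names are hypothetical and do not come from the harmonized datafiles, and the sketch assumes the suffix always appears at the very end of the variable name.

```python
import re

# Matches names that end in "_r<digits>", e.g. "hh_size_r3".
# Assumption: the round marker is always the final part of the name.
_ROUND_SUFFIX = re.compile(r"^(?P<base>.+)_r(?P<round>\d+)$")


def split_round(variable_name: str) -> tuple[str, int]:
    """Split a variable name into (base name, survey round).

    Names without an "_rX" suffix are treated as Round 0, i.e. the
    face-to-face survey that served as the sample frame.
    """
    match = _ROUND_SUFFIX.match(variable_name)
    if match is None:
        return variable_name, 0
    return match.group("base"), int(match.group("round"))


# Hypothetical variable names, used only to show the behaviour.
print(split_round("hh_size_r3"))
print(split_round("hh_size"))
```

Grouping columns by the returned round number is one way to separate baseline variables from phone-survey variables before analysis.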

Response rate

See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.
Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction...
zenodo.org
data.niaid.nih.gov
zip
Updated Nov 29, 2023
Cite
Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li (2023). Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction Models [Dataset]. http://doi.org/10.5281/zenodo.7909511
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7909511
Dataset updated
Nov 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Freebase is amongst the largest public cross-domain knowledge graphs. It possesses three main data modeling idiosyncrasies. It has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important in modeling the real world. But they also pose nontrivial challenges in research on embedding models for knowledge graph completion, especially when models are developed and evaluated agnostically of these idiosyncrasies. We make available several variants of the Freebase dataset by inclusion and exclusion of these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.

Dataset Details
The dataset consists of the four variants of Freebase dataset as well as related mapping/support files. For each variant, we made three kinds of files available:
Subject matter triples file
fb+/-CVT+/-REV: One folder for each variant. In each folder there are 5 files: train.txt, valid.txt, test.txt, entity2id.txt, and relation2id.txt. Subject matter triples are the triples that belong to subject matter domains, i.e., domains describing real-world facts.
Example of a row in train.txt, valid.txt, and test.txt:
2, 192, 0
Example of a row in entity2id.txt:
/g/112yfy2xr, 2
Example of a row in relation2id.txt:
/music/album/release_type, 192
Explanation
"/g/112yfy2xr" and "/m/02lx2r" are the MIDs of the subject entity and object entity, respectively. "/music/album/release_type" is the relationship between the two entities. 2, 192, and 0 are the IDs assigned by the authors to these objects.
Type system file
freebase_endtypes: Each row maps an edge type to its required subject type and object type.
Example
92, 47178872, 90
Explanation
"92" and "90" are the type IDs of the subject and object of the relationship with ID "47178872".
Metadata files
object_types: Each row maps the MID of a Freebase object to a type it belongs to.
Example
/g/11b41c22g, /type/object/type, /people/person
Explanation
The entity with MID "/g/11b41c22g" has a type "/people/person"
object_names: Each row maps the MID of a Freebase object to its textual label.
Example
/g/11b78qtr5m, /type/object/name, "Viroliano Tries Jazz"@en
Explanation
The entity with MID "/g/11b78qtr5m" has name "Viroliano Tries Jazz" in English.
object_ids: Each row maps the MID of a Freebase object to its user-friendly identifier.
Example
/m/05v3y9r, /type/object/id, "/music/live_album/concert"
Explanation
The entity with MID "/m/05v3y9r" can be interpreted by a human as a music concert live album.
domains_id_label: Each row maps the MID of a Freebase domain to its label.
Example
/m/05v4pmy, geology, 77
Explanation
The object with MID "/m/05v4pmy" in Freebase is the domain "geology", and has id "77" in our dataset.
types_id_label: Each row maps the MID of a Freebase type to its label.
Example
/m/01xljxh, /government/political_party, 147
Explanation
The object with MID "/m/01xljxh" in Freebase is the type "/government/political_party", and has id "147" in our dataset.
entities_id_label: Each row maps the MID of a Freebase entity to its label.
Example
/g/11b78qtr5m, Viroliano Tries Jazz, 2234
Explanation
The entity with MID "/g/11b78qtr5m" in Freebase is "Viroliano Tries Jazz", and has id "2234" in our dataset.
properties_id_label: Each row maps the MID of a Freebase property to its label.
Example
/m/010h8tp2, /comedy/comedy_group/members, 47178867
Explanation
The object with MID "/m/010h8tp2" in Freebase is a property (relation/edge); it has label "/comedy/comedy_group/members" and id "47178867" in our dataset.
uri_original2simplified and uri_simplified2original: The mapping from original URI to simplified URI and the mapping from simplified URI to original URI, respectively.
Example
uri_original2simplified
"http://rdf.freebase.com/ns/type.property.unique": "/type/property/unique"
uri_simplified2original
"/type/property/unique": "http://rdf.freebase.com/ns/type.property.unique"
Explanation
The URI "http://rdf.freebase.com/ns/type.property.unique" in the original Freebase RDF dataset is simplified into "/type/property/unique" in our dataset.
The identifier "/type/property/unique" in our dataset has URI http://rdf.freebase.com/ns/type.property.unique in the original Freebase RDF dataset.
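
As a rough sketch of how the ID files relate to the triples files, the Python below decodes a triple row back into MIDs and relation labels. It assumes the comma-separated row layout shown in the examples above (the delimiter in the actual files may differ) and uses in-memory sample rows instead of real files. The function names are hypothetical, and the row "/m/02lx2r, 0" is inferred from the explanation of the train.txt example rather than quoted from entity2id.txt.

```python
import csv


def load_id_map(lines):
    """Build {numeric id: label} from rows shaped like 'label, id'."""
    id_to_label = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        label, raw_id = row[0], row[-1]
        id_to_label[int(raw_id)] = label
    return id_to_label


def decode_triples(lines, entities, relations):
    """Yield (subject, relation, object) labels for 'subject, relation, object' ID rows."""
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue
        subj, rel, obj = (int(value) for value in row)
        # Fall back to the raw ID when a label is missing from the map.
        yield (
            entities.get(subj, subj),
            relations.get(rel, rel),
            entities.get(obj, obj),
        )


# Sample rows mirroring the examples in the description (assumed layout).
entities = load_id_map(["/g/112yfy2xr, 2", "/m/02lx2r, 0"])
relations = load_id_map(["/music/album/release_type, 192"])
for triple in decode_triples(["2, 192, 0"], entities, relations):
    print(triple)
```

For the real files, the lists of strings would be replaced by open file handles, for example `open("entity2id.txt", encoding="utf-8")`.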
A synthetic data generation pipeline to reproducibly mirror high-resolution...
zenodo.org
csv, txt, xls
Updated Nov 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maria Frantzi (2024). A synthetic data generation pipeline to reproducibly mirror high-resolution multi-variable peptidomics and real-patient clinical data [Dataset]. http://doi.org/10.1101/2024.10.30.24316342
Available download formats: csv, xls, txt
Unique identifier
https://doi.org/10.1101/2024.10.30.24316342
Dataset updated
Nov 21, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Maria Frantzi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Generating high quality, real-world clinical and molecular datasets is challenging, costly and time intensive. Consequently, such data should be shared with the scientific community, which however carries the risk of privacy breaches. The latter limitation hinders the scientific community’s ability to freely share and access high resolution and high quality data, which are essential especially in the context of personalised medicine.

In this study, we present an algorithm based on Gaussian copulas to generate synthetic data that retain associations within high dimensional (peptidomics) datasets. For this purpose, 3,881 datasets from 10 cohorts were employed, containing clinical, demographic, molecular (> 21,500 peptide) variables, and outcome data for individuals with a kidney or a heart failure event. High dimensional copulas were developed to portray the distribution matrix between the clinical and peptidomics data in the dataset, and based on these distributions, a data matrix of 2,000 synthetic patients was developed. Synthetic data maintained the capacity to reproducibly correlate the peptidomics data with the clinical variables.

External validation was performed, using independent multi-centric datasets (n = 2,964) of individuals with chronic kidney disease (CKD, defined as eGFR < 60 mL/min/1.73m²) or those with normal kidney function (eGFR > 90 mL/min/1.73m²). Similarly, the association of the rho-values of single peptides with eGFR between the synthetic and the external validation datasets was significantly reproduced (rho = 0.569, p = 1.8e-218). Subsequent development of classifiers by using the synthetic data matrices, resulted in highly predictive values in external real-patient datasets (AUC values of 0.803 and 0.867 for HF and CKD, respectively), demonstrating robustness of the developed method in the generation of synthetic patient data. The proposed pipeline represents a solution for high-dimensional sharing while maintaining patient confidentiality.

For this study 6,967 peptidomics mass spectrometry datasets were employed and are deposited here, including:

3,881 datasets that were employed for synthetic data generation

1) File name: hf_peptides_data.csv; size: 45.56 MB; Description: 472 datasets from patients developing a heart failure event

2) File name: ckd_peptides_data.csv; size: 10.98 MB; Description: 242 datasets from patients developing a kidney event

3) File name: no_event_peptides_fdata.csv; size: 194.70 MB; Description: 3,266 datasets from patients that did not develop any event

2,964 datasets that were used as external validation datasets (chronic kidney disease group)

*Study 1: PersTIgAN

4) File name: PersTIgAN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.7MB; Description: Patients with CKD_Study1_export 1

5) File name: PersTIgAN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 2.6 MB; Description: Patients with CKD_Study1_export 2

*Study 2: CKD_Biobay

6) File name: CKD_BioBay_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 35.7 MB; Description: Patients with CKD_Study2_export 1

7) File name: CKD_BioBay_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 26.0 MB; Description: Patients with CKD_Study2_export 2

*Study 3: DC_Ren
8) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.96 MB; Description: Patients with CKD_Study3_export 1

9) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 38.13 MB; Description: Patients with CKD_Study3_export 2

10) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_3.xls; size: 36.86 MB; Description: Patients with CKD_Study3_export 3

11) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_4.xls; size: 38.39 MB; Description: Patients with CKD_Study3_export 4

12) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_5.xls; size: 38.12 MB; Description: Patients with CKD_Study3_export 5

13) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_6.xls; size: 36.73 MB; Description: Patients with CKD_Study3_export 6

14) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_7.xls; size: 2.15 MB; Description: Patients with CKD_Study3_export 7

*Non-CKD

15) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.72 MB; Description: datasets from patients without CKD_export 1

16) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 38.31MB; Description: datasets from patients without CKD_export 2

17) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_3.xls; size: 36.95 MB; Description: datasets from patients without CKD_export 3

122 datasets that were used as external validation datasets (heart failure group)

7) File name: HF_external_case_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot.xls; size: 3.13 MB; Description: datasets from patients that develop heart failure

8) File name: HF_external_Control_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot.xls; size: 3.94 MB; Description: datasets from patients that did not develop heart failure
A long-term global population proportion with access to electricity dataset...
zenodo.org
data.niaid.nih.gov
bin, csv, tiff
Updated May 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luling Liu; Xin Cao (2025). A long-term global population proportion with access to electricity dataset (SDG 7.1.1) from 1992 to 2022 based on nighttime light remote sensing [Dataset]. http://doi.org/10.5281/zenodo.14018079
Available download formats: tiff, bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.14018079
Dataset updated
May 13, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Luling Liu; Xin Cao
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

In 2015, the United Nations established 17 Sustainable Development Goals (SDGs), with Goal 7 focusing on ensuring access to affordable, reliable, and sustainable modern energy for all by 2030. By 2022, approximately 760 million people, or 1 in 11 globally, still lacked electricity access according to Tracking SDG7: The Energy Progress Report 2022, posing significant challenges to achieving this goal. Traditional survey methods for estimating the proportion of people with electricity access are often costly, infrequently updated, and hindered by the need for interpolation of historical data.

To address these challenges, this dataset employs a nighttime light remote sensing estimation framework that integrates DMSP-CCNL and NPP/VIIRS data with GlobPOP population data. This approach produces a global 0.1-degree grid and national-scale electricity access index (EAI) maps from 1992 to 2022.

The framework results' correlation coefficient (R) with World Bank survey data from 1992 to 2022 is 0.87, and the RMSE is 15.4, demonstrating its reliability at the national level. By effectively capturing geospatial changes, this dataset supports SDG 7.1.1 monitoring and offers valuable insights for policymakers to address electricity access disparities and promote sustainable energy transitions.

Data Description

1. This dataset consists of 0.1-degree grid Electricity Access Index (EAI) data in GeoTIFF format, where each pixel value represents the proportion of the population with access to electricity within that area.

Example Filename: EAI_0dot1_Deg_WGS84_F32_1992

Field 1: EAI (Proportion of people with access to electricity)

Field 2&3: Spatial resolution is 0.1 degree

Field 4: Coordinate system is WGS84

Field 5: Data type is F32 (Float32)

Field 6: Year "1992"

2. Aggregated EAI data at the national scale is provided in both Shapefile and CSV formats:

Table Filename: EAI_Level_0_1992_2022.csv

Fields include:

SOC (Country code)

Name (Country name)

National EAI data from 1992 to 2022

Shape Filename: EAI_Level_0_1992_2022.shp

Boundary data sourced from GADM (Database of Global Administrative Areas)

3. The pixel-level (30 arc-seconds) Electricity Accessed Population Density is provided in GeoTIFF format, as identified through nighttime light (NTL) data.

Example Filename: Elec_PopDen_WGS84_30arc_F32_1992

Field 1 & 2: Population Density with access to electricity (per km^2)

Field 3: Coordinate system is WGS84

Field 4: Spatial resolution is 30 arc-seconds

Field 5: Data type is F32 (Float32)

Field 6: Year "1992"
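
The two filename layouts described above can be split into their fields with a short Python sketch. The `RasterName` class and `parse_raster_name` function are hypothetical names introduced for illustration, and the sketch assumes filenames are passed without an extension, as in the examples given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterName:
    product: str      # "EAI" or "Elec_PopDen"
    resolution: str   # e.g. "0dot1_Deg" or "30arc"
    crs: str          # coordinate system, e.g. "WGS84"
    dtype: str        # data type, e.g. "F32"
    year: int


def parse_raster_name(stem: str) -> RasterName:
    """Parse a GeoTIFF filename stem that follows one of the two layouts."""
    parts = stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields: {stem!r}")
    dtype, year = parts[4], int(parts[5])
    if parts[0] == "EAI":
        # Layout: EAI_<resolution>_<unit>_<crs>_<dtype>_<year>
        return RasterName("EAI", f"{parts[1]}_{parts[2]}", parts[3], dtype, year)
    # Layout: Elec_PopDen_<crs>_<resolution>_<dtype>_<year>
    return RasterName(f"{parts[0]}_{parts[1]}", parts[3], parts[2], dtype, year)


print(parse_raster_name("EAI_0dot1_Deg_WGS84_F32_1992"))
print(parse_raster_name("Elec_PopDen_WGS84_30arc_F32_1992"))
```

Note that the two products place the coordinate system and the resolution in a different order, which is why the sketch branches on the first field.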

If you encounter any issues, please contact us via email at liu.luling.k2@s.mail.nagoya-u.ac.jp.

More Information

The source codes are publicly available at GitHub: https://github.com/lulingliu/EAI.
High-Frequency Phone Survey on COVID-19 - World Bank LSMS Harmonized Dataset...
catalog.ihsn.org
microdata.worldbank.org
Updated Jan 3, 2022
Cite
Malawi National Statistical Office (NSO) (2022). High-Frequency Phone Survey on COVID-19 - World Bank LSMS Harmonized Dataset - Malawi [Dataset]. https://catalog.ihsn.org/catalog/9901
Dataset updated
Jan 3, 2022
Dataset provided by
National Statistical Office of Malawi (http://www.nsomalawi.mw/)
Authors
Malawi National Statistical Office (NSO)
Time period covered
2019 - 2021
Area covered
Malawi
Description
Abstract

To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.

The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.

Two harmonized datafiles are prepared for each survey:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractors, and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.

Geographic coverage

National coverage

Analysis unit

Households

Individuals

Universe

The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

Kind of data

Sample survey data [ssd]

Sampling procedure

See “Malawi - Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs)” and “Malawi - High-Frequency Phone Survey on COVID-19” available in the Microdata Library for details.

Mode of data collection

Computer Assisted Personal Interview [capi]

Cleaning operations

Malawi Integrated Household Panel Survey (IHPS) 2019 and Malawi High-Frequency Phone Survey on COVID-19 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).

The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the number of the round. For example, a variable name ending in “_r3” indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.

Response rate

See “Malawi - Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs)” and “Malawi - High-Frequency Phone Survey on COVID-19” available in the Microdata Library for details.
A dataset related to the Batwa’s Right to Recognition as a Minority and...
figshare.com
xlsx
Updated Nov 22, 2023
Cite
Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles (2023). A dataset related to the Batwa’s Right to Recognition as a Minority and Indigenous People in Rwanda [Dataset]. http://doi.org/10.6084/m9.figshare.24612147.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.24612147.v1
Dataset updated
Nov 22, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Rwanda
Description
This codebook relates to the study on the Batwa’s Rights to Recognition as a Minority and Indigenous People in Rwanda through the Lens of a Human Rights Based Approach. The dataset displays information in 7 columns. The first column, Code level 1, consists of the main codes extracted from the findings; the second column, Code level 2, consists of sub-codes extracted from Code level 1; and the third column, Code level 3, is extracted from Code level 2. The 4th column provides a short definition of the content of the codes. The 5th column states what the codes should include, and the 6th column states what the codes should not include. The 7th column lists the types of questions asked to respondents, on the basis of which the codes were generated. These codes were generated from questionnaire data, which are summarized in the 7th column. For example, the first column (Code level 1) is made up of 4 rows. The first two rows concern findings from the literature review and the last two rows concern empirical data from the fieldwork. Data from the literature review and empirical data from the fieldwork were combined to produce the findings on which the interpretation was based. These codes allowed the researchers to present meaningful findings, which in turn helped them provide a consolidated interpretation. The data generated align with epistemological interpretivism and concern respondents’ views on socio-cultural narratives and the emotional experiences they have endured in their lives. Data collection was conducted in three rural districts, Nyaruguru (Southern Province) and Rubavu and Rutsiro (Western Province), and in three urban districts, Nyarugenge, Kicukiro and Gasabo (Kigali City). The three rural and three urban districts were chosen to find out whether there were divergent socio-cultural realities within each and across the diverse settings.
The selected rural sites were those near protected areas from which the Batwa were evicted following the legislation of protected areas in 1930 by colonial authorities. The urban districts were sites in which some Batwa had settled after their eviction from the forests imposed a new lifestyle that differs from their hunting and gathering tradition. The study sites were purposively selected through the facilitation of gatekeepers, namely local entities. Authorization was sent to the district level, which subsequently allowed a team of researchers to approach the sector, cell and village levels of administration. At the village level, the lowest entity where households of HMP live, respondents were again identified with the help of the Chief of the Village (umudugudu), who served as a gatekeeper.

Focus Group Discussions (FGDs) along with direct observation were administered to members of HMP (formerly referred to as Batwa). The groups comprised individuals above the age of 18 years who were deemed to have experienced hardship as a result of socio-economic vulnerability resulting from forest eviction. In-depth interviews were also carried out with officials of selected public institutions, including officials from the National Commission of Unity and Reconciliation and the National Commission of Human Rights. Key informant interviews (KIIs) were administered to leaders from NGOs and cooperative societies working towards the promotion of the rights of HMPs. These included one top manager and one former top manager of Cooperative des Potiers au Rwanda (COPORWA), a local NGO advocating for the rights of Batwa in Rwanda, as well as one former leader of CAURWA (Communauté des Autochtones au Rwanda, translated as Community of Autochthonies in Rwanda).
The latter was also one of the founding pioneers of a local NGO advocating for the rights of the Batwa in Rwanda. A former representative of HMP in Rwanda’s Senate was also contacted for an in-depth interview.

All respondents were purposively selected for their expertise or lived experience on the subject of self-identity and non-discrimination. Key informants from COPORWA, a representative of the HMP in the Rwandan Senate, and authorities from the government were asked to provide information on convergences or divergences on the phenomenon under investigation.

In total, 226 respondents divided into four categories were approached for feedback. These were 220 heads of households from HMP for FGDs and direct observation; 3 leaders from COPORWA for in-depth interviews; 1 former Senator who represented HMP in Rwanda’s Senate for an in-depth interview; and 2 authorities from governmental institutions. The aim of using different tools for different respondents was not only to get a wide range of perceptions on the subject matter of self-identity and non-discrimination, but also to enable the triangulation of information. FGDs along with direct observation facilitated the exploration of opinions and the observation of the behaviour and body language of respondents when a sensitive issue, such as discrimination, was mentioned. As an ethical consideration, all respondents were asked for their consent prior to data collection. All interviews were guided by the principle of ‘theoretical saturation’, which consists of continuing the inquiry until respondents start to repeat themselves.

To ensure the reliability and validity of the data, several measures were taken. Meetings were held every morning to plan for the day and every evening to evaluate the day spent in the field.
For each day of data collection, the data collectors gave a daily report highlighting the progress made and any special information relating to the subject matter under investigation that was observed in the field. The study used thematic analysis embedded in a deductive approach guided by the human rights-based approach, in which the two variables of self-identity and non-discrimination were the focus of study. The human rights-based approach facilitated generating data around themes related to self-identity and non-discrimination.

In short, findings on the Batwa’s rights to self-identity and to non-discrimination differed across the two variables. On self-identity, findings indicated that the identity of the Batwa has been shifting because of socio-cultural dynamics affecting the contexts in which they find themselves and live. For example, the name “HMP”, which conflates all vulnerable groups in Rwanda, elicits divergent views among respondents. For ordinary respondents from the Batwa, the name carries a negative profile, while for the elites from the Batwa the name obscures their problems, since it disconnects them from other indigenous people across Africa and the world. For respondents from the GoR, the name means upholding unity and reconciliation. Findings from the data also indicated that the Batwa identity has been characterised by the negative profile of someone who is the poorest, dirty and indigent, because of their lowest social status resulting from a non-dominant context. This reality corroborates other recent studies showing that the identity of the Batwa does not have a fixed boundary.

On the variable of non-discrimination, findings from the data indicated that the negative profiles mentioned above are forms of indirect discrimination resulting from microaggressions and stereotypes. For further information on how to use the dataset, kindly contact the corresponding author at: ndikubwimana.genbattista@gmail.com, tel: (+250)788 751 225
Replication Data for: What’s in a Name? Towards the Study of Names in...
dataverse.harvard.edu
Updated Apr 25, 2025
Cite
John Givens (2025). Replication Data for: What’s in a Name? Towards the Study of Names in Political Science [Dataset]. http://doi.org/10.7910/DVN/REJ38B
Unique identifier
https://doi.org/10.7910/DVN/REJ38B
Dataset updated
Apr 25, 2025
Dataset provided by
Harvard Dataverse
Authors
John Givens
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
It is overdue for political science to consider the names of nation-states, the discipline’s primary unit of analysis and the world’s largest, richest, and most powerful institutions. This research note begins such analysis by examining the descriptors used in formal country names including Empire, Kingdom, Islamic, Republic, Democratic, Socialist, and People’s. I analyze country names as independent variables, hypothesizing that they have value as signals of political characteristics. To test my hypotheses, I turn to the Varieties of Democracy dataset. I use fixed effects panel regressions to examine if countries’ descriptors correlate with the characteristics they name. I find that except for the democratic descriptor all others are surprisingly accurate. This is the first step towards developing an understanding of names in political science while adding a new tool for comparative politics.
GBIF Backbone Taxonomy
smng.net
data.zse.pensoft.net
+6 more
Updated Nov 17, 2023
Cite
GBIF Secretariat (2023). GBIF Backbone Taxonomy [Dataset]. http://doi.org/10.15468/39omei
Unique identifier
https://doi.org/10.15468/39omei
Dataset updated
Nov 17, 2023
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another.

It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.

International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
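
The BIN consensus rule described above can be sketched in Python. This is only an illustration, not GBIF's implementation: the function name `consensus_name`, the record layout (one dict of rank to name per identification), the list of ranks, and the choice to measure consensus against all records in the BIN are assumptions made for the example.

```python
from collections import Counter

# Major Linnaean ranks, ordered from most specific to most general (assumed list).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def consensus_name(records, threshold=0.8):
    """Return (rank, name) for the lowest rank whose most common name
    reaches the consensus threshold, or None if no rank does.

    `records` is a list of dicts mapping rank -> name, one per name
    applied to the BIN (hypothetical layout for this sketch).
    """
    for rank in RANKS:
        names = [r[rank] for r in records if r.get(rank)]
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        # Assumption: consensus is the share of all records in the BIN.
        if count / len(records) >= threshold:
            return rank, name
    return None


# Invented example: three records say one species, two say another,
# but all five agree on the genus.
example = (
    [{"species": "Apis mellifera", "genus": "Apis"}] * 3
    + [{"species": "Apis cerana", "genus": "Apis"}] * 2
)
result = consensus_name(example)
```

Here the species level reaches only 60% agreement, so the search moves up one rank and settles on the genus.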

UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.

The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.

The following 105 sources have been used to assemble the GBIF backbone, with the number of names from each given after the source name:
Catalogue of Life Checklist - 4766428 names
International Barcode of Life project (iBOL) Barcode Index Numbers (BINs) - 635951 names
UNITE - Unified system for the DNA based fungal species linked to the classification - 611208 names
The Paleobiology Database - 212054 names
World Register of Marine Species - 188857 names
The Interim Register of Marine and Nonmarine Genera - 183894 names
The World Checklist of Vascular Plants (WCVP) - 131891 names
GBIF Backbone Taxonomy - 114350 names
TAXREF - 109374 names
The Leipzig catalogue of vascular plants - 75380 names
ZooBank - 73549 names
Integrated Taxonomic Information System (ITIS) - 68377 names
Plazi.org taxonomic treatments database - 61346 names
Genome Taxonomy Database r207 - 60545 names
International Plant Names Index - 52329 names
Fauna Europaea - 45077 names
The National Checklist of Taiwan (Catalogue of Life in Taiwan, TaiCoL) - 36193 names
Dyntaxa. Svensk taxonomisk databas - 35892 names
The Plant List with literature - 32692 names
United Kingdom Species Inventory (UKSI) - 29643 names
Artsnavnebasen - 29208 names
The IUCN Red List of Threatened Species - 21221 names
Afromoths, online database of Afrotropical moth species (Lepidoptera) - 13961 names
Brazilian Flora 2020 project - Projeto Flora do Brasil 2020 - 13829 names
Prokaryotic Nomenclature Up-to-Date (PNU) - 10079 names
Checklist Dutch Species Register - Nederlands Soortenregister - 8814 names
ICTV Master Species List (MSL) - 7852 names
Cockroach Species File - 6020 names
GRIN Taxonomy - 5882 names
Taxon list of fungi and fungal-like organisms from Germany compiled by the DGfM - 4570 names
Catalogue of Afrotropical Bees - 3623 names
Catalogue of Tenebrionidae (Coleoptera) of North America - 3327 names
Checklist of Beetles (Coleoptera) of Canada and Alaska. Second Edition. - 3312 names
Systema Dipterorum - 2850 names
Catalogue of the Pterophoroidea of the World - 2807 names
The Clements Checklist - 2675 names
Taxon list of Hymenoptera from Germany compiled in the context of the GBOL project - 2496 names
IOC World Bird List, v13.2 - 2366 names
Official Lists and Indexes of Names in Zoology - 2310 names
National checklist of all species occurring in Denmark - 1922 names
Myriatrix - 1876 names
Database of Vascular Plants of Canada (VASCAN) - 1822 names
Taxon list of vascular plants from Bavaria, Germany compiled in the context of the BFL project - 1771 names
Orthoptera Species File - 1742 names
A list of the terrestrial fungi, flora and fauna of Madeira and Selvagens archipelagos - 1602 names
Aphid Species File - 1565 names
World Spider Catalog - 1561 names
Taxon list of Jurassic Pisces of the Tethys Palaeo-Environment compiled at the SNSB-JME - 1270 names
Backbone Family Classification Patch - 1143 names
GBIF Algae Classification - 1100 names
International Cichorieae Network (ICN): Cichorieae Portal - 975 names
Psocodea Species File - 803 names
New Zealand Marine Macroalgae Species Checklist - 787 names
Annotated checklist of endemic species from the Western Balkans - 754 names
Taxon list of animals with German names (worldwide) compiled at the SMNS - 503 names
Catalogue of the Alucitoidea of the World - 472 names
Lygaeoidea Species File - 462 names
Catálogo de Plantas y Líquenes de Colombia - 422 names
GBIF Backbone Patch - 317 names
Phasmida Species File - 259 names
Cortinariaceae fetched from the Index Fungorum API - 234 names
Coreoidea Species File - 233 names
GTDB supplement - 139 names
Mantodea Species File - 119 names
Endemic species in Taiwan - 93 names
Taxon list of Araneae from Germany compiled in the context of the GBOL project - 88 names
Species of Hominidae - 78 names
Taxon list of Sternorrhyncha from Germany compiled in the context of the GBOL project - 77 names
Taxon list of mosses from Germany compiled in the context of the GBOL project - 75 names
Mammal Species of the World - 73 names
Plecoptera Species File - 71 names
Species Fungorum Plus - 64 names
Catalogue of the type specimens of Cosmopterigidae (Lepidoptera: Gelechioidea) from research collections of the Zoological Institute, Russian Academy of Sciences - 47 names
Species named after famous people - 41 names
Dermaptera Species File - 36 names
Taxon list of Trichoptera from Germany compiled in the context of the GBOL project - 34 names
True Fruit Flies (Diptera, Tephritidae) of the Afrotropical Region - 33 names
Range and Regularities in the Distribution of Earthworms of the Earthworms of the USSR Fauna. Perel, 1979 - 32 names
Taxon list of Diplura from Germany compiled in the context of the GBOL project - 30 names
Lista de referencia de especies de aves de Colombia - 2022 - 24 names
Taxon list of Auchenorrhyncha from Germany compiled in the context of the GBOL project - 20 names
Catalogue of the type specimens of Polycestinae (Coleoptera: Buprestidae) from research collections of the Zoological Institute, Russian Academy of Sciences - 19 names
Taxon list of Thysanoptera from Germany compiled in the context of the GBOL project - 19 names
Lista de especies de vertebrados registrados en jurisdicción del Departamento del Huila - 18 names
Taxon list of Microcoryphia (Archaeognatha) from Germany compiled in the context of the GBOL project - 15 names
Catalogue of the type specimens of Bufonidae and Megophryidae (Amphibia: Anura) from research collections of the Zoological Institute, Russian Academy of Sciences - 12 names
Grylloblattodea Species File - 11 names
Coleorrhyncha Species File - 9 names
Taxon list of liverworts from Germany compiled in the context of the GBOL project - 9 names
Embioptera Species File - 7 names
Taxon list of Pisces and Cyclostoma from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Pteridophyta from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Siphonaptera from Germany compiled in the context of the GBOL project - 5 names
The Earthworms of the Fauna of Russia. Perel, 1997 - 5 names
Taxon list of Zygentoma from Germany compiled in the context of the GBOL project - 4 names
Asiloid Flies: new taxa of Diptera: Apioceridae, Asilidae, and Mydidae - 3 names
Taxon list of Protura from Germany compiled in the context of the GBOL project - 3 names
Taxon list of hornworts from Germany compiled in the context of the GBOL project - 2 names
Chrysididae Species File - 1 names
Taxon list of Dermaptera from Germany compiled in the context of the GBOL project - 1 names
Taxon list of Diplopoda from Germany in the context of the GBOL project - 1 names
Taxon list of Orthoptera (Grashoppers) from Germany compiled at the SNSB - 1 names
Taxon list of Pscoptera from Germany compiled in the context of the GBOL project - 1 names
Taxon list of Pseudoscorpiones from Germany compiled in the context of the GBOL project - 1 names
Taxon list of Raphidioptera from Germany compiled in the context of the GBOL project - 1 names

glenglat: Global englacial temperature database

zenodo.org

zip

Updated Aug 19, 2024

Cite

Mylène Jacquemart; Ethan Welty (2024). glenglat: Global englacial temperature database [Dataset]. http://doi.org/10.5281/zenodo.13334175

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.13334175

Dataset updated

Aug 19, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Mylène Jacquemart; Ethan Welty

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Time period covered

Jul 20, 1842 - Jul 27, 2023

Description

Open-access database of englacial temperature measurements compiled from data submissions and published literature. It is developed on GitHub and published to Zenodo.

Data structure

The dataset adheres to the Frictionless Data Tabular Data Package specification. The metadata in datapackage.json describes, in detail, the contents of the tabular data files in the data folder:

source.csv: Description of each data source (either a personal communication or the reference to a published study).
borehole.csv: Description of each borehole (location, elevation, etc), linked to source.csv via source_id and less formally via source identifiers in notes.
profile.csv: Description of each profile (date, etc), linked to borehole.csv via borehole_id and to source.csv via source_id and less formally via source identifiers in notes.
measurement.csv: Description of each measurement (depth and temperature), linked to profile.csv via borehole_id and profile_id.

For boreholes with many profiles (e.g. from automated loggers), pairs of profile.csv and measurement.csv are stored separately in subfolders of data named {source.id}-{glacier}, where glacier is a simplified and kebab-cased version of the glacier name (e.g. flowers2022-little-kluane).
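
The key relationships described above (measurement.csv to profile.csv via `borehole_id` and `profile_id`, and profile.csv to borehole.csv via `borehole_id`) can be sketched with the Python standard library. The column names come from the description; the inline CSV rows, the source identifier `example2020`, and the helper names `read_rows` and `join_measurements` are invented for illustration and are not part of the database.

```python
import csv
import io

# Tiny invented stand-ins for the real files in the data folder.
BOREHOLE_CSV = """id,source_id,glacier_name
1,example2020,Example Glacier
"""
PROFILE_CSV = """borehole_id,id,source_id,date_min
1,1,example2020,2020-07-01
"""
MEASUREMENT_CSV = """borehole_id,profile_id,depth,temperature
1,1,5.0,-1.2
1,1,10.0,-0.8
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_measurements(boreholes, profiles, measurements):
    """Attach borehole and profile attributes to each measurement row."""
    borehole_by_id = {b["id"]: b for b in boreholes}
    profile_by_key = {(p["borehole_id"], p["id"]): p for p in profiles}
    joined = []
    for m in measurements:
        profile = profile_by_key[(m["borehole_id"], m["profile_id"])]
        borehole = borehole_by_id[m["borehole_id"]]
        joined.append({
            "glacier_name": borehole["glacier_name"],
            "date_min": profile["date_min"],
            "depth": float(m["depth"]),
            "temperature": float(m["temperature"]),
        })
    return joined


rows = join_measurements(
    read_rows(BOREHOLE_CSV), read_rows(PROFILE_CSV), read_rows(MEASUREMENT_CSV)
)
```

Because profile identifiers restart from 1 for each borehole, the profile lookup uses the pair (`borehole_id`, `id`) rather than `id` alone.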

`data/source.csv`

Sources of information considered in the compilation of this database. Column names and categorical values closely follow the Citation Style Language (CSL) 1.0.2 specification. Names of people in non-Latin scripts are followed by a latinization in square brackets (e.g. В. С. Загороднов [V. S. Zagorodnov]) and non-English titles are followed by a translation in square brackets.

name	type	description
`id` (required)	string	Unique identifier constructed from the first author's lowercase, latinized, family name and the publication year, followed as needed by a lowercase letter to ensure uniqueness (e.g. Загороднов 1981 → zagorodnov1981a).
`author` (required)	string	Author names (optionally followed by their ORCID in parentheses) as a pipe-delimited list.
`year` (required)	year	Year of publication.
`type` (required)	string	Item type. - article-journal: Journal article - book: Book (if the entire book is relevant) - chapter: Book section - document: Document not fitting into any other category - dataset: Collection of data - map: Geographic map - paper-conference: Paper published in conference proceedings - personal-communication: Personal communication between individuals - speech: Presentation (talk, poster) at a conference - report: Report distributed by an institution - thesis: Thesis written to satisfy degree requirements - webpage: Website or page on a website
`title`	string	Item title.
`url`	string	URL (DOI if available).
`language` (required)	string	Language as ISO 639-1 two-letter language code. - de: German - en: English - fr: French - ko: Korean - ru: Russian - sv: Swedish - zh: Chinese
`container_title`	string	Title of the container (e.g. journal, book).
`volume`	integer	Volume number of the item or container.
`issue`	string	Issue number (e.g. 1) or range (e.g. 1-2) of the item or container, with an optional letter prefix (e.g. F1).
`page`	string	Page number (e.g. 1) or range (e.g. 1-2) of the item in the container.
`version`	string	Version number (e.g. 1.0) of the item.
`editor`	string	Editor names (e.g. of the containing book) as a pipe-delimited list.
`collection_title`	string	Title of the collection (e.g. book series).
`collection_number`	string	Number (e.g. 1) or range (e.g. 1-2) in the collection (e.g. book series volume).
`publisher`	string	Publisher name.
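
As a rough illustration of the `id` and `author` conventions in the table above, the following Python sketch builds an identifier and splits a pipe-delimited author field. The helper names `build_source_id` and `parse_authors`, the assumed position of the ORCID and the bracketed latinization at the end of each name, and the placeholder author "A. Author" with an all-zero ORCID are assumptions for the example, not part of the dataset.

```python
import re

# Assumed layout: optional "(ORCID)" at the end, optional "[latinization]" before it.
ORCID_RE = re.compile(r"\((\d{4}-\d{4}-\d{4}-\d{3}[\dX])\)\s*$")
LATIN_RE = re.compile(r"\[([^\]]+)\]\s*$")


def build_source_id(latinized_family_name, year, suffix=""):
    """Lowercase family name + year + optional disambiguating letter."""
    return f"{latinized_family_name.lower()}{year}{suffix}"


def parse_authors(field):
    """Split a pipe-delimited author field into name, latinization and ORCID."""
    authors = []
    for part in field.split("|"):
        part = part.strip()
        orcid = None
        match = ORCID_RE.search(part)
        if match:
            orcid = match.group(1)
            part = part[:match.start()].strip()
        latinized = None
        match = LATIN_RE.search(part)
        if match:
            latinized = match.group(1)
            part = part[:match.start()].strip()
        authors.append({"name": part, "latinized": latinized, "orcid": orcid})
    return authors


source_id = build_source_id("Zagorodnov", 1981, "a")
authors = parse_authors(
    "В. С. Загороднов [V. S. Zagorodnov]|A. Author (0000-0000-0000-0000)"
)
```

The first call reproduces the `zagorodnov1981a` example from the `id` description; how the suffix letter is chosen when several sources collide is left to the caller here.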

`data/borehole.csv`

Metadata about each borehole.

name	type	description
`id` (required)	integer	Unique identifier.
`source_id` (required)	string	Identifier of the source of the earliest temperature measurements. This is also the source of the borehole attributes unless otherwise stated in `notes`.
`glacier_name` (required)	string	Glacier or ice cap name (as reported).
`glims_id`	string	Global Land Ice Measurements from Space (GLIMS) glacier identifier.
`location_origin` (required)	string	Origin of location (`latitude`, `longitude`). - submitted: Provided in data submission - published: Reported as coordinates in original publication - digitized: Digitized from published map with complete axes - estimated: Estimated from published plot by comparing to a map (e.g. Google Maps, CalTopo) - guessed: Estimated with difficulty, for example by comparing `elevation` to a map (e.g. Google Maps, CalTopo)
`latitude` (required)	number [degree]	Latitude (EPSG 4326).
`longitude` (required)	number [degree]	Longitude (EPSG 4326).
`elevation_origin` (required)	string	Origin of elevation (`elevation`). - submitted: Provided in data submission - published: Reported as number in original publication - digitized: Digitized from published plot with complete axes - estimated: Estimated from elevation contours in published map - guessed: Estimated with difficulty, for example by comparing location (`latitude`, `longitude`) to a map of contemporary elevations (e.g. CalTopo, Google Maps)
`elevation` (required)	number [m]	Elevation above sea level.
`label`	string	Borehole name (e.g. as labeled on a plot).
`date_min`	date (%Y-%m-%d)	Begin date of drilling, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01).
`date_max`	date (%Y-%m-%d)	End date of drilling, or if not known precisely, the last possible date (e.g. 2019 → 2019-12-31).
`drill_method`	string	Drilling method. - mechanical: Push, percussion, rotary - thermal: Hot point, electrothermal, steam - combined: Mechanical and thermal
`ice_depth`	number [m]	Starting depth of ice. Infinity (INF) indicates that ice was not reached.
`depth`	number [m]	Total borehole depth (not including drilling in the underlying bed).
`to_bed`	boolean	Whether the borehole reached the glacier bed.
`temperature_accuracy`	number [°C]	Thermistor accuracy or precision (as reported). Typically understood to represent one standard deviation.
`notes`	string	Additional remarks about the study site, the borehole, or the measurements therein. Sources are referenced by their `id`.
`curator`	string	Names of people who added the data to the database, as a pipe-delimited list.
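
The `date_min` and `date_max` convention in the table above (a year-only date such as 2019 becomes 2019-01-01 and 2019-12-31) can be sketched as follows. The function name `date_bounds` is hypothetical, and the handling of year-month inputs is an extension assumed for the example; the description only gives the year-only case.

```python
import calendar
from datetime import date


def date_bounds(text):
    """Return (date_min, date_max) for 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'.

    An imprecise date expands to the first and last possible day.
    """
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        # Assumed extension: a known month expands to its first and last day.
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        exact = date(*parts)
        return exact, exact
    raise ValueError(f"Unrecognized date: {text!r}")


date_min, date_max = date_bounds("2019")
formatted = (date_min.isoformat(), date_max.isoformat())
```

`date.isoformat()` yields the `%Y-%m-%d` form used by both columns.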

`data/profile.csv`

Date and time of each measurement profile.

name	type	description
`borehole_id` (required)	integer	Borehole identifier.
`id` (required)	integer	Borehole profile identifier (starting from 1 for each borehole).
`source_id` (required)	string	Source identifier.
`measurement_origin` (required)	string	Origin of measurements (`measurement.depth`, `measurement.temperature`). - submitted: Provided as numbers in data submission - published: Numbers read from original publication - digitized: Digitized from published plot(s) with Plot Digitizer
`date_min`	date (%Y-%m-%d)	Measurement date, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01).
`date_max`

CARMA, Finland Power Plant Emissions, Finland, 2000/2007/Future
geocommons.com
Updated May 6, 2008
Cite
CARMA (2008). CARMA, Finland Power Plant Emissions, Finland, 2000/2007/Future [Dataset]. http://geocommons.com/search.html
Dataset updated
May 6, 2008
Dataset provided by
CARMA
data
Description
All data in this dataset come from CARMA (www.carma.org). The dataset provides information about power plant emissions in Finland. CARMA obtained emissions for all power plants in Finland for the past (2000 Annual Report), the present (2007 data), and the future. CARMA determined the future figures to reflect planned plant construction, expansion, and retirement. The dataset provides the name, company, parent company, city, state, zip, county, metro area, lat/lon, and plant id for each individual power plant.

For each of the three time periods, the dataset reports:

Intensity: Pounds of CO2 emitted per megawatt-hour of electricity produced.
Energy: Annual megawatt-hours of electricity produced.
Carbon: Annual carbon dioxide (CO2) emissions, in short (U.S.) tons. Multiply by 0.907 to get metric tons.

Carbon Monitoring for Action (CARMA) is a massive database containing information on the carbon emissions of over 50,000 power plants and 4,000 power companies worldwide. Power generation accounts for 40% of all carbon emissions in the United States and about one-quarter of global emissions. CARMA is the first global inventory of a major sector of the economy. The objective of CARMA.org is to equip individuals with the information they need to forge a cleaner, low-carbon future. By providing complete information for both clean and dirty power producers, CARMA hopes to influence the opinions and decisions of consumers, investors, shareholders, managers, workers, activists, and policymakers. CARMA builds on experience with public information disclosure techniques that have proven successful in reducing traditional pollutants. Please see carma.org for more information.
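
The unit handling in the CARMA description can be illustrated with a short Python sketch. The 0.907 factor is the one quoted in the description; the function names are hypothetical, and the intensity calculation assumes that Intensity is simply Carbon (converted to pounds at 2,000 pounds per short ton) divided by Energy, which the description does not state explicitly.

```python
SHORT_TON_TO_METRIC_TON = 0.907  # conversion factor quoted in the description
POUNDS_PER_SHORT_TON = 2000      # definition of the U.S. short ton


def to_metric_tons(carbon_short_tons):
    """Convert annual CO2 emissions from short tons to metric tons."""
    return carbon_short_tons * SHORT_TON_TO_METRIC_TON


def intensity_lb_per_mwh(carbon_short_tons, energy_mwh):
    """Pounds of CO2 per megawatt-hour, or None if no electricity was produced.

    Assumes intensity = carbon / energy after converting carbon to pounds.
    """
    if energy_mwh == 0:
        return None
    return carbon_short_tons * POUNDS_PER_SHORT_TON / energy_mwh


# Invented plant figures, for illustration only.
metric = to_metric_tons(1000)                 # 1,000 short tons of CO2
intensity = intensity_lb_per_mwh(1000, 1000)  # over 1,000 MWh of output
```

Returning None for plants with zero output avoids a division error for retired or not-yet-built plants, which may appear in the "future" period.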

Cite

The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/discussion

Baby Names by Year

Baby names by year of birth

14 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Sep 20, 2022

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

The Devastator

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

About this dataset

This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.

How to use the dataset

How to use the US Baby Names by Year of Birth dataset:

This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There is also a column for the number of babies with that name born in that year.

This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes, or between different years.

Research Ideas

This dataset could be used for a number of things, including:

1. Determining baby name trends over time
2. Finding out what the most popular baby names are in the US
3. Analyzing how baby name popularity has changed over the years

Columns

index: the index of the dataframe
YearOfBirth: the year in which the baby was born
Name: the name of the baby
Sex: the sex of the baby
Number: the number of babies with that name and sex
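
A minimal sketch of how the columns above might be used to find the most popular name per year and sex, using only the Python standard library. The column names are those listed above; the sample rows, the counts, the "F"/"M" coding of `Sex`, and the helper name `top_names` are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Invented sample in the documented column layout (counts are made up).
SAMPLE = """index,YearOfBirth,Name,Sex,Number
0,1880,Mary,F,100
1,1880,Anna,F,40
2,1880,John,M,120
3,1881,Mary,F,90
4,1881,Anna,F,95
"""


def top_names(rows):
    """Map (year, sex) to the (name, count) pair with the highest count."""
    best = {}
    for row in rows:
        key = (int(row["YearOfBirth"]), row["Sex"])
        count = int(row["Number"])
        if key not in best or count > best[key][1]:
            best[key] = (row["Name"], count)
    return best


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
top = top_names(rows)
```

With the real file, the same function could be fed from `csv.DictReader(open(path, newline=""))`, where `path` is wherever the downloaded CSV is stored.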

Acknowledgements

If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov
