https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, it is one of the most comprehensive datasets on baby names in the US. For each year, the data includes the name, year of birth, sex, and number of babies given that name.
This dataset can be used to track changes in baby naming trends over time, to study how the popularity of individual names has shifted, and to compare naming trends between sexes or across years.
Possible uses include:
1. Determining baby name trends over time
2. Finding the most popular baby names in the US
3. Analyzing how baby name popularity has changed over the years
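As a rough illustration of the trend analyses listed above, the sketch below finds the most popular name for each year and sex. The column names (`name`, `year`, `sex`, `count`) and the sample rows are assumptions made up for the example; check them against the actual file header before reusing the code.

```python
import csv
import io

# Made-up illustrative rows in an assumed layout: name, year, sex, count.
SAMPLE = """name,year,sex,count
Mary,1880,F,700
Anna,1880,F,260
John,1880,M,960
William,1880,M,950
Mary,1881,F,690
John,1881,M,870
"""

def top_names(rows):
    """Return {(year, sex): (name, count)} for the most common name in each group."""
    best = {}
    for row in rows:
        key = (int(row["year"]), row["sex"])
        count = int(row["count"])
        if key not in best or count > best[key][1]:
            best[key] = (row["name"], count)
    return best

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = top_names(rows)
for (year, sex), (name, count) in sorted(result.items()):
    print(year, sex, name, count)
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust the column names as needed.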
If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov
By Derek Howard [source]
This dataset provides a tool for generating gender-specific datasets from names alone. It contains the probability of a person's name belonging to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were derived from records with 5 or more people associated with each name, so people with less common names can still be covered. With this resource, users can generate gender-aware data quickly, making gender identification in datasets easier and more accurate.
This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.
To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).
In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
Good luck!
Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.
Generate gender neutral names - use this data to generate random names with no gender bias.
Automate record lookup - quickly and accurately assign genders based on the probability associated with their name
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: name_gender.csv

| Column name | Description |
|:------------|:--------------------------------------------------------------------|
| name | The name of the person. (String) |
| gender | The gender of the person. (String) |
| probability | The probability of the gender being assigned to the person. (Float) |
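One way to use the columns listed for name_gender.csv is to accept the listed gender only when its probability clears a chosen cutoff. The sketch below assumes the three columns described above; the sample rows, the `F`/`M` codes, and the 0.9 threshold are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Made-up rows in the layout described for name_gender.csv.
SAMPLE = """name,gender,probability
Alice,F,0.99
Robert,M,0.98
Casey,M,0.58
"""

def load_lookup(handle):
    """Map lower-cased name -> (gender, probability)."""
    return {
        row["name"].lower(): (row["gender"], float(row["probability"]))
        for row in csv.DictReader(handle)
    }

def assign_gender(name, lookup, threshold=0.9):
    """Return the listed gender only when its probability meets the threshold."""
    entry = lookup.get(name.lower())
    if entry is None:
        return "unknown"
    gender, probability = entry
    return gender if probability >= threshold else "unknown"

lookup = load_lookup(io.StringIO(SAMPLE))
labels = [assign_gender(n, lookup) for n in ["Alice", "robert", "Casey", "Zed"]]
print(labels)
```

Lowering the threshold labels more names at the cost of more misclassifications, so the cutoff is worth tuning to the application.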
If you use this dataset in your research, please credit Derek Howard.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.
Key Features
Country: Name of the country.
Density (P/Km2): Population density measured in persons per square kilometer.
Abbreviation: Abbreviation or code representing the country.
Agricultural Land (%): Percentage of land area used for agricultural purposes.
Land Area (Km2): Total land area of the country in square kilometers.
Armed Forces Size: Size of the armed forces in the country.
Birth Rate: Number of births per 1,000 population per year.
Calling Code: International calling code for the country.
Capital/Major City: Name of the capital or major city.
CO2 Emissions: Carbon dioxide emissions in tons.
CPI: Consumer Price Index, a measure of inflation and purchasing power.
CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
Currency_Code: Currency code used in the country.
Fertility Rate: Average number of children born to a woman during her lifetime.
Forested Area (%): Percentage of land area covered by forests.
Gasoline_Price: Price of gasoline per liter in local currency.
GDP: Gross Domestic Product, the total value of goods and services produced in the country.
Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
Largest City: Name of the country's largest city.
Life Expectancy: Average number of years a newborn is expected to live.
Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
Minimum Wage: Minimum wage level in local currency.
Official Language: Official language(s) spoken in the country.
Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
Physicians per Thousand: Number of physicians per thousand people.
Population: Total population of the country.
Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
Tax Revenue (%): Tax revenue as a percentage of GDP.
Total Tax Rate: Overall tax burden as a percentage of commercial profits.
Unemployment Rate: Percentage of the labor force that is unemployed.
Urban Population: Percentage of the population living in urban areas.
Latitude: Latitude coordinate of the country's location.
Longitude: Longitude coordinate of the country's location.
Potential Use Cases
Analyze population density and land area to study spatial distribution patterns.
Investigate the relationship between agricultural land and food security.
Examine carbon dioxide emissions and their impact on climate change.
Explore correlations between economic indicators such as GDP and various socio-economic factors.
Investigate educational enrollment rates and their implications for human capital development.
Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
Study labor market dynamics through indicators such as labor force participation and unemployment rates.
Investigate the role of taxation and its impact on economic development.
Explore urbanization trends and their social and environmental consequences.
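As an example of the cross-country comparisons suggested above, the following sketch derives GDP per capita from the `GDP` and `Population` columns and correlates it (on a log scale) with `Life Expectancy`. The four records are made up, and the code assumes the values are already numeric; the real file may store formatted strings that need cleaning first.

```python
import math

# Made-up records using column names from the feature list above.
countries = [
    {"Country": "A", "GDP": 2.0e11, "Population": 1.0e7, "Life Expectancy": 80.0},
    {"Country": "B", "GDP": 5.0e10, "Population": 2.0e7, "Life Expectancy": 70.0},
    {"Country": "C", "GDP": 1.0e10, "Population": 1.0e7, "Life Expectancy": 62.0},
    {"Country": "D", "GDP": 9.0e11, "Population": 3.0e7, "Life Expectancy": 82.0},
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

gdp_per_capita = [c["GDP"] / c["Population"] for c in countries]
life_expectancy = [c["Life Expectancy"] for c in countries]

# GDP per capita spans orders of magnitude, so compare on a log scale.
r = pearson([math.log10(g) for g in gdp_per_capita], life_expectancy)
print(f"correlation: {r:.3f}")
```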
To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.
The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.
Two harmonized datafiles are prepared for each survey. The two datafiles are:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractors, and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.
National coverage
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
See “Ethiopia - Socioeconomic Survey 2018-2019” and “Ethiopia - COVID-19 High Frequency Phone Survey of Households 2020” available in the Microdata Library for details.
Computer Assisted Personal Interview [capi]
Ethiopia Socioeconomic Survey (ESS) 2018-2019 and Ethiopia COVID-19 High Frequency Phone Survey of Households (HFPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).
The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the number of the round. For example, a variable with “_r3” was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
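The “_rX” naming convention can be read programmatically. The sketch below is one possible way to recover the round number from a variable name; the variable names in the loop are hypothetical and are not taken from the datafiles.

```python
import re

# Matches a trailing "_r" followed by the round number, e.g. "_r3".
ROUND_SUFFIX = re.compile(r"_r(\d+)$")

def survey_round(variable_name):
    """Return the originating round; names without a suffix map to Round 0."""
    match = ROUND_SUFFIX.search(variable_name)
    return int(match.group(1)) if match else 0

# Hypothetical variable names used only to exercise the function.
for name in ["hhsize", "fies_score_r3", "work_r10"]:
    print(name, survey_round(name))
```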
See “Ethiopia - Socioeconomic Survey 2018-2019” and “Ethiopia - COVID-19 High Frequency Phone Survey of Households 2020” available in the Microdata Library for details.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
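For readers who want to reproduce the hotspot calculation outside the hosted queries, the sketch below shows one way to compute a 7-day rolling average of new cases per 100,000 people from cumulative counts. It is an illustration and not the AP's query; the case counts and population are made up.

```python
def new_cases(cumulative):
    """Daily new cases as differences between consecutive cumulative counts."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

def rolling_mean(values, window=7):
    """Trailing mean for every position that has a full window behind it."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def per_100k(value, population):
    """Scale a count to a rate per 100,000 people."""
    return value * 100_000 / population

# Made-up cumulative case counts for one county over nine days.
cumulative = [100, 110, 125, 125, 140, 160, 170, 191, 205]
population = 50_000

daily = new_cases(cumulative)   # eight daily values
avg7 = rolling_mean(daily)      # two full 7-day windows
rates = [per_100k(v, population) for v in avg7]
print(daily, avg7, rates)
```

Because sources sometimes revise cumulative counts downward, the daily differences can be negative; how to treat those days is a judgment call this sketch does not make.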
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
https://datawrapper.dwcdn.net/nRyaf/15/
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.
Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.
API Features:
Key Categories Served: B2B sales leads – Identify decision-makers in key industries, B2B marketing data – Target professionals for your marketing campaigns, Recruitment data – Source top talent efficiently and reduce hiring times, CRM enrichment – Update and enhance your CRM with verified, updated data, Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.
Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.
Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.
Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.
Use Cases for Success.ai's Contact Data
- Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
- Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
- Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
- CRM Enrichment: Keep your CRM current with verified, accurate employee data.
- Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
- Market Research: Gain insights into employment trends and company profiles for better business decisions.
- Executive Search: Source senior executives and leaders for headhunting and recruitment.
- Partnership Building: Find the right companies and key people to develop strategic partnerships.
Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is from:
https://simplemaps.com/data/world-cities
We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.
Our database is:
At CompanyData.com (BoldData), we provide verified company data sourced directly from official trade registers. Our global IT company dataset gives you access to 6 million IT businesses worldwide, including software firms, tech consultancies, system integrators, SaaS providers, and other IT service companies. Every record is sourced from authoritative local registries, ensuring unmatched accuracy, coverage, and compliance.
This dataset is built for professionals who need reliable, structured insights into the global technology sector. Each company profile includes firmographic details such as legal entity name, registration number, business structure, size, revenue range, and industry classification (NACE/SIC). In addition, you'll find direct contact information for decision-makers—emails, mobile numbers, job titles, and department roles—helping you connect with the right people instantly.
Whether you're validating suppliers for compliance, identifying high-potential leads for sales, enriching your CRM data, or building AI models with clean and segmented business intelligence, our IT dataset is designed to support a wide range of critical use cases. From global enterprises to fast-scaling startups, our data empowers businesses to move faster and smarter.
We offer multiple delivery methods tailored to your needs. Choose from custom bulk files, access data through our self-service platform, integrate it directly into your systems via real-time API, or let us enrich your existing database with missing fields and decision-maker insights.
With a database spanning 380 million companies globally, deep IT sector segmentation, and proven expertise in sourcing from local trade registers, CompanyData.com (BoldData) helps your team identify opportunities, ensure compliance, and scale efficiently—wherever your growth takes you.
A dataset of 7077 labeled vocalizations made by non-speaking individuals. Each vocalization lasts approximately 0.5-4 seconds and is labeled with its affective or communicative meaning. Data were acquired in real-world settings (homes, schools, etc.) and were labeled in real-time by parents or caregivers who knew the non-speaking communicator well.
dataset_file_directory.csv provides the name of each vocalization file, the corresponding participant ID, and the vocalization meaning or label (delighted, frustrated, request, etc.).
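A quick first look at the directory file might count how many vocalizations each participant has per label. The sketch below assumes the column headers `Filename`, `Participant`, and `Label`, which are guesses rather than the file's documented headers; the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Made-up rows; the header names are assumptions, not the real file's headers.
SAMPLE = """Filename,Participant,Label
clip_001.wav,P01,delighted
clip_002.wav,P01,frustrated
clip_003.wav,P02,request
clip_004.wav,P01,delighted
"""

def label_counts(handle):
    """Count vocalizations for each (participant, label) pair."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[(row["Participant"], row["Label"])] += 1
    return counts

counts = label_counts(io.StringIO(SAMPLE))
for (participant, label), n in sorted(counts.items()):
    print(participant, label, n)
```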
If you use this dataset, please cite Johnson & Narain et al., "ReCANVo: A Database of Real-World Communicative and Affective Nonverbal Vocalizations". The authors are Jaya Narain, Kristina T. Johnson, Thomas Quatieri, Pattie Maes, and Rosalind Picard. This paper provides more information about the dataset, including data acquisition methodology, pre-processing procedures, and participant demographics.
J.N. and K.T.J. are joint first authors on this project. Please include both names in attribution when possible (e.g., Johnson & Narain et al.).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present the Global Impacts Dataset of Invasive Alien Species (GIDIAS), a global dataset of 22865 records including impacts of invasive alien species on nature, nature’s contributions to people, and good quality of life. Records include positive and negative impacts, neutral impacts (studies were carried out, but no impacts were documented), non-directional impacts (i.e., change without detriments or benefits for native species or people), and finally, some records of alien species where no studies were found that assessed their impacts (indicating data gaps). Records cover 3353 invasive alien species from all major taxa (plants, vertebrates, invertebrates, microorganisms) and all continents and realms (terrestrial, freshwater, marine). The data were compiled to serve as robust evidence for chapter 4 “Impacts of invasive alien species on nature, nature's contributions to people, and good quality of life” of the global assessment report on invasive alien species by the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES; available on Zenodo at https://doi.org/10.5281/zenodo.7430731). The dataset is provided in a machine-readable CSV file (file name GIDIAS_20250417_machine_read.csv), with special language characters retained where used (UTF-8 format). The dataset is also provided in Excel format (file name GIDIAS_20250417_Excel.xlsx). Metadata is provided in Excel format, including descriptors for each variable (file name GIDIAS_metadata_20250417.xlsx). Additional explanations for GIDIAS are stored in Microsoft Word format (docx) and contain (1) a short description of the principles of the Environmental and Socio-Economic Impact Classification for Alien Taxa (EICAT, SEICAT), (2) a description of the variables included in GIDIAS, and (3) a compilation of the search strategies and datasets included in GIDIAS.
To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.
The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.
Two harmonized datafiles are prepared for each survey. The two datafiles are:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractors, and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.
National coverage
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.
Computer Assisted Personal Interview [capi]
Nigeria General Household Survey, Panel (GHS-Panel) 2018-2019 and Nigeria COVID-19 National Longitudinal Phone Survey (COVID-19 NLPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).
The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the number of the round. For example, a variable with “_r3” was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Freebase is amongst the largest public cross-domain knowledge graphs. It possesses three main data modeling idiosyncrasies. It has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important in modeling the real-world. But they also pose nontrivial challenges in research of embedding models for knowledge graph completion, especially when models are developed and evaluated agnostically of these idiosyncrasies. We make available several variants of the Freebase dataset by inclusion and exclusion of these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.
Dataset Details
The dataset consists of the four variants of Freebase dataset as well as related mapping/support files. For each variant, we made three kinds of files available:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generating high quality, real-world clinical and molecular datasets is challenging, costly and time intensive. Consequently, such data should be shared with the scientific community, which however carries the risk of privacy breaches. The latter limitation hinders the scientific community’s ability to freely share and access high resolution and high quality data, which are essential especially in the context of personalised medicine.
In this study, we present an algorithm based on Gaussian copulas to generate synthetic data that retain associations within high dimensional (peptidomics) datasets. For this purpose, 3,881 datasets from 10 cohorts were employed, containing clinical, demographic, molecular (> 21,500 peptide) variables, and outcome data for individuals with a kidney or a heart failure event. High dimensional copulas were developed to portray the distribution matrix between the clinical and peptidomics data in the dataset, and based on these distributions, a data matrix of 2,000 synthetic patients was developed. Synthetic data maintained the capacity to reproducibly correlate the peptidomics data with the clinical variables.
External validation was performed, using independent multi-centric datasets (n = 2,964) of individuals with chronic kidney disease (CKD, defined as eGFR < 60 mL/min/1.73m²) or those with normal kidney function (eGFR > 90 mL/min/1.73m²). Similarly, the association of the rho-values of single peptides with eGFR between the synthetic and the external validation datasets was significantly reproduced (rho = 0.569, p = 1.8e-218). Subsequent development of classifiers by using the synthetic data matrices, resulted in highly predictive values in external real-patient datasets (AUC values of 0.803 and 0.867 for HF and CKD, respectively), demonstrating robustness of the developed method in the generation of synthetic patient data. The proposed pipeline represents a solution for high-dimensional sharing while maintaining patient confidentiality.
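To give a feel for the copula idea described above, here is a minimal two-variable sketch of Gaussian-copula sampling using only the Python standard library. It is not the study's pipeline, which handles thousands of peptide variables; the input data are simulated, and the rank-based normal scores and empirical quantile mapping are simplifying assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores(values):
    """Map each value to a standard-normal score through its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def empirical_quantile(sorted_values, u):
    """Observed value at probability u of the sorted marginal."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]

def sample_copula(x, y, n_samples, rng):
    """Draw synthetic (x, y) pairs that keep the rank dependence of the input."""
    rho = correlation(normal_scores(x), normal_scores(y))
    xs, ys = sorted(x), sorted(y)
    synthetic = []
    for _ in range(n_samples):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * rng.gauss(0.0, 1.0)
        # Back to uniforms, then to the observed marginals.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        synthetic.append((empirical_quantile(xs, u1), empirical_quantile(ys, u2)))
    return synthetic

rng = random.Random(0)
# Simulated stand-in for real measurements: y rises with x plus noise.
x = [rng.gauss(50, 10) for _ in range(500)]
y = [0.8 * xi + rng.gauss(0, 5) for xi in x]
synthetic = sample_copula(x, y, 2000, rng)
sx, sy = zip(*synthetic)
print(round(correlation(x, y), 2), round(correlation(sx, sy), 2))
```

Note that this sketch resamples observed values for each marginal, so every synthetic value already appears in the input; a privacy-preserving release would need a smoother marginal model.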
For this study 6,967 peptidomics mass spectrometry datasets were employed and are deposited here, including:
1) File name: hf_peptides_data.csv; size: 45.56 MB; Description: 472 datasets from patients developing a heart failure event
2) File name: ckd_peptides_data.csv; size: 10.98 MB; Description: 242 datasets from patients developing a kidney event
3) File name: no_event_peptides_fdata.csv; size: 194.70 MB; Description: 3,266 datasets from patients that did not develop any event
*Study 1: PersTIgAN
4) File name: PersTIgAN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.7 MB; Description: Patients with CKD_Study1_export 1
5) File name: PersTIgAN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 2.6 MB; Description: Patients with CKD_Study1_export 2
*Study 2: CKD_Biobay
6) File name: CKD_BioBay_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 35.7 MB; Description: Patients with CKD_Study2_export 1
7) File name: CKD_BioBay_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 26.0 MB; Description: Patients with CKD_Study2_export 2
*Study 3: DC_Ren
8) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.96 MB; Description: Patients with CKD_Study3_export 1
9) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 38.13 MB; Description: Patients with CKD_Study3_export 2
10) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_3.xls; size: 36.86 MB; Description: Patients with CKD_Study3_export 3
11) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_4.xls; size: 38.39 MB; Description: Patients with CKD_Study3_export 4
12) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_5.xls; size: 38.12 MB; Description: Patients with CKD_Study3_export 5
13) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_6.xls; size: 36.73 MB; Description: Patients with CKD_Study3_export 6
14) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_7.xls; size: 2.15 MB; Description: Patients with CKD_Study3_export 7
*Non-CKD
15) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.72 MB; Description: datasets from patients without CKD_export 1
16) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 38.31 MB; Description: datasets from patients without CKD_export 2
17) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_3.xls; size: 36.95 MB; Description: datasets from patients without CKD_export 3
18) File name: HF_external_case_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot.xls; size: 3.13 MB; Description: datasets from patients that develop heart failure
19) File name: HF_external_Control_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot.xls; size: 3.94 MB; Description: datasets from patients that did not develop heart failure
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In 2015, the United Nations established 17 Sustainable Development Goals (SDGs), with Goal 7 focusing on ensuring access to affordable, reliable, and sustainable modern energy for all by 2030. By 2022, approximately 760 million people, or 1 in 11 globally, still lacked electricity access according to Tracking SDG7: The Energy Progress Report 2022, posing significant challenges to achieving this goal. Traditional survey methods for estimating the proportion of people with electricity access are often costly, infrequently updated, and hindered by the need for interpolation of historical data.
To address these challenges, this dataset employs a nighttime light remote sensing estimation framework that integrates DMSP-CCNL and NPP/VIIRS data with GlobPOP population data. This approach produces a global 0.1-degree grid and national-scale electricity access index (EAI) maps from 1992 to 2022.
At the national level, the framework's results have a correlation coefficient (R) of 0.87 and an RMSE of 15.4 against World Bank survey data from 1992 to 2022, demonstrating its reliability at that scale. By effectively capturing geospatial changes, this dataset supports SDG 7.1.1 monitoring and offers valuable insights for policymakers to address electricity access disparities and promote sustainable energy transitions.
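For readers who want to repeat this kind of comparison on their own extract, the two agreement measures quoted above can be computed as follows. This is a minimal sketch: the function names are hypothetical, and it assumes the estimated and surveyed access rates are paired by country and year and expressed in the same units.

```python
import math


def pearson_r(estimated, observed):
    """Pearson correlation coefficient between paired values."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_e * var_o)


def rmse(estimated, observed):
    """Root-mean-square error between paired values."""
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

R measures how well the estimates track the survey values, while RMSE measures the typical size of the disagreement in the units of the data, so the two are usually reported together.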
1. This dataset consists of 0.1-degree grid Electricity Access Index (EAI) data in GeoTIFF format, where each pixel value represents the proportion of the population with access to electricity within that area.
Example Filename: EAI_0dot1_Deg_WGS84_F32_1992
2. Aggregated EAI data at the national scale is provided in both Shapefile and CSV formats:
Fields include:
3. The pixel-level (30 arc-seconds) Electricity Accessed Population Density is provided in GeoTIFF format, as identified through nighttime light (NTL) data.
Example Filename: Elec_PopDen_WGS84_30arc_F32_1992
If you encounter any issues, please contact us via email at liu.luling.k2@s.mail.nagoya-u.ac.jp.
The source codes are publicly available at GitHub: https://github.com/lulingliu/EAI.
To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.
The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.
Two harmonized datafiles are prepared for each survey. The two datafiles are:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractor and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.
National coverage
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
See “Malawi - Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs)” and “Malawi - High-Frequency Phone Survey on COVID-19” available in the Microdata Library for details.
Computer Assisted Personal Interview [capi]
Malawi Integrated Household Panel Survey (IHPS) 2019 and Malawi High-Frequency Phone Survey on COVID-19 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).
The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round of the survey is noted with “_rX” in the variable name, where X represents the number of the round. For example, a variable with “_r3” indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
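The naming convention above can be applied programmatically when sorting variables by round. The sketch below is illustrative only: the function name and the example variable names are hypothetical, and it assumes the “_rX” marker always appears at the end of the variable name.

```python
import re

# Matches a trailing round marker such as "_r3" or "_r12".
_ROUND_SUFFIX = re.compile(r"_r(\d+)$")


def survey_round(variable_name):
    """Return the originating round of a harmonized variable.

    Variables without a "_rX" suffix come from Round 0, the
    face-to-face survey used as the sample frame.
    """
    match = _ROUND_SUFFIX.search(variable_name)
    return int(match.group(1)) if match else 0


def group_by_round(variable_names):
    """Group variable names by their originating round."""
    grouped = {}
    for name in variable_names:
        grouped.setdefault(survey_round(name), []).append(name)
    return grouped
```

For example, `group_by_round(["hhsize", "food_insecure_r3"])` places the first (hypothetical) variable under Round 0 and the second under Round 3.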
See “Malawi - Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs)” and “Malawi - High-Frequency Phone Survey on COVID-19” available in the Microdata Library for details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This codebook relates to a study of the Batwa’s rights to recognition as a minority and indigenous people in Rwanda, viewed through the lens of a human rights-based approach. The dataset presents information in 7 columns. The first column, Code level 1, contains the main codes extracted from the findings; the second column, Code level 2, contains sub-codes derived from Code level 1; and the third column, Code level 3, contains codes derived from Code level 2. The 4th column gives a brief definition of the content of the codes. The 5th column states what the codes should include, and the 6th column states what they should not include. The 7th column lists the types of questions asked to respondents, from which the codes were generated. The codes were generated from the data collected through the questionnaire summarized in the 7th column. For example, the first column (Code level 1) consists of 4 rows. The first two rows concern findings from the literature review and the last two rows concern empirical data from the fieldwork. Data from the literature review and empirical data from the fieldwork were combined to produce the findings on which the interpretation was based. These codes allowed the researchers to produce meaningful findings, which in turn enabled them to provide a consolidated interpretation. The data generated align with epistemological interpretivism and concern respondents’ views on socio-cultural narratives and the emotional experiences they have endured in their lives. Data collection was conducted in three rural districts, Nyaruguru (Southern Province), Rubavu and Rutsiro (Western Province), and in three urban districts, Nyarugenge, Kicukiro and Gasabo (Kigali City). Three rural and three urban districts were chosen to find out whether socio-cultural realities diverged within each and across the diverse settings.
The selected rural sites were those near protected areas from which the Batwa were evicted following the legislation of protected areas by colonial authorities in 1930. The urban districts were sites in which some Batwa had settled after their eviction from the forests imposed a new lifestyle that differs from their hunting and gathering tradition. The study sites were purposively selected with the help of gatekeepers, namely local entities. An authorization request was sent to the district level, which subsequently allowed a team of researchers to approach the sector, cell and village levels of administration. At the village level, the lowest entity, where households of HMP (Historically Marginalized People) live, respondents were again identified with the help of the Chief of the Village (umudugudu), who served as a gatekeeper.
Focus Group Discussions (FGDs) along with direct observation were administered to members of HMP (formerly referred to as Batwa). The groups comprised individuals above the age of 18 who were deemed to have experienced hardship as a result of socio-economic vulnerability resulting from forest eviction. In-depth interviews were also carried out with officials of selected public institutions, including officials from the National Commission of Unity and Reconciliation and the National Commission of Human Rights. Key informant interviews (KIIs) were administered to leaders from NGOs and cooperative societies working towards the promotion of the rights of HMP. These included one top manager and one former top manager of Cooperative des Potiers au Rwanda (COPORWA), a local NGO advocating for the rights of Batwa in Rwanda, as well as one former leader of CAURWA (Communauté des Autochtones au Rwanda, translated as Community of Autochthonies in Rwanda).
The latter was also one of the founding pioneers of a local NGO advocating for the rights of the Batwa in Rwanda. A former representative of HMP in Rwanda’s Senate was also contacted for an in-depth interview.
All respondents were purposively selected for their expertise or lived experience on the subject of self-identity and non-discrimination. Key informants from COPORWA, a representative of the HMP in the Rwandan Senate, and government authorities were to provide information on convergences or divergences regarding the phenomenon under investigation.
In total, 226 respondents divided into four categories were approached for feedback: 220 heads of households from HMP for FGDs and direct observation; 3 leaders from COPORWA for in-depth interviews; 1 ex-Senator who represented HMP in Rwanda’s Senate for an in-depth interview; and 2 authorities from governmental institutions. The aim of using different tools for different respondents was not only to obtain a wide range of perceptions on self-identity and non-discrimination, but also to enable the triangulation of information. FGDs along with direct observation facilitated the exploration of opinions and the observation of respondents’ behaviour and body language when a sensitive issue, such as discrimination, was mentioned. As an ethical consideration, all respondents were asked for their consent prior to data collection. All interviews were guided by the principle of ‘theoretical saturation’, which consists of continuing the inquiry until respondents start to repeat themselves.
To ensure the reliability and validity of the data, some measures were taken. Meetings were held every morning to plan for the day and every evening to evaluate the day spent in the field.
For each day of data collection, the data collectors gave a daily report highlighting the progress made and any special information observed in the field relating to the subject matter under investigation. The study used thematic analysis embedded in a deductive approach guided by the human rights-based approach, in which the two variables of self-identity and non-discrimination were the focus of study. The human rights-based approach facilitated generating data around themes related to self-identity and non-discrimination.
In short, the findings on the Batwa’s rights to self-identity and to non-discrimination differed between the two variables. On self-identity, the findings indicated that the identity of the Batwa has been shifting because of socio-cultural dynamics affecting the contexts in which they live. For example, the name “HMP”, which conflates all vulnerable groups in Rwanda, elicits divergent views among respondents. For ordinary Batwa respondents, the name carries a negative profile, while for the Batwa elites the name obscures their problems since it disconnects them from other indigenous peoples across Africa and the world. For respondents from the GoR (Government of Rwanda), the name means upholding unity and reconciliation. The findings also indicated that the Batwa identity has been characterised by the negative profile of someone who is the poorest, dirty and indigent because of the lowest social status resulting from their non-dominant context. This reality corroborates other recent studies showing that the identity of the Batwa does not have a fixed boundary.
On the variable of non-discrimination, the findings indicated that the negative profiles mentioned above are forms of indirect discrimination resulting from microaggressions and stereotypes. For further information on how to use the dataset, kindly contact the corresponding author at: ndikubwimana.genbattista@gmail.com, tel: (+250)788 751 225
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
It is overdue for political science to consider the names of nation-states, the discipline’s primary unit of analysis and the world’s largest, richest, and most powerful institutions. This research note begins such analysis by examining the descriptors used in formal country names including Empire, Kingdom, Islamic, Republic, Democratic, Socialist, and People’s. I analyze country names as independent variables, hypothesizing that they have value as signals of political characteristics. To test my hypotheses, I turn to the Varieties of Democracy dataset. I use fixed effects panel regressions to examine if countries’ descriptors correlate with the characteristics they name. I find that except for the democratic descriptor all others are surprisingly accurate. This is the first step towards developing an understanding of names in political science while adding a new tool for comparative politics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another.
It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.
International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.
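The BIN consensus rule described above (pick a name with at least 80% consensus, otherwise move up the major Linnaean ranks) can be sketched as follows. This is an illustration, not GBIF's implementation: the function name, the input layout (one dictionary of rank to name per name applied to the BIN), and the choice of all applied names as the denominator are assumptions.

```python
from collections import Counter

# Major Linnaean ranks, from most to least specific.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def consensus_taxon(identifications, threshold=0.8):
    """Return (rank, name) for the most specific rank reaching consensus.

    `identifications` holds one dict per name applied to the BIN,
    mapping rank to taxon name. Returns None if no rank reaches
    the threshold.
    """
    total = len(identifications)
    if total == 0:
        return None
    for rank in RANKS:
        names = [ident[rank] for ident in identifications if ident.get(rank)]
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        # Assumption: share is computed over all names applied to the BIN.
        if count / total >= threshold:
            return rank, name
    return None
```

With three of five identifications naming one species and two naming another species of the same genus, the species level falls short of 80% and the function returns the shared genus.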
The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.
The following 105 sources have been used to assemble the GBIF backbone with number of names given in brackets:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open-access database of englacial temperature measurements compiled from data submissions and published literature. It is developed on GitHub and published to Zenodo.
The dataset adheres to the Frictionless Data Tabular Data Package specification.
The metadata in datapackage.json describes, in detail, the contents of the tabular data files in the data folder:
- source.csv: Description of each data source (either a personal communication or the reference to a published study).
- borehole.csv: Description of each borehole (location, elevation, etc), linked to source.csv via source_id and less formally via source identifiers in notes.
- profile.csv: Description of each profile (date, etc), linked to borehole.csv via borehole_id and to source.csv via source_id and less formally via source identifiers in notes.
- measurement.csv: Description of each measurement (depth and temperature), linked to profile.csv via borehole_id and profile_id.
For boreholes with many profiles (e.g. from automated loggers), pairs of profile.csv and measurement.csv are stored separately in subfolders of data named {source.id}-{glacier}, where glacier is a simplified and kebab-cased version of the glacier name (e.g. flowers2022-little-kluane).
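The links between the files described above can be followed with the standard library alone. The sketch below is an illustration under stated assumptions: the three inline CSV snippets are made-up rows (not data from the database) containing only the key columns plus depth and temperature, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, reduced to the linking columns.
BOREHOLE_CSV = """id,source_id,glacier_name
1,example2020,Example Glacier
"""
PROFILE_CSV = """borehole_id,id,source_id
1,1,example2020
1,2,example2020
"""
MEASUREMENT_CSV = """borehole_id,profile_id,depth,temperature
1,1,20,-3.1
1,1,10,-2.5
1,2,10,-2.4
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def profiles_with_measurements(boreholes, profiles, measurements):
    """Attach each profile's (depth, temperature) pairs and glacier name."""
    glacier = {int(b["id"]): b["glacier_name"] for b in boreholes}
    grouped = defaultdict(list)
    for row in measurements:
        key = (int(row["borehole_id"]), int(row["profile_id"]))
        grouped[key].append((float(row["depth"]), float(row["temperature"])))
    result = []
    for p in profiles:
        key = (int(p["borehole_id"]), int(p["id"]))
        result.append({
            "glacier_name": glacier[key[0]],
            "borehole_id": key[0],
            "profile_id": key[1],
            "source_id": p["source_id"],
            "measurements": sorted(grouped.get(key, [])),  # ordered by depth
        })
    return result
```

Note that a profile is identified by the pair (borehole_id, id), since profile identifiers restart from 1 for each borehole.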
data/source.csv
Sources of information considered in the compilation of this database. Column names and categorical values closely follow the Citation Style Language (CSL) 1.0.2 specification. Names of people in non-Latin scripts are followed by a latinization in square brackets (e.g. В. С. Загороднов [V. S. Zagorodnov]) and non-English titles are followed by a translation in square brackets.
name | type | description |
---|---|---|
id (required) | string | Unique identifier constructed from the first author's lowercase, latinized, family name and the publication year, followed as needed by a lowercase letter to ensure uniqueness (e.g. Загороднов 1981 → zagorodnov1981a). |
author (required) | string | Author names (optionally followed by their ORCID in parentheses) as a pipe-delimited list. |
year (required) | year | Year of publication. |
type (required) | string | Item type. - article-journal: Journal article - book: Book (if the entire book is relevant) - chapter: Book section - document: Document not fitting into any other category - dataset: Collection of data - map: Geographic map - paper-conference: Paper published in conference proceedings - personal-communication: Personal communication between individuals - speech: Presentation (talk, poster) at a conference - report: Report distributed by an institution - thesis: Thesis written to satisfy degree requirements - webpage: Website or page on a website |
title | string | Item title. |
url | string | URL (DOI if available). |
language (required) | string | Language as ISO 639-1 two-letter language code. - de: German - en: English - fr: French - ko: Korean - ru: Russian - sv: Swedish - zh: Chinese |
container_title | string | Title of the container (e.g. journal, book). |
volume | integer | Volume number of the item or container. |
issue | string | Issue number (e.g. 1) or range (e.g. 1-2) of the item or container, with an optional letter prefix (e.g. F1). |
page | string | Page number (e.g. 1) or range (e.g. 1-2) of the item in the container. |
version | string | Version number (e.g. 1.0) of the item. |
editor | string | Editor names (e.g. of the containing book) as a pipe-delimited list. |
collection_title | string | Title of the collection (e.g. book series). |
collection_number | string | Number (e.g. 1) or range (e.g. 1-2) in the collection (e.g. book series volume). |
publisher | string | Publisher name. |
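As an illustration of the author field described in the table above, a pipe-delimited list with optional ORCIDs in parentheses could be split as follows. This is a sketch, not part of the database tooling: the function name is hypothetical, the example names and ORCID are placeholders, and it assumes whitespace around the pipe is not significant.

```python
import re

# An ORCID is four groups of four characters; the last character may be X.
_AUTHOR = re.compile(
    r"^(?P<name>.*?)\s*\((?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])\)$"
)


def parse_authors(field):
    """Split a pipe-delimited author field into (name, orcid) tuples.

    The ORCID is None when the author has none in parentheses.
    """
    authors = []
    for part in field.split("|"):
        part = part.strip()
        if not part:
            continue
        match = _AUTHOR.match(part)
        if match:
            authors.append((match.group("name"), match.group("orcid")))
        else:
            authors.append((part, None))
    return authors
```

A latinization in square brackets, as described for names in non-Latin scripts, is kept as part of the name because only a trailing ORCID in parentheses is stripped.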
data/borehole.csv
Metadata about each borehole.
name | type | description |
---|---|---|
id (required) | integer | Unique identifier. |
source_id (required) | string | Identifier of the source of the earliest temperature measurements. This is also the source of the borehole attributes unless otherwise stated in notes. |
glacier_name (required) | string | Glacier or ice cap name (as reported). |
glims_id | string | Global Land Ice Measurements from Space (GLIMS) glacier identifier. |
location_origin (required) | string | Origin of location (latitude, longitude). - submitted: Provided in data submission - published: Reported as coordinates in original publication - digitized: Digitized from published map with complete axes - estimated: Estimated from published plot by comparing to a map (e.g. Google Maps, CalTopo) - guessed: Estimated with difficulty, for example by comparing elevation to a map (e.g. Google Maps, CalTopo) |
latitude (required) | number [degree] | Latitude (EPSG 4326). |
longitude (required) | number [degree] | Longitude (EPSG 4326). |
elevation_origin (required) | string | Origin of elevation (elevation). - submitted: Provided in data submission - published: Reported as number in original publication - digitized: Digitized from published plot with complete axes - estimated: Estimated from elevation contours in published map - guessed: Estimated with difficulty, for example by comparing location (latitude, longitude) to a map of contemporary elevations (e.g. CalTopo, Google Maps) |
elevation (required) | number [m] | Elevation above sea level. |
label | string | Borehole name (e.g. as labeled on a plot). |
date_min | date (%Y-%m-%d) | Begin date of drilling, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01). |
date_max | date (%Y-%m-%d) | End date of drilling, or if not known precisely, the last possible date (e.g. 2019 → 2019-12-31). |
drill_method | string | Drilling method. - mechanical: Push, percussion, rotary - thermal: Hot point, electrothermal, steam - combined: Mechanical and thermal |
ice_depth | number [m] | Starting depth of ice. Infinity (INF) indicates that ice was not reached. |
depth | number [m] | Total borehole depth (not including drilling in the underlying bed). |
to_bed | boolean | Whether the borehole reached the glacier bed. |
temperature_accuracy | number [°C] | Thermistor accuracy or precision (as reported). Typically understood to represent one standard deviation. |
notes | string | Additional remarks about the study site, the borehole, or the measurements therein. Sources are referenced by their id. |
curator | string | Names of people who added the data to the database, as a pipe-delimited list. |
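The date_min and date_max convention in the table above (e.g. 2019 → 2019-01-01 and 2019-12-31) can be sketched as a small helper. The function name is hypothetical, and handling a year-month input by expanding it to the first and last day of the month is an assumption that extends the documented year-only example.

```python
import calendar
import datetime


def date_bounds(text):
    """Return (date_min, date_max) for a date known to year, month or day.

    "2019"       -> first and last day of the year
    "2019-02"    -> first and last day of the month (assumed extension)
    "2019-02-15" -> the same day twice
    """
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        year = parts[0]
        return datetime.date(year, 1, 1), datetime.date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return datetime.date(year, month, 1), datetime.date(year, month, last_day)
    if len(parts) == 3:
        day = datetime.date(*parts)
        return day, day
    raise ValueError(f"Unsupported date: {text!r}")
```

Both bounds can then be written in the table's %Y-%m-%d format with `isoformat()`.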
data/profile.csv
Date and time of each measurement profile.
name | type | description |
---|---|---|
borehole_id (required) | integer | Borehole identifier. |
id (required) | integer | Borehole profile identifier (starting from 1 for each borehole). |
source_id (required) | string | Source identifier. |
measurement_origin (required) | string | Origin of measurements (measurement.depth, measurement.temperature). - submitted: Provided as numbers in data submission - published: Numbers read from original publication - digitized: Digitized from published plot(s) with Plot Digitizer |
date_min | date (%Y-%m-%d) | Measurement date, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01). |
date_max
|
All the data for this dataset is provided by CARMA (www.carma.org). This dataset provides information about power plant emissions in Finland. Emissions from all power plants in Finland were obtained by CARMA for the past (2000 Annual Report), the present (2007 data), and the future. CARMA determined the data presented for the future to reflect planned plant construction, expansion, and retirement. The dataset provides the name, company, parent company, city, state, zip, county, metro area, lat/lon, and plant id for each individual power plant. For the three time periods, the dataset reports: Intensity: Pounds of CO2 emitted per megawatt-hour of electricity produced. Energy: Annual megawatt-hours of electricity produced. Carbon: Annual carbon dioxide (CO2) emissions. The units are short or U.S. tons. Multiply by 0.907 to get metric tons. Carbon Monitoring for Action (CARMA) is a massive database containing information on the carbon emissions of over 50,000 power plants and 4,000 power companies worldwide. Power generation accounts for 40% of all carbon emissions in the United States and about one-quarter of global emissions. CARMA is the first global inventory of a major sector of the economy. The objective of CARMA.org is to equip individuals with the information they need to forge a cleaner, low-carbon future. By providing complete information for both clean and dirty power producers, CARMA hopes to influence the opinions and decisions of consumers, investors, shareholders, managers, workers, activists, and policymakers. CARMA builds on experience with public information disclosure techniques that have proven successful in reducing traditional pollutants. Please see carma.org for more information.
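The unit relationships stated above can be written out as a short sketch. The function names are hypothetical, the 0.907 factor is the one given in the description, and the intensity formula assumes that Carbon is reported in short tons of 2,000 pounds each and that Intensity is simply Carbon divided by Energy.

```python
SHORT_TON_TO_METRIC_TON = 0.907  # factor quoted in the dataset description
POUNDS_PER_SHORT_TON = 2000      # definition of the U.S. short ton


def short_to_metric_tons(short_tons):
    """Convert annual CO2 emissions from short tons to metric tons."""
    return short_tons * SHORT_TON_TO_METRIC_TON


def intensity_lb_per_mwh(carbon_short_tons, energy_mwh):
    """Pounds of CO2 emitted per megawatt-hour of electricity produced."""
    if energy_mwh <= 0:
        raise ValueError("energy_mwh must be positive")
    return carbon_short_tons * POUNDS_PER_SHORT_TON / energy_mwh
```

For a hypothetical plant emitting 1,000 short tons of CO2 while producing 2,000 MWh in a year, this gives about 907 metric tons and an intensity of 1,000 pounds per MWh.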