100+ datasets found
  1. Baby Names by Year

    • kaggle.com
    Updated Sep 20, 2022
    Cite
    The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/discussion
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this dataset

    This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, it is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year, making it a useful resource for studying baby naming trends over time.

    How to use the dataset

    How to use the US Baby Names by Year of Birth dataset:

    This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes baby names, year of birth, and sex, along with a column for the number of babies given that name in that year.

    This dataset can be used to track changes in baby naming trends over time or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes or between years.

    Research Ideas

    This dataset could be used for a number of things, including:

    1. Determining baby name trends over time
    2. Finding out what the most popular baby names are in the US
    3. Analyzing how baby name popularity has changed over the years

    Columns

    • index: the index of the dataframe
    • YearOfBirth: the year in which the baby was born
    • Name: the name of the baby
    • Sex: the sex of the baby
    • Number: the number of babies with that name and sex
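As a minimal sketch of how the columns above might be used, the following finds the most common name per year for one sex. The sample rows and their counts are made up for illustration; real rows would come from the downloaded CSV, whose exact file name is not given here.

```python
import csv
import io

# Made-up rows in the column layout listed above (not real SSA counts).
SAMPLE = """index,YearOfBirth,Name,Sex,Number
0,1880,Mary,F,100
1,1880,Anna,F,40
2,1880,John,M,120
3,1881,Mary,F,90
4,1881,John,M,110
"""


def top_name_by_year(rows, sex):
    """Return {year: (name, count)} for the most common name of the given sex."""
    best = {}
    for row in rows:
        if row["Sex"] != sex:
            continue
        year = int(row["YearOfBirth"])
        count = int(row["Number"])
        # Keep the row with the highest count seen so far for this year.
        if year not in best or count > best[year][1]:
            best[year] = (row["Name"], count)
    return best


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
top_f = top_name_by_year(rows, "F")
top_m = top_name_by_year(rows, "M")
```

Replacing `io.StringIO(SAMPLE)` with an open file handle would apply the same function to the full dataset.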

    Acknowledgements

    If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov.

    Data Source

  2. Gender by Name (Time-series)

    • kaggle.com
    Updated Dec 5, 2022
    Cite
    The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Automated Gender Identification Using Name Probabilities

    2019 US Social Security Administration Data

    By Derek Howard [source]

    About this dataset

    This dataset provides a tool for generating gender-specific datasets from names alone. It contains the probability of a person's name belonging to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were drawn from records with 5 or more people associated with each name, so individuals with less common names can still have their genders predicted. With this resource, users can generate gender-aware data quickly, making gender identification in datasets easier and more accurate.

    How to use the dataset

    This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

    To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

    In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
    Good luck!

    Research Ideas

    • Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

    • Generate gender neutral names - use this data to generate random names with no gender bias.

    • Automate record lookup - quickly and accurately assign genders based on the probability associated with their name

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: name_gender.csv

    | Column name | Description |
    |:------------|:------------|
    | name | The name of the person. (String) |
    | gender | The gender of the person. (String) |
    | probability | The probability of the gender being assigned to the person. (Float) |
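The lookup described above can be sketched as follows. This is an illustration under assumptions: the gender codes `M`/`F`, the possibility of more than one row per name, the sample probabilities, and the `threshold` parameter are all hypothetical and not confirmed by the dataset description.

```python
import csv
import io

# Made-up rows in the name_gender.csv column layout (values are illustrative).
SAMPLE = """name,gender,probability
Alex,M,0.62
Alex,F,0.38
Maria,F,0.99
James,M,0.98
"""


def load_lookup(text):
    """Map lowercase name -> (gender, probability), keeping the most probable row."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row["name"].strip().lower()
        candidate = (row["gender"], float(row["probability"]))
        if key not in lookup or candidate[1] > lookup[key][1]:
            lookup[key] = candidate
    return lookup


def assign_gender(name, lookup, threshold=0.9):
    """Return the gender code, or None if the name is unknown or too ambiguous."""
    entry = lookup.get(name.strip().lower())
    if entry is None:
        return None
    gender, probability = entry
    return gender if probability >= threshold else None


lookup = load_lookup(SAMPLE)
```

Returning `None` below the threshold keeps ambiguous names unlabeled rather than guessing; a lower threshold trades accuracy for coverage.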

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Derek Howard.

  3. Global Country Information 2023

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 15, 2024
    Cite
    Elgiriyewithana, Nidula (2024). Global Country Information 2023 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8165228
    Dataset updated
    Jun 15, 2024
    Dataset authored and provided by
    Elgiriyewithana, Nidula
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    Country: Name of the country.

    Density (P/Km2): Population density measured in persons per square kilometer.

    Abbreviation: Abbreviation or code representing the country.

    Agricultural Land (%): Percentage of land area used for agricultural purposes.

    Land Area (Km2): Total land area of the country in square kilometers.

    Armed Forces Size: Size of the armed forces in the country.

    Birth Rate: Number of births per 1,000 population per year.

    Calling Code: International calling code for the country.

    Capital/Major City: Name of the capital or major city.

    CO2 Emissions: Carbon dioxide emissions in tons.

    CPI: Consumer Price Index, a measure of inflation and purchasing power.

    CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.

    Currency_Code: Currency code used in the country.

    Fertility Rate: Average number of children born to a woman during her lifetime.

    Forested Area (%): Percentage of land area covered by forests.

    Gasoline_Price: Price of gasoline per liter in local currency.

    GDP: Gross Domestic Product, the total value of goods and services produced in the country.

    Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.

    Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.

    Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.

    Largest City: Name of the country's largest city.

    Life Expectancy: Average number of years a newborn is expected to live.

    Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.

    Minimum Wage: Minimum wage level in local currency.

    Official Language: Official language(s) spoken in the country.

    Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.

    Physicians per Thousand: Number of physicians per thousand people.

    Population: Total population of the country.

    Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.

    Tax Revenue (%): Tax revenue as a percentage of GDP.

    Total Tax Rate: Overall tax burden as a percentage of commercial profits.

    Unemployment Rate: Percentage of the labor force that is unemployed.

    Urban Population: Percentage of the population living in urban areas.

    Latitude: Latitude coordinate of the country's location.

    Longitude: Longitude coordinate of the country's location.
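As a small worked example of how the fields above relate, population density can be recomputed from Population and Land Area (Km2) and compared with the Density (P/Km2) column. The `to_number` helper assumes values may arrive as strings with thousands separators, which is an assumption about the file format; the figures are made up.

```python
def to_number(text):
    """Parse a numeric string, tolerating thousands separators (assumed format)."""
    return float(text.replace(",", "").strip())


def density(population, land_area_km2):
    """Persons per square kilometer, matching the Density (P/Km2) definition."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_km2


# Illustrative values, not taken from the dataset.
example_density = density(to_number("1,000,000"), to_number("50,000"))
```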

    Potential Use Cases

    Analyze population density and land area to study spatial distribution patterns.

    Investigate the relationship between agricultural land and food security.

    Examine carbon dioxide emissions and their impact on climate change.

    Explore correlations between economic indicators such as GDP and various socio-economic factors.

    Investigate educational enrollment rates and their implications for human capital development.

    Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.

    Study labor market dynamics through indicators such as labor force participation and unemployment rates.

    Investigate the role of taxation and its impact on economic development.

    Explore urbanization trends and their social and environmental consequences.

  4. COVID-19 High Frequency Phone Survey of Households 2020 - World Bank LSMS...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 25, 2021
    + more versions
    Cite
    Central Statistics Agency of Ethiopia (2021). COVID-19 High Frequency Phone Survey of Households 2020 - World Bank LSMS Harmonized Dataset - Ethiopia [Dataset]. https://microdata.worldbank.org/index.php/catalog/4072
    Dataset updated
    Oct 25, 2021
    Dataset authored and provided by
    Central Statistics Agency of Ethiopia
    Time period covered
    2018 - 2021
    Area covered
    Ethiopia
    Description

    Abstract

    To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created the harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.

    The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.

    Two harmonized datafiles are prepared for each survey:

    1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractor and crop sales.
    2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.

    Geographic coverage

    National coverage

    Analysis unit

    • Households
    • Individuals

    Universe

    The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    See “Ethiopia - Socioeconomic Survey 2018-2019” and “Ethiopia - COVID-19 High Frequency Phone Survey of Households 2020” available in the Microdata Library for details.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Cleaning operations

    Ethiopia Socioeconomic Survey (ESS) 2018-2019 and Ethiopia COVID-19 High Frequency Phone Survey of Households (HFPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).

    The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round of the survey is noted with “_rX” in the variable name, where X represents the number of the round. For example, a variable with “_r3” in its name was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
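The naming convention above can be decoded with a short helper. This sketch assumes the “_rX” marker appears at the end of the variable name; the variable names used here are hypothetical.

```python
import re

# Assumption: the round marker is a suffix such as "_r3" at the end of the name.
_ROUND_SUFFIX = re.compile(r"_r(\d+)$")


def split_round(variable_name):
    """Return (base_name, round_number); names without a suffix are Round 0."""
    match = _ROUND_SUFFIX.search(variable_name)
    if match is None:
        return variable_name, 0
    return variable_name[:match.start()], int(match.group(1))


# Hypothetical variable names for illustration.
parsed = [split_round(name) for name in ("food_insecure_r3", "hhsize", "work_r12")]
```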

    Response rate

    See “Ethiopia - Socioeconomic Survey 2018-2019” and “Ethiopia - COVID-19 High Frequency Phone Survey of Households 2020” available in the Microdata Library for details.

  5. Johns Hopkins COVID-19 Case Tracker

    • data.world
    csv, zip
    Updated Sep 26, 2025
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Available download formats: zip, csv
    Dataset updated
    Sep 26, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      • Johns Hopkins has reconciled Ohio's historical deaths data with the state.

    Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
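The per-100,000 rates mentioned above follow from a count and a population estimate. A minimal sketch, with made-up numbers and a hypothetical function name:

```python
def rate_per_100k(count, population):
    """Cases or deaths per 100,000 people; None when no population is available."""
    if population is None or population <= 0:
        return None
    # Multiply before dividing to limit floating-point rounding.
    return count * 100_000 / population


# Illustrative figures: 50 cases in a county of 200,000 people.
example_rate = rate_per_100k(50, 200_000)
```

Returning `None` for a missing population is a design choice for rows such as cases not assigned to a single county, where no denominator exists.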

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    @(https://datawrapper.dwcdn.net/nRyaf/15/)

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data

    • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
    • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
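Because the timeseries stores cumulative snapshots, daily counts have to be derived by differencing, and a downward revision by a source shows up as a negative day. A minimal sketch with made-up numbers and hypothetical function names:

```python
def daily_from_cumulative(cumulative):
    """Difference consecutive cumulative totals; the first day is its own total."""
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


def revised_days(daily):
    """Indexes of days where the cumulative count went down (a likely revision)."""
    return [index for index, value in enumerate(daily) if value < 0]


# Illustrative cumulative counts; day 2 reflects a downward correction.
daily = daily_from_cumulative([10, 15, 14, 20])
flagged = revised_days(daily)
```

Flagging negative days rather than clipping them to zero keeps the raw data unaltered, in line with how the AP describes its handling of the source counts.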

    Attribution

    This data should be credited to Johns Hopkins University's COVID-19 tracking project.

  6. Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...

    • datarade.ai
    Updated Oct 12, 2024
    + more versions
    Cite
    Success.ai (2024). Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Area covered
    United States
    Description

    Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

    Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

    API Features:

    • Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.
    • High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.
    • Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.

    Key Categories Served:

    • B2B sales leads – Identify decision-makers in key industries
    • B2B marketing data – Target professionals for your marketing campaigns
    • Recruitment data – Source top talent efficiently and reduce hiring times
    • CRM enrichment – Update and enhance your CRM with verified, updated data
    • Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more

    Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available:

    • 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry.
    • 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data.
    • GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

    Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

    Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including:

    • B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare.
    • Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions.
    • Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location.
    • Market Research: Gain insights into employment trends and company profiles to enrich market research.
    • CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data.
    • Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

    Use Cases for Success.ai's Contact Data

    • Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
    • Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
    • Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
    • CRM Enrichment: Keep your CRM current with verified, accurate employee data.
    • Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
    • Market Research: Gain insights into employment trends and company profiles for better business decisions.
    • Executive Search: Source senior executives and leaders for headhunting and recruitment.
    • Partnership Building: Find the right companies and key people to develop strategic partnerships.

    Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...

  7. World cities database

    • kaggle.com
    Updated May 25, 2025
    Cite
    Juanma Hernández (2025). World cities database [Dataset]. http://doi.org/10.34740/kaggle/dsv/11944536
    Dataset updated
    May 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Juanma Hernández
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data is from:

    https://simplemaps.com/data/world-cities

    We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.

    Our database is:

    • Up-to-date: It was last refreshed on May 11, 2025.
    • Comprehensive: Over 4 million unique cities and towns from every country in the world (about 48 thousand in basic database).
    • Accurate: Cleaned and aggregated from official sources. Includes latitude and longitude coordinates.
    • Simple: A single CSV file, concise field names, only one entry per city.
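Since the database pairs each city with latitude and longitude coordinates, one common use is a nearest-city lookup. The sketch below uses the haversine great-circle formula; the column names `city`, `lat`, and `lng` are assumptions about the CSV layout, and the three sample rows use approximate coordinates.

```python
import csv
import io
import math

# Assumed column names; the coordinates are approximate.
SAMPLE = """city,lat,lng
Paris,48.8566,2.3522
London,51.5074,-0.1278
Madrid,40.4168,-3.7038
"""


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_city(rows, lat, lng):
    """Return the row whose coordinates are closest to (lat, lng)."""
    return min(rows, key=lambda r: haversine_km(lat, lng, float(r["lat"]), float(r["lng"])))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
closest = nearest_city(rows, 48.9, 2.3)
```

A linear scan is enough for the basic database; the full multi-million-row file would likely need a spatial index instead.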
  8. CompanyData.com (BoldData) - Company Dataset of 6M IT companies worldwide

    • datarade.ai
    Updated Apr 27, 2021
    Cite
    CompanyData.com (BoldData) (2021). CompanyData.com (BoldData) - Company Dataset of 6M IT companies worldwide [Dataset]. https://datarade.ai/data-products/list-of-6m-it-companies-worldwide-bolddata
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Apr 27, 2021
    Dataset authored and provided by
    CompanyData.com (BoldData)
    Area covered
    Libya, British Indian Ocean Territory, Maldives, Swaziland, New Zealand, Korea (Democratic People's Republic of), Algeria, Turks and Caicos Islands, Uruguay, Taiwan
    Description

    At CompanyData.com (BoldData), we provide verified company data sourced directly from official trade registers. Our global IT company dataset gives you access to 6 million IT businesses worldwide, including software firms, tech consultancies, system integrators, SaaS providers, and other IT service companies. Every record is sourced from authoritative local registries, ensuring unmatched accuracy, coverage, and compliance.

    This dataset is built for professionals who need reliable, structured insights into the global technology sector. Each company profile includes firmographic details such as legal entity name, registration number, business structure, size, revenue range, and industry classification (NACE/SIC). In addition, you'll find direct contact information for decision-makers—emails, mobile numbers, job titles, and department roles—helping you connect with the right people instantly.

    Whether you're validating suppliers for compliance, identifying high-potential leads for sales, enriching your CRM data, or building AI models with clean and segmented business intelligence, our IT dataset is designed to support a wide range of critical use cases. From global enterprises to fast-scaling startups, our data empowers businesses to move faster and smarter.

    We offer multiple delivery methods tailored to your needs. Choose from custom bulk files, access data through our self-service platform, integrate it directly into your systems via real-time API, or let us enrich your existing database with missing fields and decision-maker insights.

    With a database spanning 380 million companies globally, deep IT sector segmentation, and proven expertise in sourcing from local trade registers, CompanyData.com (BoldData) helps your team identify opportunities, ensure compliance, and scale efficiently—wherever your growth takes you.

  9. ReCANVo: A Dataset of Real-World Communicative and Affective Nonverbal...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Aug 8, 2024
    Cite
    Jaya Narain; Kristina Teresa Johnson (2024). ReCANVo: A Dataset of Real-World Communicative and Affective Nonverbal Vocalizations [Dataset]. http://doi.org/10.5281/zenodo.5786860
    Available download formats: zip
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jaya Narain; Kristina Teresa Johnson
    Description

    A dataset of 7077 labeled vocalizations made by non-speaking individuals. Each vocalization lasts approximately 0.5-4 seconds and is labeled with its affective or communicative meaning. Data were acquired in real-world settings (homes, schools, etc.) and were labeled in real-time by parents or caregivers who knew the non-speaking communicator well.

    dataset_file_directory.csv provides the name of each vocalization file, the corresponding participant ID, and the vocalization meaning or label (delighted, frustrated, request, etc.).

    If you use this dataset, please cite Johnson & Narain et al., "ReCANVo: A Database of Real-World Communicative and Affective Nonverbal Vocalizations". The authors are Jaya Narain, Kristina T. Johnson, Thomas Quatieri, Pattie Maes, and Rosalind Picard. This paper provides more information about the dataset, including data acquisition methodology, pre-processing procedures, and participant demographics.

    **J.N. and K.T.J. are joint first authors on this project. Please include both names in attribution when possible (e.g., Johnson & Narain et al.).

  10. Data from: Global Impacts Dataset of Invasive Alien Species (GIDIAS)

    • springernature.figshare.com
    xlsx
    Updated May 21, 2025
    Cite
    Sven Bacher; Ellen Ryan-Colton; Mario Coiro; Phillip Cassey; Bella S. Galil; Martín A. Nuñez; Michael Ansong; Katharina Dehnen-Schmutz; Georgi Fayvush; Romina Daiana Fernandez; Ankila Hiremath; Makihiko Ikegami; Angeliki F. Martinou; Shana M. McDermott; Cristina Preda; Montserrat Vilà; Olaf L. F. Weyl; Neelavara Ananthram Aravind; Katerina Athanasiou; Vidyadhar Atkore; Jacob N. Barney; Tim M. Blackburn; Eckehard G. Brockerhoff; Clinton Carbutt; Luca Carisio; Vanessa Céspedes; Diego F. Cisneros-Heredia; Meghan Cooling; Maarten de Groot; Jakovos Demetriou; James W. E. Dickey; Regan Early; Thomas G. Evans; Belinda Gallardo; Monica Gruber; Cang Hui; Jonathan Jeschke; Natalia Z. Joelson; Mohd Asgar Khan; Sabrina Kumschick; Lori Lach; Katharina Lapin; Simone Lioy; Chunlong Liu; Zoe J. MacMullen; Manuela A. Mazzitelli; G. John Measey; Agata A. Mrugała-Koese; Camille L. Musseau; Helen F. Nahrung; Alessia lucia Pepori; Luis R. Pertierra; Elizabeth F. Pienaar; Petr Pyšek; Gonzalo Rivas-Torres; Henry A. Rojas Martinez; JULISSA ROJAS-SANDOVAL; Ned Ryan-Schofield; Rocío M. Sánchez; Alberto Santini; Davide Santoro; Riccardo Scalera; Lisanna Schmidt; Tinyiko Cavin Shivambu; Sima Sohrabi; Elena Tricarico; Alejandro Trillo; Pieter G. van't Hof; Lara Volery; Tsungai A. Zengeya; Aikaterini Christopoulou; Virginia G. Duboscq-Carra; Ioanna A. Angelidou; Pilar Castro-Díez; Paola Tatiana Flores Males (2025). Global Impacts Dataset of Invasive Alien Species (GIDIAS) [Dataset]. http://doi.org/10.6084/m9.figshare.27908838.v1
    Available download formats: xlsx
    Dataset updated
    May 21, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sven Bacher; Ellen Ryan-Colton; Mario Coiro; Phillip Cassey; Bella S. Galil; Martín A. Nuñez; Michael Ansong; Katharina Dehnen-Schmutz; Georgi Fayvush; Romina Daiana Fernandez; Ankila Hiremath; Makihiko Ikegami; Angeliki F. Martinou; Shana M. McDermott; Cristina Preda; Montserrat Vilà; Olaf L. F. Weyl; Neelavara Ananthram Aravind; Katerina Athanasiou; Vidyadhar Atkore; Jacob N. Barney; Tim M. Blackburn; Eckehard G. Brockerhoff; Clinton Carbutt; Luca Carisio; Vanessa Céspedes; Diego F. Cisneros-Heredia; Meghan Cooling; Maarten de Groot; Jakovos Demetriou; James W. E. Dickey; Regan Early; Thomas G. Evans; Belinda Gallardo; Monica Gruber; Cang Hui; Jonathan Jeschke; Natalia Z. Joelson; Mohd Asgar Khan; Sabrina Kumschick; Lori Lach; Katharina Lapin; Simone Lioy; Chunlong Liu; Zoe J. MacMullen; Manuela A. Mazzitelli; G. John Measey; Agata A. Mrugała-Koese; Camille L. Musseau; Helen F. Nahrung; Alessia lucia Pepori; Luis R. Pertierra; Elizabeth F. Pienaar; Petr Pyšek; Gonzalo Rivas-Torres; Henry A. Rojas Martinez; JULISSA ROJAS-SANDOVAL; Ned Ryan-Schofield; Rocío M. Sánchez; Alberto Santini; Davide Santoro; Riccardo Scalera; Lisanna Schmidt; Tinyiko Cavin Shivambu; Sima Sohrabi; Elena Tricarico; Alejandro Trillo; Pieter G. van't Hof; Lara Volery; Tsungai A. Zengeya; Aikaterini Christopoulou; Virginia G. Duboscq-Carra; Ioanna A. Angelidou; Pilar Castro-Díez; Paola Tatiana Flores Males
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present the Global Impacts Dataset of Invasive Alien Species (GIDIAS), a global dataset of 22865 records including impacts of invasive alien species on nature, nature’s contributions to people, and good quality of life. Records include positive and negative impacts, neutral impacts (studies were carried out, but no impacts were documented), non-directional impacts (i.e., change without detriments or benefits for native species or people), and finally, some records of alien species where no studies were found that assessed their impacts (indicating data gaps). Records cover 3353 invasive alien species from all major taxa (plants, vertebrates, invertebrates, microorganisms) and all continents and realms (terrestrial, freshwater, marine). The data were compiled to serve as robust evidence for chapter 4 “Impacts of invasive alien species on nature, nature's contributions to people, and good quality of life” of the global assessment report on invasive alien species by the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES; available on Zenodo at https://doi.org/10.5281/zenodo.7430731). The dataset is provided in a machine-readable CSV file (file name GIDIAS_20250417_machine_read.csv), with special language characters retained where used (UTF-8 format). The dataset is also provided in Excel format (file name GIDIAS_20250417_Excel.xlsx). Metadata is provided in Excel format, including descriptors for each variable (file name GIDIAS_metadata_20250417.xlsx). Additional explanations for GIDIAS are stored in Microsoft Word format (docx) and contain (1) a short description of the principles of Environmental and Socio-Economic Impact Classification for Alien Taxa (EICAT, SEICAT), (2) a description of the variables included in the Global Impacts Dataset of Invasive Alien Species (GIDIAS), and (3) a compilation of the search strategies and datasets included in the Global Impacts Dataset of Invasive Alien Species (GIDIAS).

  11. COVID-19 National Longitudinal Phone Survey 2020 – World Bank LSMS...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 25, 2021
    + more versions
    Cite
    National Bureau of Statistics (NBS) (2021). COVID-19 National Longitudinal Phone Survey 2020 – World Bank LSMS Harmonized Dataset - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/3856
    Dataset updated
    Oct 25, 2021
    Dataset authored and provided by
    National Bureau of Statistics (NBS)
    Time period covered
    2018 - 2021
    Area covered
    Nigeria
    Description

    Abstract

    To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.

    The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.

    Two harmonized datafiles are prepared for each survey:

    1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractor and crop sales.
    2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.

    Geographic coverage

    National coverage

    Analysis unit

    • Households
    • Individuals

    Universe

    The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Cleaning operations

    Nigeria General Household Survey, Panel (GHS-Panel) 2018-2019 and Nigeria COVID-19 National Longitudinal Phone Survey (COVID-19 NLPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).

    The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round of the survey is noted with “_rX” in the variable name, where X represents the number of the round. For example, a variable with “_r3” indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
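
    The round-suffix convention described above can be turned into a small helper for grouping variables by survey round. The following is a minimal sketch, not part of the LSMS tooling; the column names are hypothetical and used only for illustration.

```python
import re

# Matches a trailing "_r<digits>" round marker, e.g. "fs_worried_r3".
ROUND_SUFFIX = re.compile(r"^(?P<base>.+)_r(?P<round>\d+)$")


def split_round(variable_name):
    """Return (base_name, round_number); names without a marker map to Round 0."""
    match = ROUND_SUFFIX.match(variable_name)
    if match is None:
        return variable_name, 0
    return match.group("base"), int(match.group("round"))


# Hypothetical column names, not taken from the harmonized datafiles.
columns = ["hhid", "hhsize", "fs_worried_r1", "fs_worried_r3", "work_r12"]

# Group base variable names by the round they were extracted from.
by_round = {}
for name in columns:
    base, round_number = split_round(name)
    by_round.setdefault(round_number, []).append(base)
```

    Treating unsuffixed names as Round 0 follows the rule stated in the description; a real datafile may contain names that end in "_r" plus digits for other reasons, so the result is worth checking against the variable documentation.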

    Response rate

    See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.

  12. Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 29, 2023
    Cite
    Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li (2023). Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction Models [Dataset]. http://doi.org/10.5281/zenodo.7909511
    Available download formats: zip
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Freebase is amongst the largest public cross-domain knowledge graphs. It possesses three main data modeling idiosyncrasies: it has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important in modeling the real world, but they also pose nontrivial challenges for research on embedding models for knowledge graph completion, especially when models are developed and evaluated agnostically of these idiosyncrasies. We make available several variants of the Freebase dataset by inclusion and exclusion of these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.

    Dataset Details

    The dataset consists of the four variants of Freebase dataset as well as related mapping/support files. For each variant, we made three kinds of files available:

    • Subject matter triples file
      • fb+/-CVT+/-REV: One folder for each variant. Each folder contains 5 files: train.txt, valid.txt, test.txt, entity2id.txt, relation2id.txt. Subject matter triples are the triples that belong to subject matter domains, i.e. domains describing real-world facts.
        • Example of a row in train.txt, valid.txt, and test.txt:
          • 2, 192, 0
        • Example of a row in entity2id.txt:
          • /g/112yfy2xr, 2
        • Example of a row in relation2id.txt:
          • /music/album/release_type, 192
        • Explanation
          • "/g/112yfy2xr" and "/m/02lx2r" are the MIDs of the subject entity and object entity, respectively. "/music/album/release_type" is the relationship between the two entities. 2, 192, and 0 are the IDs assigned by the authors to the objects.
    • Type system file
      • freebase_endtypes: Each row maps an edge type to its required subject type and object type.
        • Example
          • 92, 47178872, 90
        • Explanation
          • "92" and "90" are the type IDs of the subject and object of the relationship with ID "47178872".
    • Metadata files
      • object_types: Each row maps the MID of a Freebase object to a type it belongs to.
        • Example
          • /g/11b41c22g, /type/object/type, /people/person
        • Explanation
          • The entity with MID "/g/11b41c22g" has a type "/people/person"
      • object_names: Each row maps the MID of a Freebase object to its textual label.
        • Example
          • /g/11b78qtr5m, /type/object/name, "Viroliano Tries Jazz"@en
        • Explanation
          • The entity with MID "/g/11b78qtr5m" has name "Viroliano Tries Jazz" in English.
      • object_ids: Each row maps the MID of a Freebase object to its user-friendly identifier.
        • Example
          • /m/05v3y9r, /type/object/id, "/music/live_album/concert"
        • Explanation
          • The entity with MID "/m/05v3y9r" can be interpreted by a human as a music concert live album.
      • domains_id_label: Each row maps the MID of a Freebase domain to its label.
        • Example
          • /m/05v4pmy, geology, 77
        • Explanation
          • The object with MID "/m/05v4pmy" in Freebase is the domain "geology", and has id "77" in our dataset.
      • types_id_label: Each row maps the MID of a Freebase type to its label.
        • Example
          • /m/01xljxh, /government/political_party, 147
        • Explanation
          • The object with MID "/m/01xljxh" in Freebase is the type "/government/political_party", and has id "147" in our dataset.
      • entities_id_label: Each row maps the MID of a Freebase entity to its label.
        • Example
          • /g/11b78qtr5m, Viroliano Tries Jazz, 2234
        • Explanation
          • The entity with MID "/g/11b78qtr5m" in Freebase is "Viroliano Tries Jazz", and has id "2234" in our dataset.
      • properties_id_label: Each row maps the MID of a Freebase property to its label.
        • Example
          • /m/010h8tp2, /comedy/comedy_group/members, 47178867
        • Explanation
          • The object with MID "/m/010h8tp2" in Freebase is a property (relation/edge); it has label "/comedy/comedy_group/members" and id "47178867" in our dataset.
      • uri_original2simplified and uri_simplified2original: The mapping from original URI to simplified URI and the mapping from simplified URI to original URI, respectively.
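
    To illustrate how the ID files relate to the triples files, the sketch below decodes an integer triple back into Freebase identifiers. It assumes the comma-separated layout shown in the example rows (the distributed files may use a different delimiter), and the second entity row pairing "/m/02lx2r" with id 0 is inferred from the explanation above rather than quoted from the dataset. The function names are hypothetical.

```python
import csv


def load_id_map(lines):
    """Parse 'label, id' rows (as in entity2id.txt) into {id: label}."""
    mapping = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        label, idx = row
        mapping[int(idx)] = label.strip()
    return mapping


def decode_triples(lines, entities, relations):
    """Yield (subject MID, relation label, object MID) for 'subject, relation, object' rows."""
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 3:
            continue
        subj, rel, obj = (int(field) for field in row)
        yield entities[subj], relations[rel], entities[obj]


# Rows modelled on the examples in the dataset description.
entity_rows = ["/g/112yfy2xr, 2", "/m/02lx2r, 0"]  # second row is inferred
relation_rows = ["/music/album/release_type, 192"]
triple_rows = ["2, 192, 0"]

entities = load_id_map(entity_rows)
relations = load_id_map(relation_rows)
decoded = list(decode_triples(triple_rows, entities, relations))
```

    With real files, the lists of strings would be replaced by open file handles, since csv.reader accepts any iterable of lines.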

  13. A synthetic data generation pipeline to reproducibly mirror high-resolution...

    • zenodo.org
    csv, txt, xls
    Updated Nov 21, 2024
    Cite
    Maria Frantzi (2024). A synthetic data generation pipeline to reproducibly mirror high-resolution multi-variable peptidomics and real-patient clinical data [Dataset]. http://doi.org/10.1101/2024.10.30.24316342
    Available download formats: csv, xls, txt
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maria Frantzi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generating high quality, real-world clinical and molecular datasets is challenging, costly and time intensive. Consequently, such data should be shared with the scientific community, which however carries the risk of privacy breaches. The latter limitation hinders the scientific community’s ability to freely share and access high resolution and high quality data, which are essential especially in the context of personalised medicine.

    In this study, we present an algorithm based on Gaussian copulas to generate synthetic data that retain associations within high dimensional (peptidomics) datasets. For this purpose, 3,881 datasets from 10 cohorts were employed, containing clinical, demographic, molecular (> 21,500 peptide) variables, and outcome data for individuals with a kidney or a heart failure event. High dimensional copulas were developed to portray the distribution matrix between the clinical and peptidomics data in the dataset, and based on these distributions, a data matrix of 2,000 synthetic patients was developed. Synthetic data maintained the capacity to reproducibly correlate the peptidomics data with the clinical variables.
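
    As a rough illustration of the general Gaussian copula idea (not the authors' pipeline, which works on high-dimensional peptidomics data), the sketch below handles only two variables: it converts each variable to rank-based normal scores, estimates their correlation, samples correlated normal pairs, and maps them back through the empirical quantiles of the original data. All names and the toy data are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_scores(values):
    """Map values to standard-normal scores via their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def empirical_quantile(sorted_values, u):
    """Return the observed value at quantile u (nearest rank, no interpolation)."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def sample_gaussian_copula(x, y, n_samples, seed=0):
    """Draw synthetic (x, y) pairs that keep the rank-based correlation of the inputs."""
    rng = random.Random(seed)
    rho = pearson(normal_scores(x), normal_scores(y))
    residual_sd = math.sqrt(max(0.0, 1.0 - rho * rho))
    xs, ys = sorted(x), sorted(y)
    synthetic = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual_sd * rng.gauss(0.0, 1.0)  # correlated with z1
        synthetic.append((empirical_quantile(xs, STD_NORMAL.cdf(z1)),
                          empirical_quantile(ys, STD_NORMAL.cdf(z2))))
    return rho, synthetic


# Toy data: a hypothetical clinical variable and a correlated marker.
toy_rng = random.Random(42)
age = [toy_rng.uniform(30, 80) for _ in range(200)]
marker = [0.5 * a + toy_rng.gauss(0, 5) for a in age]
rho, synthetic = sample_gaussian_copula(age, marker, n_samples=1000)
```

    Because the synthetic values are drawn from the observed values of each variable, this toy version would not by itself protect confidentiality; it only shows how the dependence structure is carried over.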

    External validation was performed, using independent multi-centric datasets (n = 2,964) of individuals with chronic kidney disease (CKD, defined as eGFR < 60 mL/min/1.73m²) or those with normal kidney function (eGFR > 90 mL/min/1.73m²). Similarly, the association of the rho-values of single peptides with eGFR between the synthetic and the external validation datasets was significantly reproduced (rho = 0.569, p = 1.8e-218). Subsequent development of classifiers by using the synthetic data matrices, resulted in highly predictive values in external real-patient datasets (AUC values of 0.803 and 0.867 for HF and CKD, respectively), demonstrating robustness of the developed method in the generation of synthetic patient data. The proposed pipeline represents a solution for high-dimensional sharing while maintaining patient confidentiality.

    For this study 6,967 peptidomics mass spectrometry datasets were employed and are deposited here, including:

    • 3,881 datasets that were employed for synthetic data generation

    1) File name: hf_peptides_data.csv; size: 45.56 MB; Description: 472 datasets from patients developing a heart failure event

    2) File name: ckd_peptides_data.csv; size: 10.98 MB; Description: 242 datasets from patients developing a kidney event

    3) File name: no_event_peptides_fdata.csv; size: 194.70 MB; Description: 3,266 datasets from patients that did not develop any event

    • 2,964 datasets that were used as external validation datasets (chronic kidney disease group)

    *Study 1: PersTIgAN

    4) File name: PersTIgAN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.7 MB; Description: Patients with CKD_Study1_export 1

    5) File name: PersTIgAN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 2.6 MB; Description: Patients with CKD_Study1_export 2

    *Study 2: CKD_Biobay

    6) File name: CKD_BioBay_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 35.7 MB; Description: Patients with CKD_Study2_export 1

    7) File name: CKD_BioBay_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 26.0 MB; Description: Patients with CKD_Study2_export 2

    *Study 3: DC_Ren
    8) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.96 MB; Description: Patients with CKD_Study3_export 1

    9) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 38.13 MB; Description: Patients with CKD_Study3_export 2

    10) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_3.xls; size: 36.86 MB; Description: Patients with CKD_Study3_export 3

    11) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_4.xls; size: 38.39 MB; Description: Patients with CKD_Study3_export 4

    12) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_5.xls; size: 38.12 MB; Description: Patients with CKD_Study3_export 5

    13) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_6.xls; size: 36.73 MB; Description: Patients with CKD_Study3_export 6

    14) File name: DCREN_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_1_Pivot_Blatt_7.xls; size: 2.15 MB; Description: Patients with CKD_Study3_export 7

    *Non-CKD

    15) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_1.xls; size: 37.72 MB; Description: datasets from patients without CKD_export 1

    16) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_2.xls; size: 38.31 MB; Description: datasets from patients without CKD_export 2

    17) File name: NonCKD_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot_Blatt_3.xls; size: 36.95 MB; Description: datasets from patients without CKD_export 3

    • 122 datasets that were used as external validation datasets (heart failure group)

    18) File name: HF_external_case_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot.xls; size: 3.13 MB; Description: datasets from patients that develop heart failure

    19) File name: HF_external_Control_MosaID_1_7_5_MFinder_vs_MV_HybridSolution_v4_ML1_Pivot.xls; size: 3.94 MB; Description: datasets from patients that did not develop heart failure

  14. A long-term global population proportion with access to electricity dataset...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, tiff
    Updated May 13, 2025
    Cite
    Luling Liu; Xin Cao (2025). A long-term global population proportion with access to electricity dataset (SDG 7.1.1) from 1992 to 2022 based on nighttime light remote sensing [Dataset]. http://doi.org/10.5281/zenodo.14018079
    Available download formats: tiff, bin, csv
    Dataset updated
    May 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Luling Liu; Xin Cao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    In 2015, the United Nations established 17 Sustainable Development Goals (SDGs), with Goal 7 focusing on ensuring access to affordable, reliable, and sustainable modern energy for all by 2030. By 2022, approximately 760 million people, or 1 in 11 globally, still lacked electricity access according to Tracking SDG7: The Energy Progress Report 2022, posing significant challenges to achieving this goal. Traditional survey methods for estimating the proportion of people with electricity access are often costly, infrequently updated, and hindered by the need for interpolation of historical data.

    To address these challenges, this dataset employs a nighttime light remote sensing estimation framework that integrates DMSP-CCNL and NPP/VIIRS data with GlobPOP population data. This approach produces a global 0.1-degree grid and national-scale electricity access index (EAI) maps from 1992 to 2022.

    The framework results' correlation coefficient (R) with World Bank survey data from 1992 to 2022 is 0.87, and the RMSE is 15.4, demonstrating its reliability at the national level. By effectively capturing geospatial changes, this dataset supports SDG 7.1.1 monitoring and offers valuable insights for policymakers to address electricity access disparities and promote sustainable energy transitions.

    Data Description

    1. This dataset consists of 0.1-degree grid Electricity Access Index (EAI) data in GeoTIFF format, where each pixel value represents the proportion of the population with access to electricity within that area.

    Example Filename: EAI_0dot1_Deg_WGS84_F32_1992

    • Field 1: EAI (Proportion of people with access to electricity)
    • Field 2&3: Spatial resolution is 0.1 degree
    • Field 4: Coordinate system is WGS84
    • Field 5: Data type is F32 (Float32)
    • Field 6: Year "1992"
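
    The filename convention above lends itself to a small parser. The following is a minimal sketch assuming the underscore-separated layout of the example filename; the function name and dictionary keys are hypothetical and are not part of the published dataset or its source code.

```python
from pathlib import Path


def parse_eai_filename(filename):
    """Split a name like 'EAI_0dot1_Deg_WGS84_F32_1992' into its documented fields."""
    stem = Path(filename).stem  # drops an extension such as ".tif" if present
    parts = stem.split("_")
    if len(parts) != 6 or parts[0] != "EAI":
        raise ValueError(f"unexpected EAI filename layout: {filename!r}")
    variable, resolution, unit, crs, dtype, year = parts
    return {
        "variable": variable,                                  # Field 1
        "resolution": float(resolution.replace("dot", ".")),   # Field 2
        "resolution_unit": unit,                               # Field 3
        "crs": crs,                                            # Field 4
        "dtype": dtype,                                        # Field 5
        "year": int(year),                                     # Field 6
    }


info = parse_eai_filename("EAI_0dot1_Deg_WGS84_F32_1992")
```

    Raising an error on an unexpected layout is a deliberate choice here, so that files following the other naming patterns in this dataset are not silently misread.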

    2. Aggregated EAI data at the national scale is provided in both Shapefile and CSV formats:

    • Table Filename: EAI_Level_0_1992_2022.csv
      • Fields include:
        • SOC (Country code)
        • Name (Country name)
        • National EAI data from 1992 to 2022
    • Shape Filename: EAI_Level_0_1992_2022.shp
      • Boundary data sourced from GADM (Database of Global Administrative Areas)

    3. The pixel-level (30 arc-seconds) Electricity Accessed Population Density is provided in GeoTIFF format, as identified through nighttime light (NTL) data.

    Example Filename: Elec_PopDen_WGS84_30arc_F32_1992

    • Field 1 & 2: Population Density with access to electricity (per km^2)
    • Field 3: Coordinate system is WGS84
    • Field 4: Spatial resolution is 30 arc-seconds
    • Field 5: Data type is F32 (Float32)
    • Field 6: Year "1992"

    If you encounter any issues, please contact us via email at liu.luling.k2@s.mail.nagoya-u.ac.jp.

    More Information

    The source codes are publicly available at GitHub: https://github.com/lulingliu/EAI.

  15. High-Frequency Phone Survey on COVID-19 - World Bank LSMS Harmonized Dataset...

    • catalog.ihsn.org
    • microdata.worldbank.org
    Updated Jan 3, 2022
    + more versions
    Cite
    Malawi National Statistical Office (NSO) (2022). High-Frequency Phone Survey on COVID-19 - World Bank LSMS Harmonized Dataset - Malawi [Dataset]. https://catalog.ihsn.org/catalog/9901
    Dataset updated
    Jan 3, 2022
    Dataset provided by
    National Statistical Office of Malawi (http://www.nsomalawi.mw/)
    Authors
    Malawi National Statistical Office (NSO)
    Time period covered
    2019 - 2021
    Area covered
    Malawi
    Description

    Abstract

    To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.

    The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.

    Two harmonized datafiles are prepared for each survey:

    1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractor and crop sales.
    2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.

    Geographic coverage

    National coverage

    Analysis unit

    • Households
    • Individuals

    Universe

    The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    See “Malawi - Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs)” and “Malawi - High-Frequency Phone Survey on COVID-19” available in the Microdata Library for details.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Cleaning operations

    Malawi Integrated Household Panel Survey (IHPS) 2019 and Malawi High-Frequency Phone Survey on COVID-19 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).

    The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round of the survey is noted with “_rX” in the variable name, where X represents the number of the round. For example, a variable with “_r3” indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.

    Response rate

    See “Malawi - Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs)” and “Malawi - High-Frequency Phone Survey on COVID-19” available in the Microdata Library for details.

  16. A dataset related to the Batwa’s Right to Recognition as a Minority and...

    • figshare.com
    xlsx
    Updated Nov 22, 2023
    Cite
    Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles (2023). A dataset related to the Batwa’s Right to Recognition as a Minority and Indigenous People in Rwanda [Dataset]. http://doi.org/10.6084/m9.figshare.24612147.v1
    Available download formats: xlsx
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rwanda
    Description

    This codebook relates to the study of the Batwa’s Rights to Recognition as a Minority and Indigenous People in Rwanda through the Lens of a Human Rights Based Approach. The dataset presents information in 7 columns:

    • Column 1 (Code level 1): the main codes extracted from the findings.
    • Column 2 (Code level 2): sub-codes extracted from code level 1.
    • Column 3 (Code level 3): codes extracted from code level 2.
    • Column 4: a brief definition of the content of each code.
    • Column 5: what each code should include.
    • Column 6: what each code should not include.
    • Column 7: the types of questions asked to respondents, from which the codes were generated.

    The codes were generated from questionnaire data, which column 7 summarizes. For example, the first column (Code level 1) has 4 rows: the first two concern findings from the literature review, and the last two concern empirical data from the fieldwork. Data from the literature review and from the fieldwork were combined to produce the findings on which the interpretation was based. The codes allowed the researchers to derive meaningful findings, which in turn enabled them to provide a consolidated interpretation. The data align with epistemological interpretivism and concern respondents' views on socio-cultural narratives and the emotional experiences they have endured in their lives.

    Data collection was conducted in three rural districts, Nyaruguru (Southern Province), Rubavu and Rutsiro (Western Province), and in three urban districts, Nyarugenge, Kicukiro and Gasabo (Kigali City). Three rural and three urban districts were chosen to find out whether there were divergent socio-cultural realities within each setting and across the settings. The selected rural sites were those near protected areas from which the Batwa were evicted following legislation on protected areas enacted in 1930 by colonial authorities. The urban districts were sites where some Batwa had lived after their eviction from the forests imposed a new lifestyle that differs from their hunting and gathering tradition.

    The study sites were purposively selected with the facilitation of gatekeepers, namely local entities. Authorization was sent to the district level, which subsequently allowed a team of researchers to approach the sector, cell and village levels of administration. At the village level, the lowest entity where households of HMP live, respondents were again identified with the help of the Chief of the Village (umudugudu), who served as a gatekeeper.

    Focus Group Discussions (FGDs) along with direct observation were administered to members of HMP (formerly referred to as Batwa). The groups comprised individuals above the age of 18 who were deemed to have experienced hardship as a result of socio-economic vulnerability following forest eviction. In-depth interviews were also carried out with officials of selected public institutions, including officials from the National Commission of Unity and Reconciliation and the National Commission of Human Rights. Key informant interviews (KIIs) were administered to leaders from NGOs and cooperative societies working to promote the rights of HMPs. These included one top manager and one former top manager of Cooperative des Potiers au Rwanda (COPORWA), a local NGO advocating for the rights of the Batwa in Rwanda, as well as one former leader of CAURWA (Communauté des Autochtones au Rwanda, translated as Community of Autochthonies in Rwanda). The latter was also one of the founding pioneers of a local NGO advocating for the rights of the Batwa in Rwanda. A former representative of HMP in Rwanda’s Senate was also contacted for an in-depth interview.

    All respondents were purposively selected for their expertise or lived experience on the subject of self-identity and non-discrimination. Key informants from COPORWA, a representative of the HMP in the Rwandan Senate, and government authorities were asked to provide information on convergences or divergences regarding the phenomenon under investigation.

    In total, 226 respondents in four categories were approached for feedback: 220 heads of households from HMP, for FGDs and direct observation; 3 leaders from COPORWA, for in-depth interviews; 1 ex-Senator who represented HMP in Rwanda's Senate, for an in-depth interview; and 2 authorities from governmental institutions. Different tools were used for different respondents not only to capture a wide range of perceptions on self-identity and non-discrimination, but also to enable the triangulation of information. FGDs along with direct observation facilitated the exploration of opinions and the observation of respondents' behaviour and body language when a sensitive issue, such as discrimination, was mentioned. As an ethical consideration, all respondents were asked for their consent prior to data collection. All interviews were guided by the principle of ‘theoretical saturation’, which consists of continuing the inquiry until respondents start to repeat themselves.

    Several measures were taken to ensure the reliability and validity of the data. Meetings were held every morning to plan for the day and every evening to evaluate the day spent in the field. For each day of data collection, the data collectors gave a daily report highlighting the progress made and any special information observed in the field relating to the subject under investigation. The study used thematic analysis embedded in a deductive approach guided by the human rights-based approach, in which the two variables of self-identity and non-discrimination were the focus of study. The human rights-based approach facilitated generating data around themes related to self-identity and non-discrimination.

    In short, the findings on the Batwa’s rights to self-identity and to non-discrimination differed across the two variables. On self-identity, the findings indicated that the identity of the Batwa has been shifting because of socio-cultural dynamics affecting the contexts in which they find themselves and live. For example, the name “HMP”, which conflates all vulnerable groups in Rwanda, elicits divergent views among respondents. For ordinary Batwa respondents, the name carries a negative profile, while for Batwa elites the name obscures their problems, since it disconnects them from other indigenous peoples across Africa and the world. For respondents from the GoR, the name means upholding unity and reconciliation. The findings also indicated that the Batwa identity has been characterised by a negative profile of someone who is the poorest, dirty and indigent, because of their lowest social status resulting from a non-dominant context. This reality corroborates other recent studies showing that the identity of the Batwa does not have a fixed boundary.

    On the variable of non-discrimination, the findings indicated that the negative profiles mentioned above are forms of indirect discrimination resulting from microaggressions and stereotypes.

    For further information on how to use the dataset, kindly contact the corresponding author at: ndikubwimana.genbattista@gmail.com, tel: (+250)788 751 225

  17. Replication Data for: What’s in a Name? Towards the Study of Names in...

    • dataverse.harvard.edu
    Updated Apr 25, 2025
    Cite
    John Givens (2025). Replication Data for: What’s in a Name? Towards the Study of Names in Political Science [Dataset]. http://doi.org/10.7910/DVN/REJ38B
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    John Givens
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    It is overdue for political science to consider the names of nation-states, the discipline’s primary unit of analysis and the world’s largest, richest, and most powerful institutions. This research note begins such analysis by examining the descriptors used in formal country names including Empire, Kingdom, Islamic, Republic, Democratic, Socialist, and People’s. I analyze country names as independent variables, hypothesizing that they have value as signals of political characteristics. To test my hypotheses, I turn to the Varieties of Democracy dataset. I use fixed effects panel regressions to examine if countries’ descriptors correlate with the characteristics they name. I find that except for the democratic descriptor all others are surprisingly accurate. This is the first step towards developing an understanding of names in political science while adding a new tool for comparative politics.
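
For readers unfamiliar with the method named above, the following is a minimal sketch of a fixed-effects (within) estimator with a single regressor. It is not the author's replication code; the function name, the input shape, and the toy numbers in the usage note are assumptions made for illustration only.

```python
from collections import defaultdict

def within_estimator(rows):
    """Fixed-effects slope of y on x from (entity, x, y) observations.

    Each entity's own mean is subtracted from x and y, which removes any
    entity-specific constant, and the slope is then fit by least squares.
    """
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))

    sxy = 0.0  # sum of demeaned x times demeaned y
    sxx = 0.0  # sum of squared demeaned x
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0:
        raise ValueError("x never varies within any entity")
    return sxy / sxx
```

With toy panel rows such as `[("A", 0, 10), ("A", 1, 12), ("B", 0, -5), ("B", 1, -3)]`, the entity-specific levels drop out and only the within-entity change in `x` identifies the slope. One consequence of entity fixed effects is that an entity whose `x` never changes contributes nothing to the estimate.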

  18. GBIF Backbone Taxonomy

    • smng.net
    • data.zse.pensoft.net
    • +6 more
    Updated Nov 17, 2023
    Cite
    GBIF Secretariat (2023). GBIF Backbone Taxonomy [Dataset]. http://doi.org/10.15468/39omei
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It is the taxonomic backbone that allows GBIF to integrate name-based information from different resources, whether these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way, and provides a means to crosswalk names from one source to another.

    It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names found only in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.

    International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
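
The BIN name selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not GBIF's implementation: the rank list, the input shape (one rank-to-name dict per record in the BIN), and the choice to measure consensus among the records that carry a name at that rank are all hypothetical.

```python
from collections import Counter

# Major Linnaean ranks, most specific first (assumed ordering for this sketch).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]

def consensus_name(classifications, threshold=0.8):
    """Return (rank, name) at the most specific rank that reaches consensus.

    `classifications` is a list of dicts mapping rank -> name, one dict per
    record applied to the BIN. Returns None if no rank reaches the threshold.
    """
    for rank in RANKS:
        names = [c[rank] for c in classifications if c.get(rank)]
        if not names:
            continue
        name, hits = Counter(names).most_common(1)[0]
        # Assumption: consensus is the share among records named at this rank.
        if hits / len(names) >= threshold:
            return rank, name
    return None
```

For example, if three records in a BIN say `Apis mellifera`, two say `Apis cerana`, and all five agree on the genus `Apis`, the species level falls short of 80% and the sketch moves up to return the genus.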

    UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.

    The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.

    The following 105 sources have been used to assemble the GBIF backbone, with the number of names contributed by each source given after the dash:

    • Catalogue of Life Checklist - 4766428 names
    • International Barcode of Life project (iBOL) Barcode Index Numbers (BINs) - 635951 names
    • UNITE - Unified system for the DNA based fungal species linked to the classification - 611208 names
    • The Paleobiology Database - 212054 names
    • World Register of Marine Species - 188857 names
    • The Interim Register of Marine and Nonmarine Genera - 183894 names
    • The World Checklist of Vascular Plants (WCVP) - 131891 names
    • GBIF Backbone Taxonomy - 114350 names
    • TAXREF - 109374 names
    • The Leipzig catalogue of vascular plants - 75380 names
    • ZooBank - 73549 names
    • Integrated Taxonomic Information System (ITIS) - 68377 names
    • Plazi.org taxonomic treatments database - 61346 names
    • Genome Taxonomy Database r207 - 60545 names
    • International Plant Names Index - 52329 names
    • Fauna Europaea - 45077 names
    • The National Checklist of Taiwan (Catalogue of Life in Taiwan, TaiCoL) - 36193 names
    • Dyntaxa. Svensk taxonomisk databas - 35892 names
    • The Plant List with literature - 32692 names
    • United Kingdom Species Inventory (UKSI) - 29643 names
    • Artsnavnebasen - 29208 names
    • The IUCN Red List of Threatened Species - 21221 names
    • Afromoths, online database of Afrotropical moth species (Lepidoptera) - 13961 names
    • Brazilian Flora 2020 project - Projeto Flora do Brasil 2020 - 13829 names
    • Prokaryotic Nomenclature Up-to-Date (PNU) - 10079 names
    • Checklist Dutch Species Register - Nederlands Soortenregister - 8814 names
    • ICTV Master Species List (MSL) - 7852 names
    • Cockroach Species File - 6020 names
    • GRIN Taxonomy - 5882 names
    • Taxon list of fungi and fungal-like organisms from Germany compiled by the DGfM - 4570 names
    • Catalogue of Afrotropical Bees - 3623 names
    • Catalogue of Tenebrionidae (Coleoptera) of North America - 3327 names
    • Checklist of Beetles (Coleoptera) of Canada and Alaska. Second Edition. - 3312 names
    • Systema Dipterorum - 2850 names
    • Catalogue of the Pterophoroidea of the World - 2807 names
    • The Clements Checklist - 2675 names
    • Taxon list of Hymenoptera from Germany compiled in the context of the GBOL project - 2496 names
    • IOC World Bird List, v13.2 - 2366 names
    • Official Lists and Indexes of Names in Zoology - 2310 names
    • National checklist of all species occurring in Denmark - 1922 names
    • Myriatrix - 1876 names
    • Database of Vascular Plants of Canada (VASCAN) - 1822 names
    • Taxon list of vascular plants from Bavaria, Germany compiled in the context of the BFL project - 1771 names
    • Orthoptera Species File - 1742 names
    • A list of the terrestrial fungi, flora and fauna of Madeira and Selvagens archipelagos - 1602 names
    • Aphid Species File - 1565 names
    • World Spider Catalog - 1561 names
    • Taxon list of Jurassic Pisces of the Tethys Palaeo-Environment compiled at the SNSB-JME - 1270 names
    • Backbone Family Classification Patch - 1143 names
    • GBIF Algae Classification - 1100 names
    • International Cichorieae Network (ICN): Cichorieae Portal - 975 names
    • Psocodea Species File - 803 names
    • New Zealand Marine Macroalgae Species Checklist - 787 names
    • Annotated checklist of endemic species from the Western Balkans - 754 names
    • Taxon list of animals with German names (worldwide) compiled at the SMNS - 503 names
    • Catalogue of the Alucitoidea of the World - 472 names
    • Lygaeoidea Species File - 462 names
    • Catálogo de Plantas y Líquenes de Colombia - 422 names
    • GBIF Backbone Patch - 317 names
    • Phasmida Species File - 259 names
    • Cortinariaceae fetched from the Index Fungorum API - 234 names
    • Coreoidea Species File - 233 names
    • GTDB supplement - 139 names
    • Mantodea Species File - 119 names
    • Endemic species in Taiwan - 93 names
    • Taxon list of Araneae from Germany compiled in the context of the GBOL project - 88 names
    • Species of Hominidae - 78 names
    • Taxon list of Sternorrhyncha from Germany compiled in the context of the GBOL project - 77 names
    • Taxon list of mosses from Germany compiled in the context of the GBOL project - 75 names
    • Mammal Species of the World - 73 names
    • Plecoptera Species File - 71 names
    • Species Fungorum Plus - 64 names
    • Catalogue of the type specimens of Cosmopterigidae (Lepidoptera: Gelechioidea) from research collections of the Zoological Institute, Russian Academy of Sciences - 47 names
    • Species named after famous people - 41 names
    • Dermaptera Species File - 36 names
    • Taxon list of Trichoptera from Germany compiled in the context of the GBOL project - 34 names
    • True Fruit Flies (Diptera, Tephritidae) of the Afrotropical Region - 33 names
    • Range and Regularities in the Distribution of Earthworms of the Earthworms of the USSR Fauna. Perel, 1979 - 32 names
    • Taxon list of Diplura from Germany compiled in the context of the GBOL project - 30 names
    • Lista de referencia de especies de aves de Colombia - 2022 - 24 names
    • Taxon list of Auchenorrhyncha from Germany compiled in the context of the GBOL project - 20 names
    • Catalogue of the type specimens of Polycestinae (Coleoptera: Buprestidae) from research collections of the Zoological Institute, Russian Academy of Sciences - 19 names
    • Taxon list of Thysanoptera from Germany compiled in the context of the GBOL project - 19 names
    • Lista de especies de vertebrados registrados en jurisdicción del Departamento del Huila - 18 names
    • Taxon list of Microcoryphia (Archaeognatha) from Germany compiled in the context of the GBOL project - 15 names
    • Catalogue of the type specimens of Bufonidae and Megophryidae (Amphibia: Anura) from research collections of the Zoological Institute, Russian Academy of Sciences - 12 names
    • Grylloblattodea Species File - 11 names
    • Coleorrhyncha Species File - 9 names
    • Taxon list of liverworts from Germany compiled in the context of the GBOL project - 9 names
    • Embioptera Species File - 7 names
    • Taxon list of Pisces and Cyclostoma from Germany compiled in the context of the GBOL project - 6 names
    • Taxon list of Pteridophyta from Germany compiled in the context of the GBOL project - 6 names
    • Taxon list of Siphonaptera from Germany compiled in the context of the GBOL project - 5 names
    • The Earthworms of the Fauna of Russia. Perel, 1997 - 5 names
    • Taxon list of Zygentoma from Germany compiled in the context of the GBOL project - 4 names
    • Asiloid Flies: new taxa of Diptera: Apioceridae, Asilidae, and Mydidae - 3 names
    • Taxon list of Protura from Germany compiled in the context of the GBOL project - 3 names
    • Taxon list of hornworts from Germany compiled in the context of the GBOL project - 2 names
    • Chrysididae Species File - 1 names
    • Taxon list of Dermaptera from Germany compiled in the context of the GBOL project - 1 names
    • Taxon list of Diplopoda from Germany in the context of the GBOL project - 1 names
    • Taxon list of Orthoptera (Grashoppers) from Germany compiled at the SNSB - 1 names
    • Taxon list of Pscoptera from Germany compiled in the context of the GBOL project - 1 names
    • Taxon list of Pseudoscorpiones from Germany compiled in the context of the GBOL project - 1 names
    • Taxon list of Raphidioptera from Germany compiled in the context of the GBOL project - 1 names

  19. glenglat: Global englacial temperature database

    • zenodo.org
    zip
    Updated Aug 19, 2024
    Cite
    Mylène Jacquemart; Ethan Welty (2024). glenglat: Global englacial temperature database [Dataset]. http://doi.org/10.5281/zenodo.13334175
    Available download formats: zip
    Dataset updated
    Aug 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mylène Jacquemart; Ethan Welty
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jul 20, 1842 - Jul 27, 2023
    Description

    Open-access database of englacial temperature measurements compiled from data submissions and published literature. It is developed on GitHub and published to Zenodo.

    Data structure

    The dataset adheres to the Frictionless Data Tabular Data Package specification. The metadata in datapackage.json describes, in detail, the contents of the tabular data files in the data folder:

    • source.csv: Description of each data source (either a personal communication or the reference to a published study).
    • borehole.csv: Description of each borehole (location, elevation, etc), linked to source.csv via source_id and less formally via source identifiers in notes.
    • profile.csv: Description of each profile (date, etc), linked to borehole.csv via borehole_id and to source.csv via source_id and less formally via source identifiers in notes.
    • measurement.csv: Description of each measurement (depth and temperature), linked to profile.csv via borehole_id and profile_id.

    For boreholes with many profiles (e.g. from automated loggers), pairs of profile.csv and measurement.csv are stored separately in subfolders of data named {source.id}-{glacier}, where glacier is a simplified and kebab-cased version of the glacier name (e.g. flowers2022-little-kluane).
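
As an illustration of the `{source.id}-{glacier}` naming pattern above, here is a minimal sketch. The exact simplification rules used by the database are not stated, so the accent stripping and the treatment of punctuation below are assumptions, and `subfolder_name` is a hypothetical helper.

```python
import re
import unicodedata

def subfolder_name(source_id, glacier_name):
    """Build a '{source.id}-{glacier}' subfolder name with a kebab-cased glacier.

    Assumed simplification: strip accents, lowercase, and collapse every run
    of non-alphanumeric characters into a single hyphen.
    """
    decomposed = unicodedata.normalize("NFKD", glacier_name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return f"{source_id}-{slug}"
```

Under these assumptions, `subfolder_name("flowers2022", "Little Kluane")` reproduces the `flowers2022-little-kluane` example given in the description.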

    data/source.csv

    Sources of information considered in the compilation of this database. Column names and categorical values closely follow the Citation Style Language (CSL) 1.0.2 specification. Names of people in non-Latin scripts are followed by a latinization in square brackets (e.g. В. С. Загороднов [V. S. Zagorodnov]) and non-English titles are followed by a translation in square brackets.

    name | type | description
    id (required) | string | Unique identifier constructed from the first author's lowercase, latinized, family name and the publication year, followed as needed by a lowercase letter to ensure uniqueness (e.g. Загороднов 1981 → zagorodnov1981a).
    author (required) | string | Author names (optionally followed by their ORCID in parentheses) as a pipe-delimited list.
    year (required) | year | Year of publication.
    type (required) | string | Item type.
    - article-journal: Journal article
    - book: Book (if the entire book is relevant)
    - chapter: Book section
    - document: Document not fitting into any other category
    - dataset: Collection of data
    - map: Geographic map
    - paper-conference: Paper published in conference proceedings
    - personal-communication: Personal communication between individuals
    - speech: Presentation (talk, poster) at a conference
    - report: Report distributed by an institution
    - thesis: Thesis written to satisfy degree requirements
    - webpage: Website or page on a website
    title | string | Item title.
    url | string | URL (DOI if available).
    language (required) | string | Language as ISO 639-1 two-letter language code.
    - de: German
    - en: English
    - fr: French
    - ko: Korean
    - ru: Russian
    - sv: Swedish
    - zh: Chinese
    container_title | string | Title of the container (e.g. journal, book).
    volume | integer | Volume number of the item or container.
    issue | string | Issue number (e.g. 1) or range (e.g. 1-2) of the item or container, with an optional letter prefix (e.g. F1).
    page | string | Page number (e.g. 1) or range (e.g. 1-2) of the item in the container.
    version | string | Version number (e.g. 1.0) of the item.
    editor | string | Editor names (e.g. of the containing book) as a pipe-delimited list.
    collection_title | string | Title of the collection (e.g. book series).
    collection_number | string | Number (e.g. 1) or range (e.g. 1-2) in the collection (e.g. book series volume).
    publisher | string | Publisher name.
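
The `author` field in data/source.csv is described as a pipe-delimited list of names, each optionally followed by an ORCID in parentheses. A minimal parsing sketch follows; the whitespace around the pipe, the helper name `parse_authors`, and the sample names in the usage note are assumptions rather than details taken from the database.

```python
import re

# A name, optionally followed by an ORCID (four groups of four characters) in parentheses.
AUTHOR_WITH_ORCID = re.compile(
    r"^(?P<name>.*?)\s*\((?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])\)$"
)

def parse_authors(field):
    """Split a pipe-delimited author field into (name, orcid_or_None) tuples."""
    authors = []
    for part in field.split("|"):
        part = part.strip()
        if not part:
            continue  # skip empty entries such as a trailing pipe
        match = AUTHOR_WITH_ORCID.match(part)
        if match:
            authors.append((match.group("name"), match.group("orcid")))
        else:
            authors.append((part, None))
    return authors
```

Calling `parse_authors("Jane Doe (0000-0002-1825-0097) | John Roe")` with these made-up names yields one tuple carrying an ORCID and one with `None`. The same splitting applies to the other pipe-delimited fields, such as `editor`.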

    data/borehole.csv

    Metadata about each borehole.

    name | type | description
    id (required) | integer | Unique identifier.
    source_id (required) | string | Identifier of the source of the earliest temperature measurements. This is also the source of the borehole attributes unless otherwise stated in notes.
    glacier_name (required) | string | Glacier or ice cap name (as reported).
    glims_id | string | Global Land Ice Measurements from Space (GLIMS) glacier identifier.
    location_origin (required) | string | Origin of location (latitude, longitude).
    - submitted: Provided in data submission
    - published: Reported as coordinates in original publication
    - digitized: Digitized from published map with complete axes
    - estimated: Estimated from published plot by comparing to a map (e.g. Google Maps, CalTopo)
    - guessed: Estimated with difficulty, for example by comparing elevation to a map (e.g. Google Maps, CalTopo)
    latitude (required) | number [degree] | Latitude (EPSG 4326).
    longitude (required) | number [degree] | Longitude (EPSG 4326).
    elevation_origin (required) | string | Origin of elevation (elevation).
    - submitted: Provided in data submission
    - published: Reported as number in original publication
    - digitized: Digitized from published plot with complete axes
    - estimated: Estimated from elevation contours in published map
    - guessed: Estimated with difficulty, for example by comparing location (latitude, longitude) to a map of contemporary elevations (e.g. CalTopo, Google Maps)
    elevation (required) | number [m] | Elevation above sea level.
    label | string | Borehole name (e.g. as labeled on a plot).
    date_min | date (%Y-%m-%d) | Begin date of drilling, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01).
    date_max | date (%Y-%m-%d) | End date of drilling, or if not known precisely, the last possible date (e.g. 2019 → 2019-12-31).
    drill_method | string | Drilling method.
    - mechanical: Push, percussion, rotary
    - thermal: Hot point, electrothermal, steam
    - combined: Mechanical and thermal
    ice_depth | number [m] | Starting depth of ice. Infinity (INF) indicates that ice was not reached.
    depth | number [m] | Total borehole depth (not including drilling in the underlying bed).
    to_bed | boolean | Whether the borehole reached the glacier bed.
    temperature_accuracy | number [°C] | Thermistor accuracy or precision (as reported). Typically understood to represent one standard deviation.
    notes | string | Additional remarks about the study site, the borehole, or the measurements therein. Sources are referenced by their id.
    curator | string | Names of people who added the data to the database, as a pipe-delimited list.
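
The `date_min` and `date_max` columns bracket a date that may be known only imprecisely (e.g. 2019 → 2019-01-01 and 2019-12-31). A minimal sketch of that expansion follows; the year-only case mirrors the example in the description, while the year-month case and the helper name `date_bounds` are assumptions added for illustration.

```python
import calendar
from datetime import date

def date_bounds(text):
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (first, last) possible dates."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        # monthrange returns (weekday of the 1st, number of days in the month)
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(*parts)
    return exact, exact
```

A precisely known date gives identical bounds, so `date_min == date_max` can serve as a check for exact dates when filtering.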

    data/profile.csv

    Date and time of each measurement profile.

    name | type | description
    borehole_id (required) | integer | Borehole identifier.
    id (required) | integer | Borehole profile identifier (starting from 1 for each borehole).
    source_id (required) | string | Source identifier.
    measurement_origin (required) | string | Origin of measurements (measurement.depth, measurement.temperature).
    - submitted: Provided as numbers in data submission
    - published: Numbers read from original publication
    - digitized: Digitized from published plot(s) with Plot Digitizer
    date_min | date (%Y-%m-%d) | Measurement date, or if not known precisely, the first possible date (e.g. 2019 → 2019-01-01).
    date_max

  20. CARMA, Finland Power Plant Emissions, Finland, 2000/2007/Future

    • geocommons.com
    Updated May 6, 2008
    Cite
    CARMA (2008). CARMA, Finland Power Plant Emissions, Finland, 2000/2007/Future [Dataset]. http://geocommons.com/search.html
    Dataset updated
    May 6, 2008
    Dataset provided by
    CARMA
    data
    Description

    All the data in this dataset is provided by CARMA (www.carma.org). This dataset provides information about power plant emissions in Finland. Emissions from all power plants in Finland were obtained by CARMA for the past (2000 Annual Report), the present (2007 data), and the future. CARMA determines the data presented for the future to reflect planned plant construction, expansion, and retirement. The dataset provides the name, company, parent company, city, state, zip, county, metro area, lat/lon, and plant id for each individual power plant. For each of the three time periods, the dataset reports:

    • Intensity: pounds of CO2 emitted per megawatt-hour of electricity produced.
    • Energy: annual megawatt-hours of electricity produced.
    • Carbon: annual carbon dioxide (CO2) emissions, in short (U.S.) tons. Multiply by 0.907 to get metric tons.

    Carbon Monitoring for Action (CARMA) is a massive database containing information on the carbon emissions of over 50,000 power plants and 4,000 power companies worldwide. Power generation accounts for 40% of all carbon emissions in the United States and about one-quarter of global emissions. CARMA is the first global inventory of a major sector of the economy. The objective of CARMA.org is to equip individuals with the information they need to forge a cleaner, low-carbon future. By providing complete information for both clean and dirty power producers, CARMA hopes to influence the opinions and decisions of consumers, investors, shareholders, managers, workers, activists, and policymakers. CARMA builds on experience with public information disclosure techniques that have proven successful in reducing traditional pollutants. Please see carma.org for more information.
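
The unit arithmetic behind the Intensity, Energy and Carbon fields described above can be sketched as follows. The 0.907 factor is the one quoted in the description. The function names are hypothetical, and treating Carbon as exactly Intensity times Energy is an assumption about the data that follows from the unit definitions, not something the source confirms.

```python
POUNDS_PER_SHORT_TON = 2000
# Factor quoted in the dataset description; the exact value is 0.90718474.
METRIC_TONS_PER_SHORT_TON = 0.907

def carbon_short_tons(intensity_lb_per_mwh, energy_mwh):
    """Annual CO2 in short tons implied by intensity (lb/MWh) and energy (MWh)."""
    return intensity_lb_per_mwh * energy_mwh / POUNDS_PER_SHORT_TON

def to_metric_tons(short_tons):
    """Convert short (U.S.) tons to metric tons."""
    return short_tons * METRIC_TONS_PER_SHORT_TON
```

For instance, a plant emitting 2,000 lb of CO2 per MWh while producing 1,000 MWh a year would emit 1,000 short tons, or about 907 metric tons. Comparing this implied value with a reported Carbon figure is one way to sanity-check a row.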

The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/discussion

Baby Names by Year

Baby names by year of birth

14 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 20, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

About this dataset

This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.

How to use the dataset

How to use the US Baby Names by Year of Birth dataset:

This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There are also columns for the number of babies with that name born in that year.

This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of common names has changed. It can also be used to study how naming trends differ between sexes, or between different years.

Research Ideas

This dataset could be used for a number of things, including:

  1. Determining baby name trends over time
  2. Finding out what the most popular baby names are in the US
  3. Analyzing how baby name popularity has changed over the years

Columns

  • index: the index of the dataframe
  • YearOfBirth: the year in which the baby was born
  • Name: the name of the baby
  • Sex: the sex of the baby
  • Number: the number of babies with that name and sex
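
As a rough sketch of how the columns listed above might be used, the following finds the most common name per year and sex. The header spelling is assumed to match the column names exactly, the sample rows are made up for illustration, and `top_names` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in the documented column layout (not real counts).
SAMPLE = """index,YearOfBirth,Name,Sex,Number
0,1900,Mary,F,30
1,1900,Anna,F,12
2,1900,John,M,25
3,1901,Anna,F,18
4,1901,Mary,F,17
"""

def top_names(csv_file):
    """Map (year, sex) to the (name, number) row with the highest count."""
    best = {}
    for row in csv.DictReader(csv_file):
        key = (int(row["YearOfBirth"]), row["Sex"])
        count = int(row["Number"])
        if key not in best or count > best[key][1]:
            best[key] = (row["Name"], count)
    return best

result = top_names(io.StringIO(SAMPLE))
```

With the real file, the same function could be given an open file handle instead of the in-memory sample. Ties are resolved in favour of the first row encountered.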

Acknowledgements

If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov

Data Source
