100+ datasets found
  1. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unsplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?

  2. Namesakes

    • figshare.com
    json
    Updated Nov 20, 2021
    Cite
    Oleg Vasilyev; Aysu Altun; Nidhi Vyas; Vedant Dharnidharka; Erika Lampert; John Bohannon (2021). Namesakes [Dataset]. http://doi.org/10.6084/m9.figshare.17009105.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Nov 20, 2021
    Dataset provided by
    figshare
    Authors
    Oleg Vasilyev; Aysu Altun; Nidhi Vyas; Vedant Dharnidharka; Erika Lampert; John Bohannon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Motivation: creating a challenging dataset for testing Named-Entity Linking. The Namesakes dataset consists of three closely related datasets: Entities, News and Backlinks. Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names. The News were collected as random news text chunks, containing mentions that either belong to the Entities dataset or can be easily confused with them. Backlinks were obtained from Wikipedia dump data with the intention to have mentions linked to the entities of the Entity dataset. The Entities and News are human-labeled, resolving the mentions of the entities.

    Methods

    Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names: the most popular people names, the most popular locations, and organizations with name ambiguity. In each Entities text chunk, the named entities with a name similar to the chunk's Wikipedia page name are labeled. For labeling, these entities were suggested to human annotators (odetta.ai) to tag as "Same" (same as the page entity) or "Other". The labeling was done by 6 experienced annotators who had passed a preliminary trial task. The only accepted tags are those assigned in agreement by at least 5 annotators and then passed through reconciliation with an experienced reconciliator.

    The News were collected as random news text chunks, containing mentions which either belong to the Entities dataset or can be easily confused with them. In each News text chunk one mention was selected for labeling, and 3-10 Wikipedia pages from Entities were suggested as the labels for an annotator to choose from. The labeling was done by 3 experienced annotators (odetta.ai), after the annotators passed a preliminary trial task. The results were reconciled by an experienced reconciliator. All the labeling was done using Lighttag (lighttag.io).

    Backlinks were obtained from Wikipedia dump data (dumps.wikimedia.org/enwiki/20210701) with intention to have mentions linked to the entities of the Entity dataset. The backlinks were filtered to leave only mentions in a good quality text; each text was cut 1000 characters after the last mention.

    Usage Notes

    Entities:

    File: Namesakes_entities.jsonl

    The Entities dataset consists of 4148 Wikipedia text chunks containing human-tagged mentions of entities. Each mention is tagged either as "Same" (meaning that the mention is of this Wikipedia page entity), or "Other" (meaning that the mention is of some other entity, just having the same or similar name). The Entities dataset is a jsonl list; each item is a dictionary with the following keys and values:

    • 'pagename': page name of the Wikipedia page.
    • 'pageid': page id of the Wikipedia page.
    • 'title': title of the Wikipedia page.
    • 'url': URL of the Wikipedia page.
    • 'text': the text chunk from the Wikipedia page.
    • 'entities': list of the mentions in the page text; each entity is represented by a dictionary with the keys:
        • 'text': the mention as a string from the page text.
        • 'start': start character position of the entity in the text.
        • 'end': end (one-past-last) character position of the entity in the text.
        • 'tag': annotation tag given as a string, either 'Same' or 'Other'.
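
A minimal sketch of reading Entities records and checking the mention offsets, assuming each line of Namesakes_entities.jsonl is one JSON object with the keys listed above. The sample record is made up for illustration and is not taken from the dataset; `iter_mentions` is a hypothetical helper name.

```python
import json

def iter_mentions(lines):
    """Yield (pagename, mention_text, tag) for every mention in a JSONL stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        text = item["text"]
        for ent in item["entities"]:
            # 'end' is one-past-last, so a plain Python slice recovers the mention.
            span = text[ent["start"]:ent["end"]]
            if span != ent["text"]:
                raise ValueError(f"offset mismatch in page {item['pagename']!r}")
            yield item["pagename"], span, ent["tag"]

# Hypothetical record, for illustration only.
text = "John Muir was a naturalist. Another Muir was a poet."
second = text.index("Muir", 10)  # start of the second, unrelated mention
sample = json.dumps({
    "pagename": "John_Muir", "pageid": 1, "title": "John Muir",
    "url": "https://en.wikipedia.org/wiki/John_Muir",
    "text": text,
    "entities": [
        {"text": "John Muir", "start": 0, "end": 9, "tag": "Same"},
        {"text": "Muir", "start": second, "end": second + 4, "tag": "Other"},
    ],
})
mentions = list(iter_mentions([sample]))
```

For the real file, the same generator could be fed an open file handle, for example `iter_mentions(open("Namesakes_entities.jsonl", encoding="utf-8"))`.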

    News:

    File: Namesakes_news.jsonl

    The News dataset consists of 1000 news text chunks, each one with a single annotated entity mention. The annotation either points to the corresponding entity from the Entities dataset (if the mention is of that entity), or indicates that the mentioned entity does not belong to the Entities dataset. The News dataset is a jsonl list; each item is a dictionary with the following keys and values:

    • 'id_text': id of the sample.
    • 'text': the text chunk.
    • 'urls': list of URLs of Wikipedia entities suggested to labelers for identification of the entity mentioned in the text.
    • 'entity': a dictionary describing the annotated entity mention in the text:
        • 'text': the mention as a string found by an NER model in the text.
        • 'start': start character position of the mention in the text.
        • 'end': end (one-past-last) character position of the mention in the text.
        • 'tag': this key exists only if the mentioned entity is annotated as belonging to the Entities dataset; if so, the value is a dictionary identifying the Wikipedia page assigned by annotators to the mentioned entity:
            • 'pageid': Wikipedia page id.
            • 'pagetitle': page title.
            • 'url': page URL.

    Backlinks dataset: The Backlinks dataset consists of two parts: dictionary Entity-to-Backlinks and Backlinks documents. The dictionary points to backlinks for each entity of the Entity dataset (if any backlinks exist for the entity). The Backlinks documents are the backlinks Wikipedia text chunks with identified mentions of the entities from the Entities dataset.

    Each mention is identified by surrounding double square brackets, e.g. "Muir built a small cabin along [[Yosemite Creek]].". However, if the mention differs from the exact entity name, the double square brackets wrap both the exact name and, separated by '|', the mention string to the right, for example: "Muir also spent time with photographer [[Carleton E. Watkins | Carleton Watkins]] and studied his photographs of Yosemite.".
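
The bracket convention above can be parsed with a regular expression. This is a minimal sketch that assumes entity names and mention strings never contain square brackets and that entity names never contain '|'; `LINK` and `extract_mentions` are hypothetical names, not part of the dataset's tooling.

```python
import re

# Matches [[Entity name]] or [[Entity name | mention string]].
LINK = re.compile(r"\[\[([^\[\]|]+?)(?:\|([^\[\]]+?))?\]\]")

def extract_mentions(content):
    """Return (entity_name, mention_string) pairs found in a Backlinks text chunk."""
    pairs = []
    for m in LINK.finditer(content):
        entity = m.group(1).strip()
        # Without a '|' part, the mention string is the entity name itself.
        mention = (m.group(2) or m.group(1)).strip()
        pairs.append((entity, mention))
    return pairs

# The two example sentences quoted in the description above.
content = (
    "Muir built a small cabin along [[Yosemite Creek]]. "
    "Muir also spent time with photographer "
    "[[Carleton E. Watkins | Carleton Watkins]] and studied his photographs of Yosemite."
)
pairs = extract_mentions(content)
```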

    The Entity-to-Backlinks is a jsonl with 1527 items.

    File: Namesakes_backlinks_entities.jsonl

    Each item is a tuple:

    • Entity name.
    • Entity Wikipedia page id.
    • Backlinks ids: a list of pageids of backlink documents.

    The Backlinks documents is a jsonl with 26903 items.

    File: Namesakes_backlinks_texts.jsonl

    Each item is a dictionary:

    • 'pageid': id of the Wikipedia page.
    • 'title': title of the Wikipedia page.
    • 'content': text chunk from the Wikipedia page, with all mentions in the double brackets; the text is cut 1000 characters after the last mention, and the cut is denoted as '...[CUT]'.
    • 'mentions': list of the mentions from the text, for convenience. Each mention is a tuple:
        • Entity name.
        • Entity Wikipedia page id.
        • Sorted list of all character indexes at which the mention occurrences start in the text.

  3. Gender by Name (Time-series)

    • kaggle.com
    Updated Dec 5, 2022
    Cite
    The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Automated Gender Identification Using Name Probabilities

    2019 US Social Security Administration Data

    By Derek Howard [source]

    About this dataset

    This dataset provides a tool for generating gender-specific datasets from names alone. It contains the probability of a person's name belonging to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were derived from records with 5 or more people associated with each name, so individuals with less common names can still have a gender predicted. With this resource, users can generate gender-aware data quickly, making gender identification in datasets easier.

    How to use the dataset

    This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

    To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

    In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
    Good luck!

    Research Ideas

    • Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

    • Generate gender neutral names - use this data to generate random names with no gender bias.

    • Automate record lookup - quickly and accurately assign genders based on the probability associated with their name

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: name_gender.csv

    | Column name | Description                                                          |
    |:------------|:---------------------------------------------------------------------|
    | name        | The name of the person. (String)                                     |
    | gender      | The gender of the person. (String)                                   |
    | probability | The probability of the gender being assigned to the person. (Float) |
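
A minimal sketch of a threshold-based lookup, assuming the three-column layout listed for name_gender.csv (the "How to use" text describes a two-column layout instead, so the real file should be checked first). The gender codes "F" and "M", the sample rows, the 0.9 threshold, and the helper names are all assumptions made for illustration.

```python
import csv
import io

def load_gender_table(fileobj):
    """Map lower-cased name -> (gender, probability), keeping the most probable row."""
    table = {}
    for row in csv.DictReader(fileobj):
        name = row["name"].strip().lower()
        prob = float(row["probability"])
        if name not in table or prob > table[name][1]:
            table[name] = (row["gender"], prob)
    return table

def guess_gender(table, name, threshold=0.9):
    """Return the listed gender if its probability meets the threshold, else None."""
    entry = table.get(name.strip().lower())
    if entry is None or entry[1] < threshold:
        return None
    return entry[0]

# Made-up rows for illustration; real values would come from name_gender.csv.
sample_csv = io.StringIO(
    "name,gender,probability\n"
    "Mary,F,0.99\n"
    "Casey,M,0.58\n"
)
table = load_gender_table(sample_csv)
```

Returning None below the threshold keeps ambiguous names unlabeled instead of forcing a guess; lowering `threshold` trades accuracy for coverage.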

    Acknowledgements

    If you use this dataset in your research, please credit Derek Howard.

  4. A corpus of names drawn from the local birth registers of England and Wales,...

    • dtechtive.com
    • find.data.gov.scot
    txt, xlsx, zip
    Updated Jan 25, 2018
    Cite
    University of Edinburgh (2018). A corpus of names drawn from the local birth registers of England and Wales, 1838-2014 [Dataset]. http://doi.org/10.7488/ds/2294
    Explore at:
    Available download formats: xlsx (30.21 MB), zip (5.395 MB), txt (0.0166 MB)
    Dataset updated
    Jan 25, 2018
    Dataset provided by
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    UNITED KINGDOM
    Description

    This dataset comprises a corpus of names, in both the first and middle position, for approximately 22 million individuals born in England and Wales between 1838 and 2014. This data is obtained from birth records made available by a set of volunteer-run genealogical resources - collectively, the 'UK local BMD project' (http://www.ukbmd.org.uk/local) - and has been re-purposed here to demonstrate the applicability of network analysis methods to an onomastic dataset. The ownership and licensing of the intellectual property constituting the original birth records is detailed at https://www.ukbmd.org.uk/TermsAndConditions. Under section 29A of the UK Copyright, Designs and Patents Act 1988, a copyright exception permits copies to be made of lawfully accessible material in order to conduct text and data mining for non-commercial research. The data included in this dataset represents the outcome of such a text-mining analysis. No birth records are included in this dataset, and nor is it possible for records to be reconstructed from the data presented herein. The data comprises an archive of tables, presenting this corpus in various forms: as a rank order of names (in both the first and middle position) by number of registered births per year, and by the total number of births across all years sampled. An overview of the data is also provided, with summary statistics such as the number of usable records registered per year, most popular names per year, and measures of forename diversity and the surname-to-forename usage ratio (an indicator of which forenames are more likely to be transferred uses of surnames). These tables are extensive but not exhaustive, and do not exclude the possibility that errors are present in the corpus. 

    Data are also presented both as '.expression' files (an input format readable by the network analysis tool Graphia Professional) and as '.layout' files, a text file format output by Graphia Professional that describes the characteristics of the network so that it may be replicated. Characteristics of the original birth records that allow the identification of individuals - for instance, full name or location of birth - have been removed.

  5. Facebook Names Dataset

    • academictorrents.com
    bittorrent
    Updated Nov 11, 2015
    Cite
    Ron Bowes (Skull Security) (2015). Facebook Names Dataset [Dataset]. https://academictorrents.com/details/e54c73099d291605e7579b90838c2cd86a8e9575
    Explore at:
    Available download formats: bittorrent (2991052604)
    Dataset updated
    Nov 11, 2015
    Dataset authored and provided by
    Ron Bowes (Skull Security)
    License

    https://academictorrents.com/nolicensespecified

    Description

    171 million names (100 million unique)

    This torrent contains:

    • The URL of every searchable Facebook user's profile
    • The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc)
    • Processed lists, including first names with count, last names with count, potential usernames with count, etc
    • The programs I used to generate everything

    So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

    Limitations

    So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed. An additional limitation is that these are on

  6. Names Population Worldwide

    • kaggle.com
    Updated Jun 30, 2023
    Cite
    Tarun R Jain (2023). Names Population Worldwide [Dataset]. https://www.kaggle.com/datasets/tarundalal/names-population-worldwide
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tarun R Jain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The "Person Name Population" dataset provides information on the incidence and frequency of different names within a population. It consists of three columns: Name, Incidence, and Frequency.

    The "Name" column represents the individual names of people, while the "Incidence" column denotes the number of individuals in the population who bear that particular name. The "Frequency" column indicates the relative occurrence or proportion of each name within the population.

    Use Case:

    This dataset can be valuable for various purposes, including:

    1. Sociodemographic Analysis: Researchers or analysts can utilize this dataset to study the distribution and prevalence of different names within a specific population. They can uncover patterns, trends, and cultural preferences related to naming practices.

    2. Name Popularity Studies: The dataset enables the exploration of popular or commonly used names. It allows researchers to identify the most prevalent names and track their frequency over time. This information can be useful for understanding naming trends and societal changes.

    3. Market Research: Companies and marketers can leverage this dataset to gain insights into consumer preferences and behaviors. They can analyze the popularity of names to inform targeted marketing strategies, such as personalized messaging or product customization.

    4. Social Studies: Sociologists and anthropologists can use the dataset to investigate the cultural significance of names within a specific population. They can explore naming conventions, naming traditions across different regions or ethnicities, and the impact of cultural factors on name selection.

    5. Historical Research: Historians may find this dataset valuable for studying name trends and patterns across different time periods. It can provide insights into naming practices in the past, allowing researchers to analyze societal changes, migration patterns, or cultural influences.

  7. Race and ethnicity data for first, middle, and last names

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Rosenman, Evan; Olivella, Santiago; Imai, Kosuke (2023). Race and ethnicity data for first, middle, and last names [Dataset]. http://doi.org/10.7910/DVN/SGKW0K
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Rosenman, Evan; Olivella, Santiago; Imai, Kosuke
    Description

    We provide datasets that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The provided data are computed from the voter files of six Southern states -- Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina -- that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36MM individuals who provide self-reported race and ethnicity. The last name datasets include 338K surnames, while the middle name dictionaries contain 126K middle names and the first name datasets include 136K first names. For each type of name, we provide a dataset of P(race | name) probabilities and P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files * 6 states) voter files in our dataset. These data are closely related to the dataset "Name Dictionaries for "wru" R Package", https://doi.org/10.7910/DVN/7TRYAC. These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.
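
To illustrate how the two kinds of tables differ, here is a minimal sketch that derives P(race | name) and P(name | race) from raw counts. It is a plain frequency computation on made-up counts with placeholder group labels "A" and "B"; it is not the authors' estimation procedure, and `conditional_tables` is a hypothetical helper.

```python
from collections import Counter

def conditional_tables(counts):
    """counts maps (name, race) -> number of people.

    Returns (p_race_given_name, p_name_given_race), each keyed by (name, race).
    """
    by_name = Counter()
    by_race = Counter()
    for (name, race), n in counts.items():
        by_name[name] += n
        by_race[race] += n
    # Normalise the same count by the name total or by the race total.
    p_race_given_name = {k: n / by_name[k[0]] for k, n in counts.items()}
    p_name_given_race = {k: n / by_race[k[1]] for k, n in counts.items()}
    return p_race_given_name, p_name_given_race

# Made-up counts, for illustration only.
counts = {
    ("ALEX", "A"): 30, ("ALEX", "B"): 10,
    ("SAM", "A"): 20, ("SAM", "B"): 40,
}
p_rn, p_nr = conditional_tables(counts)
```

In this toy example, 30 of the 40 people named ALEX are in group A, so P(A | ALEX) is 0.75, while ALEX accounts for 30 of the 50 people in group A, so P(ALEX | A) is 0.6.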

  8. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    Updated Jul 4, 2025
    Cite
    Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.
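
A minimal sketch of ranking names from records with the four fields named above (name, year of birth, sex, number). The tuple order, the "F"/"M" codes, the sample rows, and the helper name `top_names` are assumptions for illustration; the actual file layout should be checked before adapting this.

```python
from collections import Counter

def top_names(records, sex=None, year=None, k=3):
    """Sum counts per name over (name, year, sex, number) records, optionally filtered."""
    totals = Counter()
    for name, rec_year, rec_sex, number in records:
        if sex is not None and rec_sex != sex:
            continue
        if year is not None and rec_year != year:
            continue
        totals[name] += number
    return totals.most_common(k)

# Made-up records, for illustration only.
records = [
    ("Mary", 1900, "F", 50),
    ("John", 1900, "M", 40),
    ("Mary", 1901, "F", 45),
    ("Anna", 1901, "F", 30),
    ("John", 1901, "M", 60),
]
overall = top_names(records)
top_female = top_names(records, sex="F", k=1)
```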

  9. Dataset of country full name and individuals using the Internet of countries...

    • workwithdata.com
    Updated Apr 9, 2025
    Cite
    Work With Data (2025). Dataset of country full name and individuals using the Internet of countries per year in San Marino and in 2021 (Historical) [Dataset]. https://www.workwithdata.com/datasets/countries-yearly?col=country%2Ccountry_long%2Cdate%2Cinternet_pct&f=2&fcol0=country&fcol1=date&fop0=%3D&fop1=%3D&fval0=San+Marino&fval1=2021
    Explore at:
    Dataset updated
    Apr 9, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    San Marino
    Description

    This dataset is about countries per year in San Marino. It has 1 row and is filtered where the date is 2021. It features 4 columns: country, country full name, date, and individuals using the Internet.

  10. LinkedIn Dataset - US People Profiles

    • kaggle.com
    Updated May 16, 2023
    Cite
    Joseph from Proxycurl (2023). LinkedIn Dataset - US People Profiles [Dataset]. https://www.kaggle.com/datasets/proxycurl/10000-us-people-profiles
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Joseph from Proxycurl
    Description

    Full profile of 10,000 people in the US - download here, data schema here, with more than 40 data points including:

    • Full Name
    • Education
    • Location
    • Work Experience History

    and many more!

    There are additionally 258+ Million US people profiles available, visit the LinkDB product page here.

    Our LinkDB database is an exhaustive database of publicly accessible LinkedIn people and companies profiles. It contains close to 500 Million people and companies profiles globally.

  11. Mental Health Services Monthly Statistics

    • digital.nhs.uk
    csv, pdf, xls, xlsx
    Updated Jul 21, 2016
    Cite
    (2016). Mental Health Services Monthly Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics
    Explore at:
    Available download formats: csv (13.0 kB), csv (272.1 kB), pdf (239.2 kB), pdf (729.1 kB), csv (387.3 kB), csv (375.0 kB), csv (1.3 MB), xlsx (118.7 kB), xls (1.1 MB), xls (994.8 kB), xls (389.6 kB), xls (138.2 kB), csv (5.3 kB)
    Dataset updated
    Jul 21, 2016
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Mar 1, 2016 - May 31, 2016
    Area covered
    England
    Description

    This release presents experimental statistics from the Mental Health Services Data Set (MHSDS), using final submissions for April 2016 and provisional submissions for May 2016. This is the fifth monthly release from the dataset, which replaces the Mental Health and Learning Disabilities Dataset (MHLDDS). As well as analysis of waiting times, first published in March 2016, this release includes elements of the reports that were previously included in monthly reports produced from final MHLDDS submissions. In this publication a new data file has been produced to present the data for people identified as having learning disabilities and/or autistic spectrum disorder (LDA) characteristics.

    Because of the scope of the changes to the dataset (resulting in the name change to MHSDS and the new name for these monthly reports) it will take time to re-introduce all possible measures that were previously part of the MHLDS Monthly Reports. Additional measures will be added to this report in the coming months. Further details about these changes and the consultation that informed them were announced in November. From January 2016 the release includes information on people in children and young people's mental health services, including CAMHS, for the first time. Learning disabilities and autism services have been included since September 2014.

    This release of final data for April 2016 comprises:

    • An Executive Summary, which presents national-level analysis across the whole dataset and also for some specific service areas and age groups.
    • Data tables about access and waiting times in mental health services, based on provisional data for the period 1 March 2016 to 31 May 2016.
    • A monthly data file which presents 92 measures for mental health, learning disability and autism services at National, Provider and Clinical Commissioning Group (CCG) level.
    • A Currency and Payments (CAP) data file, containing three measures relating to people assigned to Adult Mental Health Care Clusters. Further measures will be added in future releases.
    • A data file containing the measures relating to people with learning disabilities and/or autism.
    • Exploratory analysis of the coverage and completeness of access and waiting times statistics for people entering the Early Intervention in Psychosis pathway.
    • A set of provider level data quality measures for both months. The report comprises validity measures for various data items at National and Provider level. From the publication of April data, a coverage report is included showing the number of providers submitting each month and the number of records submitted.
    • A metadata file, which provides contextual information for each measure, including a full description, current uses, method used for analysis and some notes on usage.

    We will release the reports as experimental statistics until the characteristics of data flowed using the new data standard are understood.

    A correction has been made to this publication on 10 September 2018. This amendment relates to statistics in the monthly CSV data file; the specific measures affected are listed in the "Corrected Measures" CSV. All listed measures have now been corrected. NHS Digital apologises for any inconvenience caused.

  12. Historic US census - 1930

    • redivis.com
    application/jsonl +7
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Historic US census - 1930 [Dataset]. http://doi.org/10.57761/6e5q-rh85
    Explore at:
    Available download formats: application/jsonl, parquet, spss, csv, arrow, stata, avro, sas
    Dataset updated
    Jan 10, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 1, 1930 - Dec 31, 1930
    Area covered
    United States
    Description

    Abstract

    The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations—Ancestry.com and FamilySearch—and provides the largest and richest source of individual level and household data.

    Before Manuscript Submission

    All manuscripts (and other items you'd like to publish) must be submitted to

    phsdatacore@stanford.edu for approval prior to journal submission.

    We will check your cell sizes and citations.

    For more information about how to cite PHS and PHS datasets, please visit:

    https://phsdocs.developerhub.io/need-help/citing-phs-data-core

    Documentation

    This dataset was created on 2020-01-10 22:52:11.461 by merging multiple datasets together. The source datasets for this version were:

    IPUMS 1930 households: This dataset includes all households from the 1930 US census.

    IPUMS 1930 persons: This dataset includes all individuals from the 1930 US census.

    IPUMS 1930 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1930 datasets.

    Section 2

    Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother's age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.

    In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

    The historic US 1930 census data were collected in April 1930. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.

    Notes

    • We provide IPUMS household and person data separately so that it is convenient to explore the descriptive statistics on each level. In order to obtain a full dataset, merge the household and person on the variables SERIAL and SERIALP. In order to create a longitudinal dataset, merge datasets on the variable HISTID.

    • Households with more than 60 people in the original data were broken up for processing purposes. Every person in a large household is considered to be in their own household. The original large households can be identified using the variable SPLIT, reconstructed using the variable SPLITHID, and the original count is found in the variable SPLITNUM.

    • Coded variables derived from string variables are still in progress. These variables include: occupation and industry.

    • Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGEMARR, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, FARM, EMPSTAT, OCC1950, IND1950, MTONGUE, MARST, RACE, SEX, RELATE, CLASSWKR. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.

    • Most inconsistent information was not edite
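
    The first note above describes joining the household and person files on SERIAL and SERIALP. A minimal sketch of that join follows, using only the Python standard library. It assumes that SERIAL identifies a household in the household file and that SERIALP carries the same identifier in the person file; the rows and all non-key values are invented for illustration and are not taken from the dataset.

```python
# Sketch: attach household-level fields to each person record.
# Assumption: households are keyed by SERIAL and persons point to their
# household through SERIALP. All rows below are made up for illustration.

households = [
    {"SERIAL": 1, "OWNERSHP": 10, "FARM": 1},
    {"SERIAL": 2, "OWNERSHP": 20, "FARM": 2},
]

persons = [
    {"SERIALP": 1, "HISTID": "A", "AGE": 34},
    {"SERIALP": 1, "HISTID": "B", "AGE": 5},
    {"SERIALP": 2, "HISTID": "C", "AGE": 61},
]


def merge_persons_households(persons, households):
    """Inner join: one output row per person with a matching household."""
    by_serial = {h["SERIAL"]: h for h in households}
    merged = []
    for person in persons:
        household = by_serial.get(person["SERIALP"])
        if household is None:
            continue  # person without a household row is dropped
        merged.append({**household, **person})
    return merged


full = merge_persons_households(persons, households)
for row in full:
    print(row["HISTID"], row["SERIAL"], row["OWNERSHP"], row["AGE"])
```

    With pandas, the same join could be written as persons.merge(households, left_on="SERIALP", right_on="SERIAL"). A longitudinal link across census years would follow the same pattern with HISTID as the key, as the note describes.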

  13. Dataset of country full name and individuals using the Internet of countries...

    • workwithdata.com
    Updated Apr 9, 2025
    Cite
    Work With Data (2025). Dataset of country full name and individuals using the Internet of countries per year in Solomon Islands (Historical) [Dataset]. https://www.workwithdata.com/datasets/countries-yearly?col=country%2Ccountry_long%2Cdate%2Cinternet_pct&f=1&fcol0=country&fop0=%3D&fval0=Solomon+Islands
    Dataset updated
    Apr 9, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Solomon Islands
    Description

    This dataset is about countries per year in Solomon Islands. It has 64 rows. It features 4 columns: country, country full name, date, and individuals using the Internet.

  14. Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...

    • datarade.ai
    Updated Oct 12, 2024
    Cite
    Success.ai (2024). Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Area covered
    United States
    Description

    Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

    Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

    API Features:

    • Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.
    • High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.
    • Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.

    Key Categories Served: B2B sales leads – Identify decision-makers in key industries, B2B marketing data – Target professionals for your marketing campaigns, Recruitment data – Source top talent efficiently and reduce hiring times, CRM enrichment – Update and enhance your CRM with verified, updated data, Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.

    Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

    Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

    Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

    Use Cases for Success.ai's Contact Data:

    • Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
    • Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
    • Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
    • CRM Enrichment: Keep your CRM current with verified, accurate employee data.
    • Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
    • Market Research: Gain insights into employment trends and company profiles for better business decisions.
    • Executive Search: Source senior executives and leaders for headhunting and recruitment.
    • Partnership Building: Find the right companies and key people to develop strategic partnerships.

    Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...

  15. Census Data

    • catalog.data.gov
    • data.globalchange.gov
    • +2more
    Updated Mar 1, 2024
    Cite
    U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    U.S. Bureau of the Census
    Description

    The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed data base file named TABLES.ZIP. The uncompressed file is in FoxPro data base file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats.

    While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level.

    The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
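
    The file naming rule and the LOGRECNO link described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the exact spelling of the file names (for example "ALBGPOP1.dbf") and every column other than LOGRECNO are hypothetical, and the rows are invented.

```python
# Sketch: decode a block-group file name and link tables by LOGRECNO.
# Assumption: names look like "<state><BG><contents>", e.g. "ALBGPOP1.dbf".
# The description gives the parts of the name but not its exact spelling.

KNOWN_CONTENTS = {"GEO", "POP1", "POP2", "HU", "MA05"}


def parse_table_name(file_name):
    """Return (state abbreviation, contents label) from a file name."""
    stem = file_name.rsplit(".", 1)[0].upper()
    state, level, contents = stem[:2], stem[2:4], stem[4:]
    if level != "BG" or contents not in KNOWN_CONTENTS:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return state, contents


def link_by_logrecno(*tables):
    """Combine rows from several tables that share a LOGRECNO value."""
    linked = {}
    for rows in tables:
        for row in rows:
            linked.setdefault(row["LOGRECNO"], {}).update(row)
    return linked


# Invented rows; TOTAL_POP, TOTAL_HU and OWNER_OCC are hypothetical names.
geo_rows = [{"LOGRECNO": 101, "TOTAL_POP": 1200, "TOTAL_HU": 480}]
hu_rows = [{"LOGRECNO": 101, "OWNER_OCC": 300}]

print(parse_table_name("ALBGPOP1.dbf"))
print(link_by_logrecno(geo_rows, hu_rows)[101])
```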

  16. Customer Dataset

    • kaggle.com
    Updated Nov 30, 2022
    Cite
    S.Sharma (2022). Customer Dataset [Dataset]. https://www.kaggle.com/datasets/s26sharma/customer-dataset/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    S.Sharma
    Description

    Context: This dataset is about a company that sells different items. The company wants to know how its revenue is growing and whether any particular items are bringing in more revenue.

    Columns:

    • order_id: Order ID of the item
    • order_date: Date when item was ordered
    • item_id: Item ID
    • sku: sku of item ordered
    • qty_ordered: Quantities ordered
    • price: Actual price
    • value: Total Value
    • discount_amount: Discount received on items
    • total: Final Amount (Total Amount)
    • category: Item Categories
    • payment_method: Payment Method used
    • bi_st: bi_st
    • cust_id: Customer ID
    • year: Years
    • month: Months
    • ref_num: Reference number
    • Name Prefix: Prefix
    • First Name: Customer first name
    • Middle Initial: Middle name initial
    • Last Name: Customer last name
    • Gender: Gender
    • age: Age
    • full_name: Customer full name
    • E Mail: Customer email address
    • Customer Since: Date customer joined
    • Phone No. : Customer phone number
    • Place Name: Place name
    • County: County
    • City: City
    • State: State
    • Zip: Zip
    • Region: Region
    • User Name: Customer user name
    • Discount_Percent: Discount percentage

    I do not own this data. All credits to the original authors/creators. Used for educational purposes only

  17. Geo-Refugee: A Refugee Location Dataset

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Mar 29, 2017
    Cite
    Kerstin C. Fisk (2017). Geo-Refugee: A Refugee Location Dataset [Dataset]. http://doi.org/10.7910/DVN/25952
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 29, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Kerstin C. Fisk
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2000 - 2010
    Area covered
    Africa
    Description

    The refugee location data (Geo-Refugee) provides information on the geographical locations, population sizes and accommodation types of refugees and people in refugee-like situations throughout Africa. Based on the United Nations High Commissioner for Refugees' Location and Demographic Composition data as well as information contained in supplemental UNHCR resources, Geo-Refugee assigns administrative unit names and geographic coordinates to refugee camps/centers, and locations hosting dispersed (self-settled) refugees. Geo-Refugee was collected for the purpose of investigating the relationship between refugees and armed conflict, but can be used for a number of refugee-related studies.

    The original data for the category refugees and people in a refugee-like situation by accommodation type and location name comes directly from the UNHCR. The category refugees includes: "individuals recognized under the 1951 Convention relating to the Status of Refugees and its 1967 Protocol; the 1969 OAU Convention Governing the Specific Aspects of Refugee Problems in Africa; those recognized in accordance with the UNHCR statute; individuals granted complementary forms of protection and those enjoying temporary protection." The category people in a refugee-like situation "is descriptive in nature and includes groups of people who are outside their country of origin and who face protection risks similar to those of refugees, but for whom refugee status has, for practical or other reasons, not been ascertained" (UNHCR http://www.unhcr.org/45c06c662.html).

    The unit of the data is the first-level administrative unit (province, region or state). A refugee location is defined as a unit with a known refugee population, as established by UNHCR country offices. The locations data was compiled using statistics provided by the UNHCR Division of Programme Support and Management. Several of the refugee sites in the original UNHCR data are camp names or other locations which are not immediately traceable to a particular location using even the most established geographical databases like that of the National Geospatial-Intelligence Agency (NGA). Thus, unit-level location of refugees was established and confirmed using supplementary resources including reports, maps, and policy documents compiled by the UNHCR and contained in the Refworld database (see http://www.unhcr.org/cgi-bin/texis/vtx/refworld/rwmain). Refworld was the primary database used for this project. Geographic coordinates were assigned using the database of the National Geospatial-Intelligence Agency. See https://www1.nga.mil/Pages/default.aspx for more information. All attempts were made to find precise coordinates, including cross-referencing with Google Maps. The current version of the data covers 43 African countries and encompasses the period 2000 to 2010. The UNHCR began systematically collecting information on the locations and demographic compositions of refugee populations in 2000.

  18. Worldwide Soundscapes project metadata and analysis scripts

    • data.niaid.nih.gov
    Updated Mar 25, 2025
    Cite
    Amandine Gasc (2025). Worldwide Soundscapes project metadata and analysis scripts [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6486835
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    Rodney Rountree
    Li, Songhai
    Youfang Chen
    Dong, Lijun
    Amandine Gasc
    Thomas Cherico Wanger
    Kevin F.A. Darras
    Steven Van Wilgenburg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated passive acoustic monitoring meta-datasets (i.e. meta-data collections). This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description. Additionally, R scripts are provided to replicate the analysis published in [placeholder].

    The overview of all sampling sites and timelines can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. The recordings of this collection were annotated and analysed to explore macro-ecological trends.

    The audio recording criteria justifying inclusion into the meta-database are:

    Stationary (no transects, towed sensors or microphones mounted on cars)

    Passive (unattended, no human disturbance by the recordist)

    Ambient (no directional microphone or triggered recordings, non-experimental conditions)

    Spatially and/or temporally replicated (i.e. multiple sites sampled at the same time and/or multiple days - covering the same daytime - sampled at the same site)

    The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database. The data shared here only includes validated collections.

    Changes from version 3.0.1

    Added files needed to reproduce the metadata and the acoustic analyses found in the publication.

    Dropped underused fields: spatial_selection, temporal_exclusion, freshwater_recordist_position from collections table; secondary realm, biome, and functional group from sites table.

    Meta-database CSV files

    collections

    collection_id: unique integer, primary key

    name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.

    ecoSound-web_link: link of validated meta-collection on ecoSound-web

    primary_contributors: full names of people deemed corresponding contributors who are responsible for the dataset

    secondary_contributors: full names of people who are not primary contributors but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses

    date_added: when the dataset was added (YYYY-MM-DD)

    URL_open_recordings: internet link for openly-available recordings from this collection

    URL_project: internet link for further information about the corresponding project

    DOI_publication: Digital Object Identifiers of corresponding publications

    core_realm_IUCN: The main, core realm of the dataset according to IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

    medium: the physical medium the microphone is situated in

    locality: optional free text about the locality

    contributor_comments: free-text field for comments by the primary contributors

    collections-sites

    dataset_ID: primary key of collections table

    site_ID: primary key of sites table

    sites

    site_ID: unique integer, primary key

    site_name: internal name or code of sampling site as used in respective projects

    latitude_numeric: site's numeric degrees of latitude

    longitude_numeric: site's numeric degrees of longitude

    blurred_coordinates: whether latitude and longitude coordinates are inaccurate, boolean. Coordinates may be blurred with random offsets, rounding, snapping, etc. Indicate the blurring method inside the comments field

    topography_m: vertical position of the microphone relative to sea level, in meters. For sites on land: elevation. For marine sites: depth (negative). Only indicate if the values were measured by the collaborator.

    freshwater_depth_m: microphone depth, only used for sites inside freshwater bodies that also have an elevation value above the sea level

    realm: Ecosystem type: main realm according to IUCN GET https://global-ecosystems.org/

    biome: Ecosystem type: main biome according to IUCN GET https://global-ecosystems.org/

    functional_group: Ecosystem type: main functional group according to IUCN GET https://global-ecosystems.org/

    contributor_comments: free text field for contributor comments

    GADM_0: Global ADMinistrative Database level 0 classification of terrestrial site or marine site that is within territorial waters. Source: https://gadm.org/download_world.html

    IHO: International Hydrographic Organization classification of marine site. Source: https://marineregions.org/downloads.php

    WDPA: World Database on Protected Areas classification of the site. Source: https://www.protectedplanet.net/en/thematic-areas/wdpa?tab=WDPA

    deployments

    dataset_ID: primary key of datasets table

    deployment: identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

    subset_site_ID: If the deployment was not done in all the sites of the corresponding collection, site IDs where the deployment was conducted

    start_date: date of deployment start

    start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

    permanent: whether the deployment is permanent, boolean

    end_date: date of deployment end (date when last scheduled operation starts)

    end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

    operation_mode: continuous: recording takes place from the deployment start date-time to deployment end date-time. periodical: recording takes place periodically (i.e., with duty cycle) from the deployment start date-time to deployment end date-time. scheduled: recording takes place during scheduled daily time intervals (optionally with duty cycle)

    duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". empty if no duty cycle is used. For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes

    operation_start_time_mixed: only for scheduled recordings: start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

    operation_duration_minutes: only for scheduled recordings: duration of operation in minutes, if constant

    operation_end_time_mixed: only for scheduled recordings: end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Only required if durations are variable. Do not use when end times are ambiguous (for instance, if a recording could be 1 hour or 25 hours long because the end is on the next day). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

    high_pass_filter_Hz: frequency of the high-pass filter of the recorder if applied, in Hz. Otherwise, write "none". This may be called a "low-cut" filter too.

    bit_depth: sampling bit depth of the recordings. Often constant for a particular recorder

    channels: number of recorded audio channels

    sampling_frequency_kHz: frequency at which the microphone signal was sampled by the recorder (sounds up to half that frequency can be recorded)

    recorder: recorder used for deployment

    microphone: microphone used for deployment

    target_taxa: main IUCN animal taxa that were studied with this deployment, using the exact IUCN Red list names (http://www.iucnredlist.org/), separated by commas. Only genera, families, orders, and classes are accepted. Empty if there was no taxonomic focus (i.e., general soundscapes were the study focus).

    contributor_comments: free text field for contributor comments

    exact_recordings: whether the deployment data here have been superseded by inserting more exact recording date-time ranges into the meta-collection on ecoSound-web
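
    The mixed time fields (start_time_mixed, end_time_mixed, operation_start_time_mixed, operation_end_time_mixed) and duty_cycle_minutes described above are free-form strings, so a reader of the CSV files will likely want to normalise them. The following Python sketch shows one possible way to do that. The function names and the returned tuple layout are hypothetical, and the offsets are assumed to be in minutes, which matches the "sunrise-60" example for one hour before sunrise.

```python
import re

# Sketch: parse "HH:MM" or "<solar event>[+/-minutes]" and "rec/period".
# Assumption: offsets are minutes; function names are illustrative only.

SOLAR_EVENTS = ("sunrise", "sunset", "noon", "midnight")
_MIXED = re.compile(r"^(?:(\d{1,2}):(\d{2})|([a-z]+)([+-]\d+)?)$")


def parse_mixed_time(value):
    """Return ("clock", minutes after midnight) or (event, offset in minutes)."""
    match = _MIXED.match(value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised time: {value!r}")
    hours, minutes, event, offset = match.groups()
    if hours is not None:
        h, m = int(hours), int(minutes)
        if h > 23 or m > 59:
            raise ValueError(f"invalid clock time: {value!r}")
        return ("clock", h * 60 + m)
    if event not in SOLAR_EVENTS:
        raise ValueError(f"unknown solar event: {value!r}")
    return (event, int(offset) if offset else 0)


def parse_duty_cycle(value):
    """Return the recording fraction for "recording/period", or None if empty."""
    if not value.strip():
        return None  # empty field: no duty cycle is used
    recording, period = (float(part) for part in value.split("/"))
    if period <= 0 or not 0 <= recording <= period:
        raise ValueError(f"invalid duty cycle: {value!r}")
    return recording / period


print(parse_mixed_time("sunrise-60"))
print(parse_mixed_time("06:30"))
print(parse_duty_cycle("1/6"))
```

    Resolving a solar event to a clock time would additionally need the site coordinates and the date, which this sketch deliberately leaves out.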

    recordings (partial download from ecoSound-web)

    recording_id: primary key of the recordings table

    collection_id: ID of the collection the recording belongs to

    name: name of the recording

    site_id: site ID the recording belongs to

    recorder_id: ID of the recorder used for the recording (internal ecoSound-web code)

    microphone_id: ID of the microphone used for the recording (internal ecoSound-web code)

    recording_gain: recording gain applied for amplifying the audio signal, in decibels

    duty_cycle_recording: fraction of the recording period when the recorder is actively recording audio

    duty_cycle_period: period of the duty cycle, i.e., time between the starts of two subsequent recordings

    note: comments (contains the target taxon)

    file_date: date of the recording start

    file_time: local time of the recording start

    sampling_rate: audio sampling rate in Hz

    bitdepth: depth in bits for each audio sample

    channel_num: number of channels

    duration: duration of the recording in seconds. Note: duty-cycled recordings cover only a proportion of this duration

    affiliations

    affiliation_id: primary key of affiliations table

    lab_research_group: Laboratory or research group name

    department_school_institute: department, school, or institute name

    university_institution: University or institution name

    street_address: street address

    region_state_province_city: region, state, province, or city name

    postal_code: postal code

    country: country
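
    The description states that the data tables are linked through primary keys. As a rough illustration, the sketch below joins the collections, collections-sites and sites tables in plain Python. It assumes that dataset_ID in collections-sites refers to collection_id in collections, as the field descriptions suggest; the rows are invented and the helper name is hypothetical.

```python
# Sketch: list the site names that belong to each collection.
# Assumption: collections-sites.dataset_ID matches collections.collection_id
# and collections-sites.site_ID matches sites.site_ID. Rows are made up.

collections = [
    {"collection_id": 1, "name": "Demo forest"},
    {"collection_id": 2, "name": "Demo reef"},
]
collections_sites = [
    {"dataset_ID": 1, "site_ID": 10},
    {"dataset_ID": 1, "site_ID": 11},
    {"dataset_ID": 2, "site_ID": 12},
]
sites = [
    {"site_ID": 10, "site_name": "F1"},
    {"site_ID": 11, "site_name": "F2"},
    {"site_ID": 12, "site_name": "R1"},
]


def sites_by_collection(collections, collections_sites, sites):
    """Follow the link table from each collection to its site names."""
    name_by_id = {c["collection_id"]: c["name"] for c in collections}
    site_by_id = {s["site_ID"]: s for s in sites}
    joined = {name: [] for name in name_by_id.values()}
    for link in collections_sites:
        name = name_by_id.get(link["dataset_ID"])
        site = site_by_id.get(link["site_ID"])
        if name is not None and site is not None:
            joined[name].append(site["site_name"])
    return joined


result = sites_by_collection(collections, collections_sites, sites)
print(result)
```

    When reading the real CSV files, csv.DictReader yields every value as a string, so the key columns would need to be compared as strings or converted consistently.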

  19. Popular Baby Names - Dataset - data.sa.gov.au

    • data.sa.gov.au
    Updated Mar 1, 2025
    Cite
    data.sa.gov.au (2025). Popular Baby Names - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/popular-baby-names
    Dataset updated
    Mar 1, 2025
    Dataset provided by
    Government of South Australia (http://sa.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Australia
    Description

    List of male and female baby names in South Australia from 1944 to 2024. The annual data for baby names is published January/February each year.

  20. US Gross Rent ACS Statistics

    • kaggle.com
    Updated Aug 23, 2017
    Cite
    Golden Oak Research Group (2017). US Gross Rent ACS Statistics [Dataset]. https://www.kaggle.com/datasets/goldenoakresearch/acs-gross-rent-us-statistics
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Golden Oak Research Group
    Area covered
    United States
    Description

    What you get:

    Upvote! The database contains 40,000+ records on US Gross Rent & Geo Locations. The field description of the database is documented in the attached pdf file. To access all 325,272 records on a scale roughly equivalent to a neighborhood (census tract), see link below and make sure to upvote. Upvote right now, please. Enjoy!

    Get the full free database with coupon code: FreeDatabase, See directions at the bottom of the description... And make sure to upvote :) coupon ends at 2:00 pm 8-23-2017

    Gross Rent & Geographic Statistics:

    • Mean Gross Rent (double)
    • Median Gross Rent (double)
    • Standard Deviation of Gross Rent (double)
    • Number of Samples (double)
    • Square area of land at location (double)
    • Square area of water at location (double)

    Geographic Location:

    • Longitude (double)
    • Latitude (double)
    • State Name (character)
    • State abbreviated (character)
    • State_Code (character)
    • County Name (character)
    • City Name (character)
    • Name of city, town, village or CDP (character)
    • Primary: defines whether the location is a tract and block group.
    • Zip Code (character)
    • Area Code (character)

    Abstract

    The data set was originally developed for real estate and business investment research. Income is a vital element when determining both the quality and the socioeconomic features of a given geographic location. The following data was derived from over 36,000 files and covers 348,893 location records.

    License

    Only proper citation is required; please see the documentation for details. Have fun!

    Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5, August 2017. Accessed, day, month year.

    For any questions, you may reach us at research_development@goldenoakresearch.com. For immediate assistance, you may reach me at 585-626-2965.

    Please note: this is my personal number, and email is preferred.

    Check our data's accuracy: Census Fact Checker

    Access all 325,272 locations for free with the database coupon code:

    Don't settle. Go big and win big. Optimize your potential. Access all gross rent records and more at a scale roughly equivalent to a neighborhood; see the link below:

    A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.

USA Name Data

USA Name Data (BigQuery Dataset)

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
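
The questions above can be explored with a simple aggregation. The following is a minimal sketch, assuming each record can be reduced to a (name, gender, count) triple; that row layout, the sample values, and the helper names `most_common_names` and `distinct_names_by_gender` are hypothetical and are not taken from the dataset or its documentation.

```python
from collections import Counter

# Hypothetical sample rows: (name, gender, count). These values are made up
# for illustration and are not taken from the dataset.
SAMPLE_ROWS = [
    ("Mary", "F", 120),
    ("James", "M", 110),
    ("Mary", "F", 30),
    ("Linda", "F", 90),
    ("John", "M", 95),
    ("James", "M", 15),
]


def most_common_names(rows, gender=None, top=3):
    """Return the `top` names by total count, optionally filtered by gender."""
    totals = Counter()
    for name, row_gender, count in rows:
        if gender is None or row_gender == gender:
            totals[name] += count
    return totals.most_common(top)


def distinct_names_by_gender(rows):
    """Count how many distinct names appear for each gender."""
    names = {}
    for name, row_gender, _count in rows:
        names.setdefault(row_gender, set()).add(name)
    return {g: len(s) for g, s in names.items()}


print(most_common_names(SAMPLE_ROWS))              # most common names overall
print(most_common_names(SAMPLE_ROWS, gender="F"))  # most common female names
print(distinct_names_by_gender(SAMPLE_ROWS))       # distinct names per gender
```

On the real data, the same totals would be computed over the full table rather than an in-memory list, and the column names would need to be checked against the published schema first.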
