40 datasets found

USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
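
The questions above can be approached with a few lines of standard-library Python. The following is a minimal sketch, not the dataset's official tooling: the column names (`name`, `gender`, `year`, `number`) are an assumption about the export layout, and the sample rows are invented purely for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample rows; the column names are an assumed layout, not confirmed by the source.
SAMPLE_CSV = """name,gender,year,number
Mary,F,1910,22848
John,M,1910,11450
Mary,F,1911,24390
James,M,1911,9951
John,M,1911,13446
"""


def total_by_name(rows, gender=None):
    """Sum the 'number' column per name, optionally restricted to one gender."""
    totals = Counter()
    for row in rows:
        if gender is None or row["gender"] == gender:
            totals[row["name"]] += int(row["number"])
    return totals


def distinct_names_by_gender(rows):
    """Count how many distinct names appear for each gender value."""
    names = {}
    for row in rows:
        names.setdefault(row["gender"], set()).add(row["name"])
    return {gender: len(found) for gender, found in names.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(total_by_name(rows).most_common(3))              # most common names overall
print(total_by_name(rows, gender="F").most_common(3))  # most common female names
print(distinct_names_by_gender(rows))                  # distinct names per gender
```

Because names with fewer than 5 occurrences are withheld, any count of distinct names computed this way is a lower bound.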
Facebook Names Dataset
academictorrents.com
bittorrent
Updated Nov 11, 2015
Cite
Ron Bowes (Skull Security) (2015). Facebook Names Dataset [Dataset]. https://academictorrents.com/details/e54c73099d291605e7579b90838c2cd86a8e9575
Available download formats: bittorrent (2991052604)
Dataset updated
Nov 11, 2015
Dataset authored and provided by
Ron Bowes (Skull Security)
License
https://academictorrents.com/nolicensespecified
Description
171 million names (100 million unique)

This torrent contains:
- The URL of every searchable Facebook user's profile
- The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc.)
- Processed lists, including first names with count, last names with count, potential usernames with count, etc.
- The programs I used to generate everything

So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

Limitations

So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed. An additional limitation is that these are on
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Jul 4, 2025
+ more versions
Cite
Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Dataset updated
Jul 4, 2025
Dataset provided by
Social Security Administration (http://ssa.gov/)
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.
First name file since 1900
data.europa.eu
gimi9.com
csv dbase
Cite
Institut national de la statistique et des études économiques (Insee), First name file since 1900 [Dataset]. https://data.europa.eu/data/datasets/5bf42c958b4c4144b0110ce8?locale=en
Available download formats: csv, dbase
Dataset authored and provided by
Institut national de la statistique et des études économiques (Insee)
License
https://www.etalab.gouv.fr/licence-ouverte-open-licence
Description
The first names file contains data on the first names attributed to children born in France since 1900. These data are available at the level of France and by department.

The files available for download count births in a given year, not the people living in that year. They are available in two formats (DBASE and CSV). Because the files are large, a database manager or statistical software is recommended. The national-level file can be opened in some spreadsheet programs. The departmental-level file, however, is too large (3.8 million lines) to open in a spreadsheet, so a lighter version covering only births since 2000 is also offered.

The data can be accessed in:
- a national data file containing the first names attributed to children born in France between 1900 and 2022 (data before 2012 relate only to France outside Mayotte) and the numbers by sex associated with each first name;
- a departmental data file containing the same information at the department of birth level;
- a lighter data file that contains information at the department level of birth since the year 2000.
Popular Baby Names
catalog.data.gov
data.cityofnewyork.us
+4more
Updated Jul 12, 2025
+ more versions
Cite
data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
Dataset updated
Jul 12, 2025
Dataset provided by
data.cityofnewyork.us
Description
Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.
First name bank - Girls
open.canada.ca
data.urbandatacentre.ca
csv, html
Updated Sep 17, 2025
+ more versions
Cite
Government and Municipalities of Québec (2025). First name bank - Girls [Dataset]. https://open.canada.ca/data/en/dataset/13db2583-427a-4e5f-b679-8532d3df571f
Available download formats: html, csv
Dataset updated
Sep 17, 2025
Dataset provided by
Government and Municipalities of Québec
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Time period covered
Jan 1, 1980 - Dec 31, 2024
Description
List of female names for children eligible for family benefits since 1980, including newborn babies living in Quebec or having immigrated to Quebec. The bank is updated once a year in spring with data from the previous year. First names are listed in alphabetical order. To ensure the confidentiality of personal information, first names given to fewer than five children are marked "<5". In addition, first names assigned only once over the entire period 1980 to 2024 were removed from the list.
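
Because suppressed cells hold the text "<5" rather than a number, a naive integer conversion fails on them. The sketch below shows one hedged way to read such cells; the function names are hypothetical, and the exact cell layout of the CSV is an assumption, since the description only states that small counts are marked "<5".

```python
def parse_count(cell):
    """Return (count, suppressed) for one cell.

    A cell starting with '<' (for example '<5') is treated as suppressed,
    so its exact value is unknown and reported as None.
    """
    text = cell.strip()
    if text.startswith("<"):
        return None, True
    return int(text), False


def summarize(cells):
    """Sum the known counts and report how many cells were suppressed."""
    known_total = 0
    suppressed_cells = 0
    for cell in cells:
        count, suppressed = parse_count(cell)
        if suppressed:
            suppressed_cells += 1
        else:
            known_total += count
    return known_total, suppressed_cells


# Hypothetical yearly counts for one first name.
print(summarize(["12", "<5", "7", "<5"]))
```

Keeping the suppressed cells as a separate tally, rather than replacing them with a guessed number, makes it explicit that the known total understates the true one.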
Baby names for boys in England and Wales
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Jul 31, 2025
Cite
Office for National Statistics (2025). Baby names for boys in England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsboys
Available download formats: xlsx
Dataset updated
Jul 31, 2025
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Rank and count of the top names for baby boys, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.
Baby names for girls in England and Wales
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Jul 31, 2025
Cite
Office for National Statistics (2025). Baby names for girls in England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsgirls
Available download formats: xlsx
Dataset updated
Jul 31, 2025
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Rank and count of the top names for baby girls, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.
Airline Dataset
kaggle.com
Updated Sep 26, 2023
Cite
Sourav Banerjee (2023). Airline Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/airline-dataset
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 26, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.

Content

This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.

Dataset Glossary (Column-wise)

Passenger ID - Unique identifier for each passenger

First Name - First name of the passenger

Last Name - Last name of the passenger

Gender - Gender of the passenger

Age - Age of the passenger

Nationality - Nationality of the passenger

Airport Name - Name of the airport where the passenger boarded

Airport Country Code - Country code of the airport's location

Country Name - Name of the country the airport is located in

Airport Continent - Continent where the airport is situated

Continents - Continents involved in the flight route

Departure Date - Date when the flight departed

Arrival Airport - Destination airport of the flight

Pilot Name - Name of the pilot operating the flight

Flight Status - Current status of the flight (e.g., on-time, delayed, canceled)

Structure of the Dataset

https://i.imgur.com/cUFuMeU.png

Acknowledgement

The dataset provided here is a simulated example and was generated using the online platform found at Mockaroo. This web-based tool offers a service that enables the creation of customizable Synthetic datasets that closely resemble real data. It is primarily intended for use by developers, testers, and data experts who require sample data for a range of uses, including testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.

Cover Photo by: Kevin Woblick on Unsplash

Thumbnail by: Airplane icons created by Freepik - Flaticon
Census Data
catalog.data.gov
data.globalchange.gov
+2more
Updated Mar 1, 2024
Cite
U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
Dataset updated
Mar 1, 2024
Dataset provided by
U.S. Bureau of the Census
Description
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands.

The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported into ACCESS, EXCEL, and other software formats.

While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed, the tables are ready for use with FoxPro and can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level.

The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
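
The file-naming rule and the LOGRECNO link described above can be sketched as follows. This is an illustration under stated assumptions: the example stem `NYBGGEO`, the absence of separators in the name, and the field names `TOTAL_POP` and `OCCUPIED` are all hypothetical, and the rows are assumed to have been loaded from the dbf tables into dictionaries already.

```python
def parse_file_name(stem):
    """Split a stem such as 'NYBGGEO' into (state, level, contents).

    Assumes the layout described in the text: two-letter state abbreviation,
    then 'BG' for block group, then a suffix naming the contents.
    """
    stem = stem.upper()
    state, level, contents = stem[:2], stem[2:4], stem[4:]
    if level != "BG" or not contents:
        raise ValueError(f"unexpected file name: {stem!r}")
    return state, level, contents


def join_on_logrecno(left_rows, right_rows):
    """Inner-join two lists of row dicts on their LOGRECNO value."""
    right_by_key = {row["LOGRECNO"]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row["LOGRECNO"])
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical rows standing in for records from a GEO table and an HU table.
geo_rows = [{"LOGRECNO": "0000001", "TOTAL_POP": "1200"},
            {"LOGRECNO": "0000002", "TOTAL_POP": "850"}]
hu_rows = [{"LOGRECNO": "0000001", "OCCUPIED": "480"}]

print(parse_file_name("NYBGGEO"))
print(join_on_logrecno(geo_rows, hu_rows))
```

An inner join is used here so that block groups missing from either table are dropped rather than padded with empty values; a different choice may suit other analyses.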
LinkedIn Dataset - US People Profiles
kaggle.com
Updated May 16, 2023
Cite
Joseph from Proxycurl (2023). LinkedIn Dataset - US People Profiles [Dataset]. https://www.kaggle.com/datasets/proxycurl/10000-us-people-profiles/discussion
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Joseph from Proxycurl
Description
Full profile of 10,000 people in the US - download here, data schema here, with more than 40 data points including - Full Name - Education - Location - Work Experience History and many more!

There are additionally 258+ Million US people profiles available, visit the LinkDB product page here.

Our LinkDB database is an exhaustive database of publicly accessible LinkedIn people and companies profiles. It contains close to 500 Million people and companies profiles globally.
Best Books Ever Dataset
zenodo.org
csv
Updated Nov 10, 2020
Cite
Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.4265096
Dataset updated
Nov 10, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lorena Casanova Lozano; Sergio Costa Planells
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
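
Since the two chunks use different date layouts, a consumer has to try both when parsing. The sketch below is one possible approach, not the authors' code: the exact spelling of the "Month Day Year" values (full English month names, optional ordinal suffixes such as "14th", optional commas) is an assumption, and `parse_publish_date` is a hypothetical helper name.

```python
import re
from datetime import date, datetime

# Strips ordinal suffixes that directly follow a digit ("14th" -> "14").
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")

# Assumed layouts: mm/dd/yyyy for the first chunk, "Month Day Year" for the rest.
# "%B" matches full month names in the current locale (English by default).
_FORMATS = ("%m/%d/%Y", "%B %d %Y")


def parse_publish_date(text):
    """Return a date for either layout, or None when the value cannot be parsed."""
    if not text:
        return None
    cleaned = _ORDINAL.sub("", text.strip()).replace(",", "")
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


print(parse_publish_date("09/14/2008"))
print(parse_publish_date("September 14th 2008"))
print(parse_publish_date("not a date"))
```

Returning None instead of raising keeps unparseable or missing values visible for later inspection, which matters here because firstPublishDate is only 59% complete.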

Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.

The 25 fields of the dataset are:

| Attributes | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series name | 45 |
| author | Book's author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publishing house | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
Data from: Feed the Future Grain Legumes Project Database
datasetcatalog.nlm.nih.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
+3more
Updated Feb 8, 2024
+ more versions
Cite
Glahn, Raymond; Miklas, Phillip N.; Estevez, Consuelo; Pastor-Corrales, Marcial A.; Wisler, Gail; Grusak, Michael A.; Scott, Roy; Cichy, Karen A.; Porch, Timothy G.; Beaver, James (2024). Feed the Future Grain Legumes Project Database [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001422346
Dataset updated
Feb 8, 2024
Authors
Glahn, Raymond; Miklas, Phillip N.; Estevez, Consuelo; Pastor-Corrales, Marcial A.; Wisler, Gail; Grusak, Michael A.; Scott, Roy; Cichy, Karen A.; Porch, Timothy G.; Beaver, James
Description
Data from this project focus on the evaluation of breeding lines. Significant progress was made in advancing breeding populations directed towards release of improved varieties in Tanzania. Thirty promising F4:7, 1st generation 2014 PIC (Phaseolus Improvement Cooperative) and ~100 F4:6, 2nd generation 2015 PIC breeding lines were selected. In addition, ~300 F4:5, 3rd generation 2016 PIC single plant selections were completed in Arusha and Mbeya. These breeding lines, derived from 109 PIC populations specifically developed to combine abiotic and biotic stress tolerance, showed superior agronomic potential compared with checks and local landraces. The diversity, scale, and potential of the material in the PIC breeding pipeline is invaluable and requires continued support to ensure the release of varieties that promise to increase the productivity of common bean in the E. African region. Data available includes databases, spreadsheets, and images related to the project.

Resources in this dataset:

- Resource Title: Data Dictionary. File Name: ADP-1_DD.pdf
- Resource Title: ADP-1 Database. File Name: ADP1-DB.zip. Resource Description: This file is a link to a draft version of the development and characterization of the common bean diversity panel (ADP) database in Microsoft Access. Preliminary information is provided in this database, while the full version is being prepared. In order to use the database you'll need to download the complete file, extract it and open the MS Access file. You must allow active content when opening the database for it to work properly. Downloaded on November 17, 2017.
- Resource Title: Anthracnose Screening of Andean Diversity Panel (ADP). File Name: Anthracnose-screening-of-ADP.pdf. Resource Description: Approximately 230 lines of the ADP were screened with 8 races of anthracnose under controlled conditions at Michigan State University. Dr. James Kelly has provided this valuable dataset for sharing in light of the Open Data policy of the US government. This dataset represents the first comprehensive screening of the ADP with a broad set of races of a specific pathogen.
- Resource Title: ARS - Feed the Future Shared Data. File Name: ARS-FtF-Data-Sharing.zip. Resource Description: The data provided herein is an early draft version of the data that has been generated by the ARS Feed-the-Future Grain Legumes Project that is focused on common bean research.
- Resource Title: PIC (Phaseolus Improvement Cooperative) Populations. File Name: PIC-breeding-populations.xlsx. Resource Description: The complete list of PIC breeding populations (Excel format). PIC (Phaseolus Improvement Cooperative) populations are bulked populations for improvement of common bean in Feed the Future Countries, with a principal focus on sub-Saharan Africa. These populations are for distribution to collaborators, are segregating for key biotic and abiotic stress constraints, and can be used for selection and release of improved cultivars/germplasm. Many of these populations are derived from crosses between ADP landraces and cultivars from sub-Saharan Africa and other improved genotypes with key biotic or abiotic stress tolerance. Phenotypic and genotypic information related to the parents of the crosses can be found in the ADP Database.
REM_Turku
datasetcatalog.nlm.nih.gov
figshare.com
Updated Jun 7, 2023
+ more versions
Cite
Sikka, Pilleriin; Revonsuo, Antti; Valli, Katja; Noreika, Valdas (2023). REM_Turku [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000938190
Dataset updated
Jun 7, 2023
Authors
Sikka, Pilleriin; Revonsuo, Antti; Valli, Katja; Noreika, Valdas
Area covered
Turku
Description
Dream EEG and Mentation (DREAM) data set ---Data set information--- Common name: REM_Turku Full name: REM_Turku Authors: Pilleriin Sikka, Antti Revonsuo, Valdas Noreika, and Katja Valli Location: Turku, Finland Year: N/A Set ID: [SET BY DATABASE] Amendment: [SET BY DATABASE] Corresponding author ID: [SET BY DATABASE] Download URL: [SET BY DATABASE] Correspondence: Pilleriin Sikka (pilsik@utu.fi), Katja Valli (katval@utu.fi) ---Metadata--- Key ID: [SET BY DATABASE] Date entered: [SET BY DATABASE] Number of samples: [INFERRED BY DATABASE] Number of subjects: [INFERRED BY DATABASE] Proportion REM: [INFERRED BY DATABASE] Proportion N1: [INFERRED BY DATABASE] Proportion N2: [INFERRED BY DATABASE] Proportion experience: [INFERRED BY DATABASE] Proportion no-experience: [INFERRED BY DATABASE] Proportion healthy: [INFERRED BY DATABASE] Provoked awakening: Yes Time of awakening: Night Form of response: Free Date approved: [SET BY DATABASE] ---How to decode data files--- The PSG files are raw (i.e., not preprocessed) EEG data files and include the last 2 minutes of preawakening EEG from each REM episode obtained using a serial awakening paradigm in the sleep laboratory. Data set includes EEG data from 134 awakenings of 18 participants. 
Files are organized according to the format: /Data/PSG/casexx_syy, where xx refers to Case ID yy refers to Subject ID In the "Records.csv" file, the following information is presented: Filename: in the format /Data/PSG/Sxx_yy Case ID: number of awakening for the subject Subject ID: unique identifier of subject Experience: 2 = dream experience; 1 = without recall ("white dream"); 0 = no dream experience Treatment group: N/A Duration: duration of the EEG data file in seconds EEG sample rate: the sampling rate of the EEG in Hertz Number of EEG channels: the number of EEG signals in this sample Last sleep stage: the scored sleep stage of the final epoch in the sample Has EOG: whether EOG is included in the sample (1 = yes; 0 = no) Has EMG: whether EMG is included in the sample (1 = yes; 0 = no) Has ECG: whether ECG is included in the sample (1 = yes; 0 = no) Proportion artifacts: N/A Time of awakening: time when this sample’s PSG ends Subject age: age of the subject Subject sex: sex of the subject (key: 0 = male, 1 = female; 2 = other) Subject healthy: whether subject is from a relatively healthy population (key: 0 = no; 1 = yes) Has more data: whether this sample has more data in the form of files under the /Data directory other than the /Data/PSG directory (key: 0=no, 1=yes) Remarks: The first number refers to the number of experimental night in the sleep lab for the subject, the second number refer to the number of awakening during that night (e.g., 1_3 refers to Night 1, Awakening 3). Remarks can also include other important information regarding this subject or data file. 
In the "Ratings.csv" file, the following information is presented:

Filename: in the format /Data/PSG/casexx_syy
DreamReport_Wordcount: total number of dream-related words minus utterances, fillers, repetitions, corrections, waking commentaries
ER = external ratings of emotions expressed in dream reports, conducted by two blind judges using the Finnish version of the modified Differential Emotions Scale (mDES; Fredrickson, 2013); the number refers to the frequency of occurrence of the emotion item in the dream report
SR = self-ratings of emotions experienced in the preceding dream using the mDES, conducted by participants themselves upon awakening and after having reported the dream; each item was rated on the scale from 0 = not at all to 4 = extremely much
PA = positive emotion/affect item of mDES
NA = negative emotion/affect item of mDES

The following are the 10 positive and 10 negative emotion/affect items of the mDES scale:

PA1 - Amused_Funloving_Giggly
PA2 - Awe_Wonder_Amazement
PA3 - Grateful_Appreciative_Thankful
PA4 - Hopeful_Optimistic_Encouraged
PA5 - Inspired_Uplifted_Elevated
PA6 - Interested_Alert_Curious
PA7 - Joyful_Glad_Happy
PA8 - Love_Closeness_Trust
PA9 - Proud_Confident_Selfassured
PA10 - Serene_Content_Peaceful
NA1 - Angry_Irritated_Annoyed
NA2 - Ashamed_Humiliated_Disgraced
NA3 - Contemptuous_Scornful_DIsdainful
NA4 - Disgust_Distaste_Revulsion
NA5 - Embarrassed_Selfconscious_Blushing
NA6 - Guilty_Repetant_Blameworthy
NA7 - Hate_Distrust_Suspicion
NA8 - Sad_Downhearted_Unhappy
NA9 - Scared_Fearful_Afraid
NA10 - Stressed_Nervous_Overwhelmed

ER_InferredExpressed = whether the emotion was directly expressed in the dream report or could be inferred from the behaviour of the dream self (key: 1 = expressed, 2 = inferred, 3 = both)
Remarks: any remarks regarding this data file

--Treatment group codes--
N/A

---Experimental description---
A full description of the materials and methods can be found in the following article: Sikka, P., Revonsuo, A., Noreika, V., & Valli, K. (2019). EEG frontal alpha asymmetry and dream affect: Alpha oscillations over the right frontal cortex during REM sleep and presleep wakefulness predict anger in REM sleep dreams. Journal of Neuroscience, 39(24): 4775-4784. https://doi.org/10.1523/JNEUROSCI.2884-18.2019

Participants: Healthy, not using medication, right-handed, native Finnish speakers, with good sleep quality (score ≤ 5 on the Pittsburgh Sleep Quality Index; Buysse et al., 1989).

Experimental design and procedure: For a figure displaying the experimental procedure, see Sikka et al. (2019, p. 4777). Participants spent 2 nights (separated by a week) in the sleep laboratory. In the evening, participants arrived in the laboratory 2 h before their usual bedtime. First, participants were instructed about the procedure of the study and EEG electrodes were attached to their scalp. Next, participants' waking state resting EEG was recorded (8 x 1 min; 10:30pm-12:00am) and they rated their current waking affective state using the Finnish version of the modified Differential Emotions Scale (fmDES; Fredrickson, 2013). Participants were then allowed to fall asleep. Sleep stages were monitored and scored visually (Rechtschaffen and Kales, 1968; Iber et al., 2007). Every time REM sleep had lasted continuously for 5 min, and was in a phasic stage, a tone signal was used to awaken the participants. Upon awakening, participants provided an oral dream report: first, they reported the last image they had in mind just before awakening, followed by a detailed report of the whole dream. Next, participants rated their affective experiences in the preceding dream by filling in the fmDES electronically using a mouse and a computer screen above the bed. If participants reported "no dreams", researchers asked whether they had not had a dream or whether they felt like they had had a dream but could not recall any specific content (i.e., "white dream"). In these two cases the fmDES was not filled in. Participants were then allowed to continue their sleep. This procedure was repeated throughout the night until the final morning awakening (scheduled between 5:30 A.M. and 8:30 A.M.). Upon final awakening, and after having reported and rated the last dream, participants were asked to lie in bed but stay awake. Similar to the evening, waking state resting (morning baseline) EEG was then recorded for 8 min, followed by participants' ratings of their current waking state affect using the fmDES.

--DREAM categorization procedure--
Dream experiences = participants remembered having a dream and were able to report at least some of its content.
Without recall ("white dream") = participants felt like they had had a dream but could not recall any specific content.
No recall = participants reported not having had any dream experiences.

---Technical details---
N/A

--Data acquisition--
EEG was recorded using 24 single Ag/AgCl electrodes (placed according to the standard 10/10 system). Additionally, 4 EOG electrodes were used to record eye movements and an EMG electrode (placed on the chin) was used to record muscle activity. All electrodes (except the bipolar EOG and EMG electrodes) were referenced to the right mastoid. The ground electrode was placed on the forehead. The EEG signal was amplified (SynAmps model 5083), notch-filtered at 50 Hz, digitized at 500 Hz, and recorded with Neuroscan equipment and software. All impedances were kept <5 kΩ.

--Data preprocessing--
Data has not been preprocessed.
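As an illustration of how the mDES self-ratings described above might be summarised, the following sketch averages the 10 positive and 10 negative items for one dream report. The column names (an "SR_" prefix followed by PA1..PA10 and NA1..NA10) are assumptions, since the exact headers of "Ratings.csv" are not given here, and averaging is only one possible summary, not necessarily the scoring used by the dataset authors.

```python
from statistics import mean

# Item labels follow the PA1..PA10 / NA1..NA10 scheme listed above.
PA_ITEMS = [f"PA{i}" for i in range(1, 11)]
NA_ITEMS = [f"NA{i}" for i in range(1, 11)]


def affect_scores(ratings, prefix="SR_"):
    """Return (mean positive affect, mean negative affect) for one report.

    `ratings` maps column names to integer self-ratings; the "SR_" prefix
    is a hypothetical naming convention, not confirmed by the dataset.
    """
    pa = [ratings[prefix + item] for item in PA_ITEMS]
    na = [ratings[prefix + item] for item in NA_ITEMS]
    for value in pa + na:
        # Self-ratings are on a scale from 0 (not at all) to 4 (extremely much).
        if not 0 <= value <= 4:
            raise ValueError(f"rating out of range 0-4: {value}")
    return mean(pa), mean(na)


# Invented example row: every positive item rated 2, every negative item 1.
example_row = {f"SR_{item}": 2 for item in PA_ITEMS}
example_row.update({f"SR_{item}": 1 for item in NA_ITEMS})
positive, negative = affect_scores(example_row)
```

The range check reflects the 0 to 4 rating scale stated for the SR columns; external ratings (ER) are frequencies and would need a different check.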
Inspire Download Service (predefined ATOM) for data set Natural Spatial...
data.europa.eu
atom feed
Ministerium für Umwelt, Klima, Mobilität, Agrar und Verbraucherschutz, Inspire Download Service (predefined ATOM) for data set Natural Spatial Structure Saarland (and adjacent areas) (2011) [Dataset]. https://data.europa.eu/data/datasets/48ea0d18-40dd-4494-af4b-b7c894140e3c?locale=en
Explore at:
Available download formats: atom feed
Dataset authored and provided by
Ministerium für Umwelt, Klima, Mobilität, Agrar und Verbraucherschutz
Description
Description of the INSPIRE Download Service (predefined Atom): Saarland (and adjacent areas) (2011). Hierarchically structured natural spatial structure (natural spaces of the first to fourth order). Shown is the TK grid rectangle 6304-6810 with the Saarland at its center.

Attributes:
IDENTIFICATION: ID of the surface element in the original database
NATNR1_SLL: Number of the first-order natural space
NATNAM1: Name of the first-order natural space
NATNR2_SLL: Number of the second-order natural space
NATNAM2: Name of the second-order natural space
NATNR3_SLL: Number of the third-order natural space
NATNR3_MS: Number of the large unit
NATNAM3: Name of the third-order natural space
NATNR4_SLL: Number of the fourth-order natural space
NATNR4_MS: Number of the subunit
NATNAM4: Name of the fourth-order natural space
NRBEM: Remark
NRZINFO: Further information on the natural space

Viewing the object in the GDZ: the MultiFeature class (composed of the area feature class GDZ2010.A_ngnraum and the business table with the property data (GDZ2010.ngnraum)) has been exported to the file geodatabase. The following user-relevant attributes are available:
ID
NATNR1: Natural space number, 1st order
NATNAM1: Natural space name, 1st order
NATNR2: Natural space number, 2nd order
NATNAM2: Natural space name, 2nd order
NATNR3: Natural space number, 3rd order
NATNAM3: Natural space name, 3rd order
NATNR4: Natural space number, 4th order
NATNAM4: Natural space name, 4th order
NRZINFO: Further information on the natural area
NRBEM: Note

The link(s) for downloading the records is/are generated dynamically from GetFeature requests to a WFS 1.1.0.
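Since the download links are said to come from GetFeature requests to a WFS 1.1.0, a minimal sketch of how such a request URL is typically assembled may help. The key-value parameters (SERVICE, VERSION, REQUEST, TYPENAME, MAXFEATURES) are the standard WFS 1.1.0 ones; the base URL and feature type name below are placeholders, because the actual service endpoint and layer name are not given in this description.

```python
from urllib.parse import urlencode


def build_getfeature_url(base_url, type_name, max_features=None):
    """Assemble a WFS 1.1.0 GetFeature request URL (key-value encoding)."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
    }
    if max_features is not None:
        # Optional cap on the number of returned features.
        params["MAXFEATURES"] = str(max_features)
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and feature type: both are hypothetical.
example_url = build_getfeature_url(
    "https://example.org/wfs", "example_namespace:example_layer", max_features=10
)
```

No request is sent here; the function only builds the URL, which could then be fetched with any HTTP client.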
Price Paid Data
gov.uk
Updated Sep 29, 2025
HM Land Registry (2025). Price Paid Data [Dataset]. https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Explore at:
Dataset updated
Sep 29, 2025
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
HM Land Registry
Description
Our Price Paid Data includes information on all property sales in England and Wales that are sold for value and are lodged with us for registration.

Get up to date with the permitted use of our Price Paid Data:
check what to consider when using or publishing our Price Paid Data

Using or publishing our Price Paid Data

If you use or publish our Price Paid Data, you must add the following attribution statement:

Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.

Price Paid Data is released under the Open Government Licence (OGL) (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/). You need to make sure you understand the terms of the OGL before using the data.

Under the OGL, HM Land Registry permits you to use the Price Paid Data for commercial or non-commercial purposes. However, OGL does not cover the use of third party rights, which we are not authorised to license.

Price Paid Data contains address data processed against Ordnance Survey’s AddressBase Premium product, which incorporates Royal Mail’s PAF® database (Address Data). Royal Mail and Ordnance Survey permit your use of Address Data in the Price Paid Data:

for personal and/or non-commercial use

to display for the purpose of providing residential property price information services

If you want to use the Address Data in any other way, you must contact Royal Mail. Email address.management@royalmail.com.

Address data

The following fields comprise the address data included in Price Paid Data:

Postcode

PAON Primary Addressable Object Name (typically the house number or name)

SAON Secondary Addressable Object Name – if there is a sub-building, for example, the building is divided into flats, there will be a SAON

Street

Locality

Town/City

District

County
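
As a small illustration of how the address fields listed above fit together, the sketch below joins them into a single display line, skipping fields that are empty (for example, a missing SAON). The dictionary keys reuse the field names from the list; the ordering, the separator, and the sample record are assumptions made for illustration and are not an official HM Land Registry format.

```python
# Field order chosen for display only: sub-building first, postcode last.
ADDRESS_FIELDS = [
    "SAON",
    "PAON",
    "Street",
    "Locality",
    "Town/City",
    "District",
    "County",
    "Postcode",
]


def format_address(record):
    """Join the non-empty address fields of one record into one line."""
    parts = [(record.get(field) or "").strip() for field in ADDRESS_FIELDS]
    return ", ".join(part for part in parts if part)


# Invented record, not taken from the real data.
example_record = {
    "SAON": "FLAT 3",
    "PAON": "12",
    "Street": "EXAMPLE ROAD",
    "Locality": "",
    "Town/City": "EXAMPLETOWN",
    "Postcode": "ZZ99 9ZZ",
}
example_line = format_address(example_record)
```

Note that the permitted uses of the Address Data described above still apply to any address string built this way.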

August 2025 data (current month)

The August 2025 release includes:

the first release of data for August 2025 (transactions received from the first to the last day of the month)

updates to earlier data releases

Standard Price Paid Data (SPPD) and Additional Price Paid Data (APPD) transactions

As we will be adding to the August data in future releases, we would not recommend using it in isolation as an indication of market or HM Land Registry activity. When the full dataset is viewed alongside the data we’ve previously published, it adds to the overall picture of market activity.

Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.

Google Chrome (Chrome 88 onwards) is blocking downloads of our Price Paid Data. Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.

We update the data on the 20th working day of each month. You can download the:

current month as a CSV file (CSV, 18.5MB): http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-monthly-update-new-version.csv

current month as a text file (TXT, 17.9MB): http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-monthly-update.txt

Single file

These include standard and additional price paid data transactions received at HM Land Registry from 1 January 1995 to the most current monthly data.

Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.

The data is updated monthly and the average size of this file is 3.7 GB. You can download:

http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.txt">the complete Price Paid T
GRID release 2015-09-22 in JSON
digitalscience.figshare.com
txt
Updated May 30, 2023
Data-science Digital-science; Digital Science (2023). GRID release 2015-09-22 in JSON [Dataset]. http://doi.org/10.6084/m9.figshare.1553267.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1553267.v1
Dataset updated
May 30, 2023
Dataset provided by
Digital Science (http://digital-science.com/)
Authors
Data-science Digital-science; Digital Science
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The first public release of the GRID database. Please note, the csv download only includes IDs, names & locations. See the JSON download for all metadata, including types & relationships. Please see here for a description of the database format: https://www.grid.ac/format

Release notes:
Database seeded from research institutes in grant data from over 65 global funders.
GeoNames IDs added to all institutes.
NUTS codes added to all European institutes.
Metadata added for the top 3000 Universities, majority of Germany and Australia and many more.
Parent / Child relationships added for 65 super institute members (e.g. Max Planck, Chinese Academy of Sciences, etc.)

External identification systems:
- HESA institution codes (Higher Education Statistics Agency UK)
- UCAS institution codes (Universities and Colleges Admissions Service, UK)
- UKPRN institution codes (UK Provider Reference Number, UK)
- 4373 Fundref codes
Football Players Data
kaggle.com
Updated Nov 13, 2023
Masood Ahmed (2023). Football Players Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/6960429
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/6960429
Dataset updated
Nov 13, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Masood Ahmed
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Description:

This comprehensive dataset offers detailed information on approximately 17,000 FIFA football players, meticulously scraped from SoFIFA.com.

It encompasses a wide array of player-specific data points, including but not limited to player names, nationalities, clubs, player ratings, potential, positions, ages, and various skill attributes. This dataset is ideal for football enthusiasts, data analysts, and researchers seeking to conduct in-depth analysis, statistical studies, or machine learning projects related to football players' performance, characteristics, and career progressions.

Features:

name: Name of the player.

full_name: Full name of the player.

birth_date: Date of birth of the player.

age: Age of the player.

height_cm: Player's height in centimeters.

weight_kgs: Player's weight in kilograms.

positions: Positions the player can play.

nationality: Player's nationality.

overall_rating: Overall rating of the player in FIFA.

potential: Potential rating of the player in FIFA.

value_euro: Market value of the player in euros.

wage_euro: Weekly wage of the player in euros.

preferred_foot: Player's preferred foot.

international_reputation(1-5): International reputation rating from 1 to 5.

weak_foot(1-5): Rating of the player's weaker foot from 1 to 5.

skill_moves(1-5): Skill moves rating from 1 to 5.

body_type: Player's body type.

release_clause_euro: Release clause of the player in euros.

national_team: National team of the player.

national_rating: Rating in the national team.

national_team_position: Position in the national team.

national_jersey_number: Jersey number in the national team.

crossing: Rating for crossing ability.

finishing: Rating for finishing ability.

heading_accuracy: Rating for heading accuracy.

short_passing: Rating for short passing ability.

volleys: Rating for volleys.

dribbling: Rating for dribbling.

curve: Rating for curve shots.

freekick_accuracy: Rating for free kick accuracy.

long_passing: Rating for long passing.

ball_control: Rating for ball control.

acceleration: Rating for acceleration.

sprint_speed: Rating for sprint speed.

agility: Rating for agility.

reactions: Rating for reactions.

balance: Rating for balance.

shot_power: Rating for shot power.

jumping: Rating for jumping.

stamina: Rating for stamina.

strength: Rating for strength.

long_shots: Rating for long shots.

aggression: Rating for aggression.

interceptions: Rating for interceptions.

positioning: Rating for positioning.

vision: Rating for vision.

penalties: Rating for penalties.

composure: Rating for composure.

marking: Rating for marking.

standing_tackle: Rating for standing tackle.

sliding_tackle: Rating for sliding tackle.
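
To show how a few of the features listed above might be used together, here is a minimal sketch that loads rows with the standard library and ranks players by the gap between `potential` and `overall_rating`. The column names come from the feature list; the three sample rows are invented placeholders, and the "headroom" ranking is just an illustrative metric, not something defined by the dataset.

```python
import csv
import io

# Invented sample rows using a subset of the listed columns.
SAMPLE_CSV = """name,age,overall_rating,potential,value_euro
Player A,19,68,84,1500000
Player B,31,82,82,20000000
Player C,23,75,80,
"""


def load_players(text):
    """Parse CSV text and convert the numeric columns used below."""
    players = []
    for row in csv.DictReader(io.StringIO(text)):
        row["overall_rating"] = int(row["overall_rating"])
        row["potential"] = int(row["potential"])
        # Treat an empty market value as missing rather than zero.
        row["value_euro"] = float(row["value_euro"]) if row["value_euro"] else None
        players.append(row)
    return players


def rank_by_headroom(players):
    """Sort players by (potential - overall_rating), largest gap first."""
    return sorted(
        players,
        key=lambda p: p["potential"] - p["overall_rating"],
        reverse=True,
    )


ranked_names = [p["name"] for p in rank_by_headroom(load_players(SAMPLE_CSV))]
```

With the real file, `SAMPLE_CSV` would be replaced by the contents of the downloaded CSV; its exact filename is not stated here.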

Use Case:

This dataset is ideal for data analysis, predictive modeling, and machine learning projects. It can be used for:

Player performance analysis and comparison.

Market value assessment and wage prediction.

Team composition and strategy planning.

Machine learning models to predict future player potential and career trajectories.

Note:

Please adhere to the terms of service of SoFIFA.com and relevant data protection laws when using this dataset. The dataset is intended for educational and research purposes only and should not be used for commercial gain without proper authorization.
RxNorm Data
kaggle.com
bioregistry.io
zip
Updated Mar 20, 2019
National Library of Medicine (2019). RxNorm Data [Dataset]. https://www.kaggle.com/datasets/nlm-nih/nlm-rxnorm
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
National Library of Medicine
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

RxNorm is the name of a US-specific terminology in medicine that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm

RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/

Content

RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.

This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.

The following tables are included in the RxNorm dataset:

RXNCONSO contains concept and source information

RXNREL contains information regarding relationships between entities

RXNSAT contains attribute information

RXNSTY contains semantic information

RXNSAB contains source info

RXNCUI contains retired rxcui codes

RXNATOMARCHIVE contains archived data

RXNCUICHANGES contains concept changes

Update Frequency: Monthly

Fork this kernel to get started with this dataset.

Acknowledgements

https://www.nlm.nih.gov/research/umls/rxnorm/

https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm

https://cloud.google.com/bigquery/public-data/rxnorm

Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.

Banner Photo by @freestocks from Unsplash.

Inspiration

What are the RXCUI codes for the ingredients of a list of drugs?

Which ingredients have the most variety of dose forms?

In what dose forms is the drug phenylephrine found?

What are the ingredients of the drug labeled with the generic code number 072718?
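
The first question above (RXCUI codes for a list of ingredients) could be approached along the following lines. This is a sketch under stated assumptions: it uses an in-memory SQLite table as a stand-in for the BigQuery RXNCONSO table, keeps only four of the RXNCONSO columns (RXCUI, SAB, TTY, STR), relies on the RxNorm term type `IN` for ingredients, and uses placeholder RXCUI values that are not real codes. Column naming in the hosted BigQuery tables may differ.

```python
import sqlite3


def ingredient_rxcuis(conn, names):
    """Return (name, RXCUI) pairs for ingredient-level RxNorm concepts."""
    if not names:
        return []
    placeholders = ",".join("?" for _ in names)
    query = (
        "SELECT STR, RXCUI FROM RXNCONSO "
        "WHERE SAB = 'RXNORM' AND TTY = 'IN' "
        f"AND LOWER(STR) IN ({placeholders}) "
        "ORDER BY STR"
    )
    return conn.execute(query, [name.lower() for name in names]).fetchall()


# Stand-in table with placeholder rows (the RXCUI values are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RXNCONSO (RXCUI TEXT, SAB TEXT, TTY TEXT, STR TEXT)")
conn.executemany(
    "INSERT INTO RXNCONSO VALUES (?, ?, ?, ?)",
    [
        ("X0001", "RXNORM", "IN", "phenylephrine"),
        ("X0002", "RXNORM", "IN", "aspirin"),
        ("X0003", "RXNORM", "SCD", "aspirin 81 MG Oral Tablet"),
    ],
)
matches = ingredient_rxcuis(conn, ["Aspirin", "Phenylephrine"])
```

Questions about dose forms or ingredients of a specific product would additionally need the relationship table (RXNREL), which this sketch does not cover.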
Up-to-date mapping of COVID-19 treatment and vaccine development...
data.europa.eu
data.niaid.nih.gov
unknown
Updated Mar 12, 2021
Zenodo (2021). Up-to-date mapping of COVID-19 treatment and vaccine development (covid19-help.org data dump) [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4601446?locale=hu
Explore at:
Available download formats: unknown (1413)
Dataset updated
Mar 12, 2021
Dataset authored and provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The free database mapping COVID-19 treatment and vaccine development based on global scientific research is available at https://covid19-help.org/.

Files provided here are curated partial data exports in the form of .csv files, or a full data export as an .sql script generated with pg_dump from our PostgreSQL 12 database. In this repository you can also find a .png file with our ER diagram of the tables in the .sql file.

Structure of CSV files

*On our site, compounds are named as substances

compounds.csv
Id - Unique identifier in our database (unsigned integer)
Name - Name of the Substance/Compound (string)
Marketed name - The marketed name of the Substance/Compound (string)
Synonyms - Known synonyms (string)
Description - Description (HTML code)
Dietary sources - Dietary sources where the Substance/Compound can be found (string)
Dietary sources URL - Dietary sources URL (string)
Formula - Compound formula (HTML code)
Structure image URL - URL to our website with the structure image (string)
Status - Status of approval (string)
Therapeutic approach - Approach in which the Substance/Compound works (string)
Drug status - Availability of the Substance/Compound (string)
Additional data - Additional data in stringified JSON format with data such as prescribing information and note (string)
General information - General information about the Substance/Compound (HTML code)

references.csv
Id - Unique identifier in our database (unsigned integer)
Impact factor - Impact factor of the scientific article (string)
Source title - Title of the scientific article (string)
Source URL - URL link of the scientific article (string)
Tested on species - What testing model was used for the study (string)
Published at - Date of publication of the scientific article (Date in ISO 8601 format)

clinical-trials.csv
Id - Unique identifier in our database (unsigned integer)
Title - Title of the clinical trial study (string)
Acronym title - Acronym of the title of the clinical trial study (string)
Source id - Unique identifier in the source database
Source id optional - Optional identifier in other databases (string)
Interventions - Description of interventions (string)
Study type - Type of the conducted study (string)
Study results - Has results? (string)
Phase - Current phase of the clinical trial (string)
Url - URL to the clinical trial study page on clinicaltrials.gov (string)
Status - Status in which the study currently is (string)
Start date - Date at which the study was started (Date in ISO 8601 format)
Completion date - Date at which the study was completed (Date in ISO 8601 format)
Additional data - Additional data in the form of stringified JSON with data such as locations of study, study design, enrollment, age, outcome measures (string)

compound-reference-relations.csv
Reference id - Id of a reference in our DB (unsigned integer)
Compound id - Id of a substance in our DB (unsigned integer)
Note - Id of a substance in our DB (unsigned integer)
Is supporting - Is evidence supporting or contradictory (Boolean, true if supporting)

compound-clinical-trial.csv
Clinical trial id - Id of a clinical trial in our DB (unsigned integer)
Compound id - Id of a Substance/Compound in our DB (unsigned integer)

tags.csv
Id - Unique identifier in our database (unsigned integer)
Name - Name of the tag (string)

tags-entities.csv
Tag id - Id of a tag in our DB (unsigned integer)
Reference id - Id of a reference in our DB (unsigned integer)

API Specification

Our project also has an Open API that gives you access to our data in a format suitable for processing, particularly in JSON format.

https://covid19-help.org/api-specification

Services are split into five endpoints:
Substances - /api/substances
References - /api/references
Substance-reference relations - /api/substance-reference-relations
Clinical trials - /api/clinical-trials
Clinical trials-substances relations - /api/clinical-trials-substances

Method of providing data

All dates are text strings formatted in compliance with ISO 8601 as YYYY-MM-DD.
If the request syntax is incorrect (missing or incorrectly formatted parameters), an HTTP 400 Bad Request response will be returned. The body of the response may include an explanation.
Data updated_at (used for querying changed-from) refers only to a particular entity and not its logical relations. Example: If a new substance reference relation is added, but the substance detail has not changed, this is reflected in the substance reference relation endpoint, where a new entity with id and current dates in the created_at and updated_at fields will be added, but in the substances or references endpoint nothing has changed.

The recommended way of sequential download

During the first download, it is possible to obtain all data by entering an old enough date in the parameter value changed-from, for example: changed-from=2020-01-01
It is important to write down the date on which receiving the data was initiated, let's say 2020-10-20.
For repeated data downloads, it is sufficient to receive only the reco
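
The sequential download described above can be sketched as a small helper that builds one request URL per endpoint for a given `changed-from` date. The endpoint paths and the `changed-from=2020-01-01` form are taken from the description; passing the parameter as a URL query string, and the function and variable names, are assumptions made for this illustration. No request is actually sent.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://covid19-help.org"

# The five endpoints named in the API description.
ENDPOINTS = {
    "substances": "/api/substances",
    "references": "/api/references",
    "substance-reference-relations": "/api/substance-reference-relations",
    "clinical-trials": "/api/clinical-trials",
    "clinical-trials-substances": "/api/clinical-trials-substances",
}


def build_sync_urls(changed_from):
    """Return one URL per endpoint, filtered by a YYYY-MM-DD date string."""
    # Raises ValueError if the string is not a valid ISO 8601 calendar date.
    date.fromisoformat(changed_from)
    query = urlencode({"changed-from": changed_from})
    return {name: f"{BASE_URL}{path}?{query}" for name, path in ENDPOINTS.items()}


# First run: an old enough date returns everything.
first_run_urls = build_sync_urls("2020-01-01")
# Later run: reuse the date recorded when the previous download started.
next_run_urls = build_sync_urls("2020-10-20")
```

Recording the start date of each run before fetching, as the description recommends, avoids missing entities that change while a download is in progress.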


Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names

USA Name Data

USA Name Data (BigQuery Dataset)

Explore at:

Available download formats: zip (0 bytes)

Dataset updated

Feb 12, 2019

Dataset provided by

Data.gov (https://data.gov/)

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

United States

Description

Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
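
A question such as "What are the most common names?" could be answered along these lines once the records are loaded. This is a sketch with assumptions: each record is taken to have `name`, `gender`, `year`, and `number` fields (the actual table schema is not listed in this description), and the sample rows are invented. The minimum of 5 occurrences mirrors the privacy rule stated above and acts only as a defensive filter.

```python
from collections import Counter

# Names with fewer than 5 occurrences are excluded from the published data.
MIN_OCCURRENCES = 5


def most_common_names(records, gender=None, top=3):
    """Sum occurrences per name and return the `top` most frequent names."""
    totals = Counter()
    for record in records:
        if record["number"] < MIN_OCCURRENCES:
            continue  # should not occur in the published data
        if gender is not None and record["gender"] != gender:
            continue
        totals[record["name"]] += record["number"]
    return totals.most_common(top)


# Invented sample rows for illustration only.
sample_records = [
    {"name": "Mary", "gender": "F", "year": 1910, "number": 30},
    {"name": "Mary", "gender": "F", "year": 1911, "number": 25},
    {"name": "John", "gender": "M", "year": 1910, "number": 40},
    {"name": "Anna", "gender": "F", "year": 1910, "number": 12},
    {"name": "Zed", "gender": "M", "year": 1910, "number": 3},
]
overall_top = most_common_names(sample_records)
female_top = most_common_names(sample_records, gender="F", top=1)
```

The same aggregation could be expressed as a GROUP BY query against the BigQuery table; the Python version is shown only because it runs without any external service.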

