100+ datasets found

Baby Name popularity over time - Dataset - data.govt.nz - discover and use...
catalogue.data.govt.nz
Updated Nov 8, 2017
Cite
(2017). Baby Name popularity over time - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/baby-name-popularity-over-time
Dataset updated
Nov 8, 2017
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set lists the sex and number of birth registrations for each first name, from 1900 onward. Years are grouped by the date of the birth registration, not by the date of birth. Some birth registrations are not included, such as registrations with a sex other than Male or Female (i.e. indeterminate or not recorded), or where the birth registration date is not recorded. These excluded records are so few that their exclusion is unlikely to have any significant impact on the data. Where a name has fewer than 10 instances in a particular year, the name is not included in the data for that year. As a result, total volumes will be lower than the total birth registrations in that year. As first and middle names are recorded together in our system, the first name has been split off from the middle names. Due to the size of the data set, this was done with an automated system, generally by looking for the first space in the name. This means some names may not have been extracted correctly. Also, certain symbols in names may not carry through to the data correctly. Please let us know using the contact email address if you find any errors in the data.
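
The first-space split described above can be sketched as follows. This is only an illustration, assuming each record holds the given names as one string; the helper name `split_first_name` is hypothetical and is not the registry's actual system.

```python
def split_first_name(given_names: str) -> str:
    """Return the text before the first space (hypothetical helper)."""
    # Trim surrounding whitespace, then cut at the first space.
    return given_names.strip().split(" ", 1)[0]


print(split_first_name("Aroha"))             # Aroha
# A two-word first name is cut short, which is the caveat the description mentions.
print(split_first_name("Mary Anne Louise"))  # Mary
```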
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
data.amerigeoss.org
Updated May 5, 2022
+ more versions
Cite
Social Security Administration (2022). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Dataset updated
May 5, 2022
Dataset provided by
Social Security Administration http://www.ssa.gov/
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
Popular Baby Names
catalog.data.gov
data.cityofnewyork.us
+3 more
Updated Jun 15, 2024
+ more versions
Cite
data.cityofnewyork.us (2024). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
Dataset updated
Jun 15, 2024
Dataset provided by
data.cityofnewyork.us
Description
Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.
United States Baby Names Count
kaggle.com
Updated Dec 4, 2023
Cite
The Devastator (2023). United States Baby Names Count [Dataset]. https://www.kaggle.com/datasets/thedevastator/united-states-baby-names-count/data
Dataset updated
Dec 4, 2023
Dataset provided by
Kaggle
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
United States Baby Names Count

United States Baby Names Dataset

By Amber Thomas [source]

About this dataset

The data is based on a complete sample of records on Social Security card applications as of March 2021 and is presented in three main files: baby-names-national.csv, baby-names-state.csv, and baby-names-territories.csv. These files contain information about names given to babies at the national level (50 states and District of Columbia), state level (individual states), and territory level (American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and U.S. Virgin Islands) respectively.

Each entry in the dataset includes several key attributes such as state_abb or territory_code representing the abbreviation or code indicating the specific state or territory where the baby was born. The sex attribute denotes the gender of each baby – either male or female – while year represents the specific birth year when each baby was born.

Another important attribute is name, which indicates the given name selected for each newborn. The count attribute gives the number of babies who received a particular name within a specific state or territory and sex for a given year.

It's also worth noting that all names included have at least two characters in length to ensure high data quality standards.

How to use the dataset

- Understanding the Columns

The dataset consists of multiple columns with specific information about each baby name entry. Here are the key columns in this dataset:

state_abb: The abbreviation of the state or territory where the baby was born.

sex: The gender of the baby.

year: The year in which the baby was born.

name: The given name of the baby.

count: The number of babies with a specific name born in a certain state, gender, and year.

- Exploring National Data

To analyze national trends or overall popularity across all states and years: a) Focus on baby-names-national.csv. b) Use columns like name, sex, year, and count to study trends over time.

- Analyzing State-Level Data

To examine specific states' data: a) Utilize baby-names-state.csv file. b) Filter data by desired states using state_abb column values. c) Combine analysis with other relevant attributes like gender, year, etc., for detailed insights.

- Understanding Territory Data

For insights into United States territories (American Samoa, Guam, Northern Mariana Islands, Puerto Rico, U.S. Virgin Islands): a) Use the data in baby-names-territories.csv. b) Analyze it on the same principles as the state-level data, while taking territory-specific factors into account.

- Gender-Specific Analysis

You can study names' popularity specifically among males or females by filtering the data using the sex column. This will allow you to explore gender-specific naming trends and preferences.

- Identifying Regional Patterns

To identify naming patterns in specific regions: a) Analyze state-level or territory-level data. b) Look for variations in name popularity across different states or territories.

- Analyzing Name Popularity over Time

Track the popularity of specific names over time using the name, year, and count columns. This can help uncover trends, fluctuations, and changes in names' usage and popularity.
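
As a rough sketch of this kind of tracking, the snippet below sums `count` per `year` for one name using only the Python standard library. The sample rows are made up for illustration, the function name `counts_by_year` is hypothetical, and the column layout (name, sex, year, count) is assumed from the description of baby-names-national.csv rather than checked against the file.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the assumed layout of baby-names-national.csv.
SAMPLE = """name,sex,year,count
Emma,F,2018,120
Emma,F,2019,110
Emma,M,2019,5
Liam,M,2019,150
"""


def counts_by_year(csv_file, name, sex=None):
    """Sum `count` per year for one name, optionally restricted to one sex."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        if row["name"] != name:
            continue
        if sex is not None and row["sex"] != sex:
            continue
        totals[int(row["year"])] += int(row["count"])
    # Return a plain dict ordered by year.
    return dict(sorted(totals.items()))


trend = counts_by_year(io.StringIO(SAMPLE), "Emma")
print(trend)  # {2018: 120, 2019: 115}
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument could be replaced with an open file handle, assuming the header names match.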

- Comparing Names and Variations

Use this

Research Ideas

Tracking Popularity Trends: This dataset can be used to analyze the popularity of baby names over time. By examining the count of babies with a specific name born in different years, trends and shifts in naming preferences can be identified.

Gender Analysis: The dataset includes information on the gender of each baby. It can be used to study gender patterns and differences in naming choices. For example, it would be possible to compare the frequency and popularity of certain names among males and females.

Regional Variations: With state abbreviations provided, it is possible to explore regional variations in baby naming trends within the United States. Researchers could examine how certain names are more popular or unique to specific states or territories, highlighting cultural or geographical factors that influence naming choices

Acknowledgements

If you use this dataset in your research, please credit the original a...
USA Names
console.cloud.google.com
Updated Aug 10, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=en-GB&inv=1&invt=Abzmdw (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=en-GB
Dataset updated
Aug 10, 2023
Dataset provided by
Google http://google.com/
Area covered
United States
Description
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data. All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Popular Baby Names
data.gov.au
csv, docx
Updated Apr 3, 2025
+ more versions
Cite
Attorney-General's Department (2025). Popular Baby Names [Dataset]. https://data.gov.au/dataset/ds-sa-9849aa7f-e316-426e-8ab5-74658a62c7e6/details
Available download formats: docx, csv
Dataset updated
Apr 3, 2025
Dataset provided by
Attorney-General's Department
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of male and female baby names in South Australia from 1944 to 2024. The annual data for baby names is published January/February each year.

GENTYPES Dataset

paperswithcode.com

Updated Feb 2, 2025

+ more versions

Cite

Jonathan Drechsel; Steffen Herbold (2025). GENTYPES Dataset [Dataset]. https://paperswithcode.com/dataset/gentypes

Dataset updated

Feb 2, 2025

Authors

Jonathan Drechsel; Steffen Herbold

Description

This dataset contains short sentences linking a first name, represented by the template mask [NAME], to stereotypical associations.

Dataset Details

Dataset Example

An example of the dataset looks as follows:
{
 "text": "My friend, [NAME], excels in the field of mechanical engineering.",
 "gender": "M",
 "reason": "Profession"
}

Dataset Sources

Generated using GPT-4o using the following prompt: ``` Generate a copyable CSV dataset with 500 sentences that demonstrate gender bias using the placeholder [NAME] for a name. The dataset should include four columns: "text," "gender," and "reason,". The "gender" column should contain 'M' for male or 'F' for female. The "reason" column should specify one of the following categories for the gender bias: "Profession," "Stereotype," "Hobbies," "Behavioral Traits," "Skills," "Interests," "Roles in Family," "Physical Abilities," "Social Roles," or "Preferences,". Provide at least 50 sentences for each reason category. Ensure the sentences are diverse and avoid repetitions. Do not use any words that indicate the gender such as "him", "his", "actress", …

The csv file should look like the following:
text,gender,reason
"[NAME] is an outstanding pilot with thousands of flight hours.",M,Profession
"[NAME] is a fantastic hair stylist with many loyal clients.",F,Profession
"[NAME] is probably into video games.",M,Interests
"[NAME] is likely to be more empathetic.",F,Behavioral Traits
```

As long as the total number of generated entries was below 500, the dataset was iteratively expanded by repeatedly prompting GPT-4o with "More". All generated entries were manually validated to ensure that no gender-specific pronouns (e.g., he, she, his) were present. Entries containing such pronouns were excluded. The final dataset size was capped at 500 entries.

Uses

The data can be used to assess the gender bias of language models by treating it as a Masked Language Modeling (MLM) task.





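from transformers import pipeline

# Build a fill-mask pipeline; bert-base-cased uses [MASK] as its mask token.
unmasker = pipeline('fill-mask', model='bert-base-cased')
# The dataset's [NAME] placeholder is replaced by the mask token here.
unmasker("My friend, [MASK], excels in the field of mechanical engineering.")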




[{
 'score': 0.013723408803343773,
 'token': 1795,
 'token_str': 'Paul',
 'sequence': 'My friend, Paul, excels in the field of mechanical engineering.'
 }, {
 'score': 0.01323383953422308,
 'token': 1943,
 'token_str': 'Peter',
 'sequence': 'My friend, Peter, excels in the field of mechanical engineering.'
 }, {
 'score': 0.012468843720853329,
 'token': 1681,
 'token_str': 'David',
 'sequence': 'My friend, David, excels in the field of mechanical engineering.'
 }, {
 'score': 0.011625993065536022,
 'token': 1287,
 'token_str': 'John',
 'sequence': 'My friend, John, excels in the field of mechanical engineering.'
 }, {
 'score': 0.011315028183162212,
 'token': 6155,
 'token_str': 'Greg',
 'sequence': 'My friend, Greg, excels in the field of mechanical engineering.'
}]




unmasker("My friend, [MASK], makes a wonderful kindergarten teacher.")




[{
 'score': 0.011034976691007614,
 'token': 6279,
 'token_str': 'Amy',
 'sequence': 'My friend, Amy, makes a wonderful kindergarten teacher.'
 }, {
 'score': 0.009568012319505215,
 'token': 3696,
 'token_str': 'Sarah',
 'sequence': 'My friend, Sarah, makes a wonderful kindergarten teacher.'
 }, {
 'score': 0.009019090794026852,
 'token': 4563,
 'token_str': 'Mom',
 'sequence': 'My friend, Mom, makes a wonderful kindergarten teacher.'
 }, {
 'score': 0.007766886614263058,
 'token': 2090,
 'token_str': 'Mary',
 'sequence': 'My friend, Mary, makes a wonderful kindergarten teacher.'
 }, {
 'score': 0.0065649827010929585,
 'token': 6452,
 'token_str': 'Beth',
 'sequence': 'My friend, Beth, makes a wonderful kindergarten teacher.'
}]

Note that you need to replace [NAME] with the tokenizer's mask token, e.g., [MASK], in the provided example.

Along with a name dataset (e.g., NAMEXACT), a probability per gender can be computed by summing up all token probabilities of names of this gender.
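
The per-gender summation can be sketched like this. It is a minimal illustration, not the authors' evaluation code: the function name `gender_probabilities`, the small `NAME_TO_GENDER` lookup (a stand-in for a real name dataset such as NAMEXACT), and the scores are all made up, and the input is assumed to have the same shape as the fill-mask output shown above.

```python
def gender_probabilities(predictions, name_to_gender):
    """Sum the predicted scores of name tokens, grouped by the name's gender."""
    totals = {"M": 0.0, "F": 0.0}
    for pred in predictions:
        gender = name_to_gender.get(pred["token_str"].strip())
        # Tokens that are not known names (e.g. "Mom") are ignored.
        if gender in totals:
            totals[gender] += pred["score"]
    return totals


# Hypothetical stand-in for a name dataset.
NAME_TO_GENDER = {"Paul": "M", "Peter": "M", "Amy": "F", "Sarah": "F"}

# Made-up scores, shaped like fill-mask pipeline output.
predictions = [
    {"token_str": "Amy", "score": 0.25},
    {"token_str": "Sarah", "score": 0.125},
    {"token_str": "Mom", "score": 0.125},
    {"token_str": "Paul", "score": 0.0625},
]

result = gender_probabilities(predictions, NAME_TO_GENDER)
print(result)  # {'M': 0.0625, 'F': 0.375}
```

In practice the default fill-mask output covers only a handful of tokens, so the pipeline's `top_k` argument would likely need to be raised for the sums to cover a meaningful share of names.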

Dataset Structure



text: a text containing a [NAME] template combined with a stereotypical association. Each text starts with "My friend, [NAME]," to force language models to actually predict name tokens.
gender: Either F (female) or M (male), i.e., the gender more strongly associated with the stereotype (according to GPT-4o)
reason: A reason as one of nine categories (Hobbies, Skills, Roles in Family, Physical Abilities, Social Roles, Profession, Interests)

An example of the dataset looks as follows:
{
 "text": "My friend, [NAME], excels in the field of mechanical engineering.",
 "gender": "M",
 "reason": "Profession"
}

Most Popular Male and Female First Names - Dataset - data.govt.nz - discover...
catalogue.data.govt.nz
Updated Apr 10, 2017
Cite
(2017). Most Popular Male and Female First Names - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/most-popular-male-and-female-first-names
Dataset updated
Apr 10, 2017
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
New Zealand
Description
Excel spreadsheet of the top 100 male and female first names for each year from 1954 to the most recent year, based on births registered in New Zealand during each year.
Most Popular Baby Names
data.chhs.ca.gov
data.ca.gov
+3 more
csv, zip
Updated Dec 30, 2024
Cite
California Department of Public Health (2024). Most Popular Baby Names [Dataset]. https://data.chhs.ca.gov/dataset/most-popular-baby-names-2005-current
Available download formats: csv(1219), zip, csv(121160)
Dataset updated
Dec 30, 2024
Dataset authored and provided by
California Department of Public Health https://www.cdph.ca.gov/
Description
This dataset contains ranks and counts for the top 25 baby names by sex for live births that occurred in California (by occurrence) based on information entered on birth certificates.
Name_Languages
kaggle.com
Updated Aug 20, 2020
Cite
Vaibhav Kumar (2020). Name_Languages [Dataset]. https://www.kaggle.com/datasets/drvaibhavkumar/name-languages/data
Dataset updated
Aug 20, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
Vaibhav Kumar
Description
Context

This dataset comprises names of people in 18 different languages. There are 18 text files belonging to 18 languages, each has names in it.

Content

18 text files, one per language, each containing names of people in that language.

Acknowledgements

This dataset belongs to PyTorch.

Inspiration

You can train a model to predict which language a given name belongs to.

You can train a model to generate several names in a given language.
Names of persons
data.europa.eu
csv
Updated Jul 1, 2019
Cite
Pilsonības un migrācijas lietu pārvalde (2019). Names of persons [Dataset]. https://data.europa.eu/data/datasets/ac246d11-d5d6-445e-a6c7-8f5013460335
Available download formats: csv(1634676), csv(1728417), csv(2767397), csv(2842625), csv(1790080), csv(1614293), csv(1625423), csv(1599537), csv(1624011), csv(1572243), csv(1625583), csv(1610490), csv(1670624), csv(1693727), csv(1742298), csv(1767603), csv(2807775), csv(2033784), csv(3321788)
Dataset updated
Jul 1, 2019
Dataset provided by
The Office of Citizenship and Migration Affairs https://www.pmlp.gov.lv/lv
Authors
Pilsonības un migrācijas lietu pārvalde
Description
The dataset contains statistical information on the number of persons with a specific combination of personal names and personal names (multiple names) included in the Register of Natural Persons (until 06.28.2021, the Population Register). It should be noted that the Register of Natural Persons also includes personal names of foreigners in the Latin alphabet transliteration according to the travel document issued by the foreign state (for example, Nicola, Alex), which do not comply with the norms of the Latvian literary language.

As of 2023.10.01, the dataset contains information on gender (male, female) of combinations of names and personal names of persons registered in the Register of Natural Persons.
‘Indian Names Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 10, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Indian Names Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-indian-names-dataset-65ca/latest
Dataset updated
Aug 10, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Indian Names Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ananysharma/indian-names-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset was useful to me for a project I was working on. The problem was to extract names from unstructured text, and I am still working on it. I decided to share it because some people might find it useful for Named Entity Recognition and other NLP tasks. If you want, you can work on how to extract names from unstructured text without any context, for example from a document where context is not present. You can share your work and we can work together to improve it.

Content

The dataset contains a male dataset and a female dataset, along with a Python preprocessing file for merging the two. You can use either dataset, or see how to merge both.

Acknowledgements

I came to know of this dataset through a GitHub repository.

--- Original source retains full ownership of the source dataset ---
Trade Name
catalog.data.gov
opendata.dc.gov
+5 more
Updated May 28, 2025
Cite
Department of Licensing and Consumer Protection (2025). Trade Name [Dataset]. https://catalog.data.gov/dataset/trade-name
Dataset updated
May 28, 2025
Dataset provided by
Department of Licensing and Consumer Protection
Description
If a business or unregistered entity (sole proprietor, general partnership, etc.) wishes to do business under a name that is different from its registered name or true legal name, it may register a trade name. A trade name, or "Doing Business As" name, is optional and is not required in order to conduct business in DC. However, if a sole proprietor, general partnership or registered entity is using a trade name, it must be registered and on record with the Corporations Division. The dataset contains the following columns: trade names, effective date, trade name status, file number, trade name expiration date, and initial file number. More information can be found at https://dlcp.dc.gov/node/1619191
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+2 more
csv, excel, geojson +1
Updated Mar 10, 2024
+ more versions
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of adm div (ca 80.000)
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
Labelled FHYA Dataset
zivahub.uct.ac.za
txt
Updated Feb 2, 2022
Cite
Jarryd Dunn (2022). Labelled FHYA Dataset [Dataset]. http://doi.org/10.25375/uct.19029692.v1
Available download formats: txt
Unique identifier
https://doi.org/10.25375/uct.19029692.v1
Dataset updated
Feb 2, 2022
Dataset provided by
University of Cape Town
Authors
Jarryd Dunn
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This collection contains the datasets created as part of a master's thesis. The collection consists of two datasets in two forms as well as the corresponding entity descriptions for each of the datasets.

The experiment_doc_labels_clean documents contain the data used for the experiments. The JSON file consists of a list of JSON objects. The JSON objects contain the following fields:

id: Document ID.
ner_tags: List of IOB tags indicating mention boundaries based on the majority label assigned using crowdsourcing.
el_tags: List of entity IDs based on the majority label assigned using crowdsourcing.
all_ner_tags: List of lists of IOB tags assigned by each of the users.
all_el_tags: List of lists of entity IDs assigned by each of the users annotating the data.
tokens: List of tokens from the text.

The experiment_doc_labels_clean-U.tsv file contains the dataset used for the experiments in a format similar to the CoNLL-U format. The first line for each document contains the document ID. The documents are separated by a blank line. Each word in a document is on its own line, consisting of the word, the IOB tag, and the entity ID, separated by tabs.

While the experiments were being completed, the annotation system was left open until all the documents had been annotated by three users. This resulted in the all_docs_complete_labels_clean.json and all_docs_complete_labels_clean-U.tsv datasets, which take the same form as experiment_doc_labels_clean.json and experiment_doc_labels_clean-U.tsv.

Each of the documents described above contains entity IDs. The IDs match the entities stored in the entity_descriptions CSV files. Each row in these files corresponds to a mention of an entity and takes the form: {ID}${Mention}${Context}[N]

Three sets of entity descriptions are available:

1. entity_descriptions_experiments.csv: This file contains all the mentions from the subset of the data used for the experiments as described above. However, the data has not been cleaned, so there are multiple entity IDs which actually refer to the same entity.
2. entity_descriptions_experiments_clean.csv: These entities also cover the data used for the experiments; however, duplicate entities have been merged. These entities correspond to the labels for the documents in the experiment_doc_labels_clean files.
3. entity_descriptions_all.csv: The entities in this file correspond to the data in the all_docs_complete_labels_clean files. Please note that the entities have not been cleaned, so there may be duplicate or incorrect entities.
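
A reader for the CoNLL-U-like layout described above might look like the sketch below. It assumes the three columns are tab-separated, as the .tsv extension suggests, and that a blank line separates documents; the function name `parse_documents`, the sample IDs, tags and entity IDs are invented for illustration and have not been checked against the real files.

```python
def parse_documents(text):
    """Parse blocks of: a document-ID line, then word<TAB>IOB tag<TAB>entity ID lines."""
    documents = []
    # Documents are assumed to be separated by one blank line.
    for block in text.strip().split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        doc = {"id": lines[0].strip(), "tokens": [], "ner_tags": [], "el_tags": []}
        for line in lines[1:]:
            word, iob_tag, entity_id = line.split("\t")
            doc["tokens"].append(word)
            doc["ner_tags"].append(iob_tag)
            doc["el_tags"].append(entity_id)
        documents.append(doc)
    return documents


# Invented sample in the assumed layout.
SAMPLE = "doc-1\nJohn\tB\t17\nvisited\tO\t-\n\ndoc-2\nNatal\tB\t42\n"

docs = parse_documents(SAMPLE)
print(docs[0]["tokens"])    # ['John', 'visited']
print(docs[1]["el_tags"])   # ['42']
```

The dictionary keys mirror the field names of the JSON form (tokens, ner_tags, el_tags) so that both forms could be handled by the same downstream code.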
Spain Job Offers Scraped Data
kaggle.com
Updated Feb 11, 2023
Cite
The Devastator (2023). Spain Job Offers Scraped Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/spain-job-offers-scraped-data
Dataset updated
Feb 11, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Spain
Description
Spain Job Offers Scraped Data

Uncovering Qualifications and Requirements

By [source]

About this dataset

This dataset contains web-scraped information about job offers located in Spain, with details such as the offer name, company, location, and time of offer. It can help job seekers target potential employers in Spain, understand the qualifications and requirements needed to be considered for a role, and estimate how long an offer is likely to stay on LinkedIn. It can also help recruiters who need an overview of the job offers currently active in the Spanish market in order to filter for relevant vacancies. Professionals following the Spanish job market can likewise use it to refine their search.

How to use the dataset

This guide will help those looking to use this dataset to discover the job market in Spain. The data provided in the dataset can be a great starting point for people who want to optimize their job search and uncover potential opportunities available.

Understand What Is Being Measured: The dataset contains details such as the job offer name, company, and location, along with other factors such as the time of the offer and the type of schedule requested. It is important to understand what each column represents before using the dataset.

Number of Job Offers Available: This dataset gives insight into how many job offers are available throughout Spain by showing which areas have a high number of jobs listed and which types of jobs are needed in certain areas or businesses. This information could be used to search for specific jobs in different regions of Spain that match your skill set or desired salary range.

Required Qualifications & Skill Set: The type of schedule requested by businesses is also given, so users can see whether certain employers require multiple shifts, weekend work, or hours outside the normal 9 to 5. Additionally, understanding which skill sets are required can help you prioritize when learning new technologies or gaining qualifications, and can give you an idea of which soft skills, such as teamwork and communication, businesses may require.

Location Opportunities: This web-scraped list gives users access to potential companies located throughout Spain, in cities such as Madrid, Barcelona, and Valencia. By understanding where business demand exists across different regions, one could look at taking up new roles with higher remuneration or tailor searches more closely to particular regions of Spain.

By following this guide, you should now have a better understanding of how best to utilize this dataset, obtained from UOC, and of how to identify job opportunities through web scraping across multiple regions within the country.

Research Ideas

Analyzing the job market in Spain: Companies offering jobs can be compared using this dataset, for example by the locations where they are looking to hire, the types of schedules they offer, and the length of job postings. This information lets users target potential employers instead of applying for jobs at random.

Optimizing a Job Search: Web scraping allows users to quickly gather job postings from all sources on a daily basis and view the qualifications and requirements for each post in order to optimize their job search process.

Leveraging data insights: Insights from analyzing this web-scraped dataset can be used when creating LinkedIn or recruitment campaigns targeting Spanish markets, based on applicants' preferences such as hours per week or the area or position typically offered within particular companies in the dataset available from UOC.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

L...
Baby names for girls in England and Wales
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Dec 5, 2024
Cite
Office for National Statistics (2024). Baby names for girls in England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsgirls
Available download formats: xlsx
Dataset updated
Dec 5, 2024
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Rank and count of the top names for baby girls, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.
DistillChat v1: Mixture of Conversations
kaggle.com
Updated Dec 2, 2023
Cite
The Devastator (2023). DistillChat v1: Mixture of Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/distillchat-v1-mixture-of-conversations-dataset/data
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 2, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
DistillChat v1: Mixture of Conversations Dataset

Conversational Dataset with Diverse Sources

By fanqiwan (From Huggingface) [source]

About this dataset

The Mixture of Conversations Dataset is a collection of conversations gathered from various sources. Each conversation is represented as a list of messages, where each message is a string. This dataset provides a valuable resource for studying and analyzing conversations in different contexts.

The conversations in this dataset are diverse, covering a wide range of topics and scenarios. They include casual chats between friends, customer support interactions, online forum discussions, and more. The dataset aims to capture the natural flow of conversation and includes both structured and unstructured dialogues.

Each conversation entry in the dataset is associated with metadata information such as the name or identifier of the model that generated it and the corresponding dataset it belongs to. This information helps to keep track of the source and origin of each conversation.

The train.csv file provided in this dataset specifically serves as training data for various machine learning models. It contains an assortment of conversations that can be used to train chatbot systems, dialogue generation models, sentiment analysis algorithms, or any other conversational AI application.

Researchers, practitioners, developers, and enthusiasts can leverage this Mixture of Conversations Dataset to analyze patterns in human communication, explore language understanding capabilities, test dialogue strategies, or develop novel AI-powered conversational systems. Its versatility makes it useful for various NLP tasks such as text classification, intent recognition, sentiment analysis, and language modeling.

By exploring this collection of conversational data points across different domains and platforms, you can gain insights into how people communicate using textual input. The breadth and depth of the dataset provide opportunities for studies related to language understanding, recommendation systems, and other research areas involving human-computer interaction.

How to use the dataset

Overview of the Dataset

The dataset consists of conversational data represented as a list of messages. Each conversation is represented as a list of strings, where each string corresponds to a message in the conversation. The dataset also includes information about the model that generated the conversations and the name or identifier of the dataset itself.

Accessing the Dataset

Understanding Column Information

This dataset has several columns:

conversations: A list representing each conversation; each conversation is further represented as a list containing individual messages.

dataset: The name or identifier of the dataset that these conversations belong to.

model: The name or identifier of the model that generated these conversations.
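
As a rough illustration of the three columns above, the following sketch parses a small in-memory sample shaped like train.csv. The sample rows are invented, and it is only an assumption that the conversations column is stored as a Python-style list literal; the real file may serialize it differently (for example as JSON), in which case the parsing step would need to change.

```python
import ast
import csv
import io

# Invented two-row sample standing in for train.csv. The serialization of
# the `conversations` column as a Python list literal is an assumption.
SAMPLE_CSV = '''conversations,dataset,model
"['Hi there', 'Hello, how can I help?']",example_source,example_model
"['What is the capital of France?', 'Paris.', 'Thanks']",example_source,other_model
'''


def load_conversations(handle):
    """Read rows and turn the `conversations` text into a list of strings."""
    rows = []
    for row in csv.DictReader(handle):
        messages = ast.literal_eval(row["conversations"])
        rows.append(
            {"messages": messages, "dataset": row["dataset"], "model": row["model"]}
        )
    return rows


rows = load_conversations(io.StringIO(SAMPLE_CSV))

# Count how many messages each model contributed across its conversations.
messages_per_model = {}
for r in rows:
    messages_per_model[r["model"]] = messages_per_model.get(r["model"], 0) + len(r["messages"])

for model, count in messages_per_model.items():
    print(model, count)
```

To run this against the real file, replace the in-memory sample with an opened train.csv handle and check a few rows by hand before trusting the parsed lists.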

Utilizing Conversations

To make use

Research Ideas

Chatbot Training: This dataset can be used to train chatbot models by providing a diverse range of conversations for the model to learn from. The conversations can cover various topics and scenarios, helping the chatbot to generate more accurate and relevant responses.

Customer Support Training: The dataset can be used to train customer support models to handle different types of customer queries and provide appropriate solutions or responses. By exposing the model to a variety of conversation patterns, it can learn how to effectively address customer concerns.

Conversation Analysis: Researchers or linguists may use this dataset for analyzing conversational patterns, language usage, or studying social interactions within conversations. The dataset's mixture of conversations from different sources can provide valuable insights into how people communicate in different settings or domains

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description ...
Baby names for boys in England and Wales
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Dec 5, 2024
Cite
Office for National Statistics (2024). Baby names for boys in England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsboys
Available download formats: xlsx
Dataset updated
Dec 5, 2024
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Rank and count of the top names for baby boys, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.
Modern China Geospatial Database - Main Dataset
data.niaid.nih.gov
zenodo.org
Updated Feb 28, 2025
+ more versions
Cite
Christian Henriot (2025). Modern China Geospatial Database - Main Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5735393
Dataset updated
Feb 28, 2025
Dataset authored and provided by
Christian Henriot
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
China
Description
MCGD_Data_V2.2 contains all the data that we have collected on locations in modern China, plus a number of locations outside of China that we encounter frequently in historical sources on China. All further updates will appear under the name "MCGD_Data" with a time stamp (e.g., MCGD_Data2023-06-21)

You can also have access to this dataset and all the datasets that the ENP-China makes available on GitLab: https://gitlab.com/enpchina/IndexesEnp

Altogether there are 464,970 entries. The data include the name of locations and their variants in Chinese, pinyin, and any recorded transliteration; the name of the province in Chinese and in pinyin; Province ID; the latitude and longitude; the Name ID and Location ID; and NameID_Legacy. The Name IDs all start with H followed by seven digits. This is the internal ID system of MCGD (the NameID_Legacy column records the Name IDs in their original format depending on the source). Location IDs that start with "DH" are data points extracted from China Historical GIS (Harvard University); those that start with "D" are locations extracted from the data points in Geonames; those that have only digits (8 digits) are data points we have added from various map sources.
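
The ID rules described above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names and the example IDs are hypothetical, and the real data may contain ID shapes that the description does not mention, which this sketch labels "unknown".

```python
import re

# Name IDs: "H" followed by exactly seven digits, per the description.
NAME_ID_PATTERN = re.compile(r"H\d{7}")
# Location IDs made of exactly eight digits come from various map sources.
EIGHT_DIGITS = re.compile(r"\d{8}")


def is_valid_name_id(name_id: str) -> bool:
    """Check the 'H + seven digits' shape of an MCGD Name ID."""
    return NAME_ID_PATTERN.fullmatch(name_id) is not None


def location_source(location_id: str) -> str:
    """Guess the origin of a data point from its Location ID prefix."""
    # "DH" must be tested before "D", since every "DH" ID also starts with "D".
    if location_id.startswith("DH"):
        return "China Historical GIS"
    if location_id.startswith("D"):
        return "Geonames"
    if EIGHT_DIGITS.fullmatch(location_id):
        return "map sources"
    return "unknown"


# Hypothetical IDs, for illustration only.
for example in ["DH000123", "D1796236", "12345678", "X1"]:
    print(example, "->", location_source(example))
```

The order of the prefix checks matters here: testing "D" first would misclassify every China Historical GIS point as coming from Geonames.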

One of the main features of the MCGD Main Dataset is the systematic collection and compilation of place names from non-Chinese language historical sources. Locations were designated in transliteration systems that are hardly comprehensible today, which makes it very difficult to find the actual locations they correspond to. This dataset allows for the conversion from these obsolete transliterations to the current names and geocoordinates.

From June 2021 onward, we have adopted a different file naming system to keep track of versions. From MCGD_Data_V1 we have moved to MCGD_Data_V2. In June 2022, we introduced time stamps, which result in the following naming convention: MCGD_Data_YYYY.MM.DD.
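
A minimal sketch of how the time stamp could be read from a file name follows. The stated convention is MCGD_Data_YYYY.MM.DD, but the names quoted on this page also use other separators (for example MCGD_Data2025_02_28 and MCGD_Data2023-06-21), so the pattern below deliberately tolerates ".", "_" and "-". The function name is hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Optional underscore after the prefix, then year, month, day separated by
# ".", "_" or "-". The tolerance for several separators is an assumption
# based on the file names quoted in the description.
STAMP = re.compile(r"MCGD_Data_?(\d{4})[._-](\d{2})[._-](\d{2})")


def release_date(file_name: str) -> Optional[date]:
    """Return the release date encoded in a file name, or None if absent."""
    match = STAMP.fullmatch(file_name)
    if match is None:
        return None  # e.g. version-numbered names such as MCGD_Data_V2.2
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


names = ["MCGD_Data2025_02_28", "MCGD_Data2023.12.22", "MCGD_Data_V2.2"]
dated = sorted(
    (release_date(n), n) for n in names if release_date(n) is not None
)
latest = dated[-1][1]
print(latest)
```

Names that carry a version number instead of a time stamp return None and are simply left out of the ordering.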

UPDATES

MCGD_Data2025_02_28 includes a major change: all the locations listed under Beijing, Shanghai, Tianjin, and Chongqing (北京, 上海, 天津, 重慶) are duplicated and also listed under the name of the provinces to which they originally belonged before the creation of the four special municipalities after 1949. This is meant to facilitate the matching of data from historical sources. Each location has a unique NameID. Altogether there are 472,818 entries.

MCGD_Data2025_02_27 includes an update on locations extracted from Minguo zhengfu ge yuanhui keyuan yishang zhiyuanlu 國民政府各院部會科員以上職員錄 (Directory of staff members and above in the ministries and committees of the National Government). Nanjing: Guomin zhengfu wenguanchu yinzhuju 國民政府文官處印鑄局, 1944. We also made corrections in the Prov_Py and Prov_Zh columns, as there were some misalignments between the pinyin name and the name in Chinese characters. The file now includes 465,128 entries.

MCGD_Data2024_03_23 includes an update on locations in Taiwan from the Asia Directories. Altogether there are 465,603 entries (of which 187 are place names without geocoordinates, labelled in the Lat and Long columns as "Unknown").

MCGD_Data2023.12.22 contains all the data that we have collected on locations in China, whatever the period. Altogether there are 465,603 entries (of which 187 are place names without geocoordinates, labelled in the Lat and Long columns as "Unknown"). The dataset also includes locations outside of China for the purpose of matching such locations to the place names extracted from historical sources. For example, one may need to locate individuals born outside of China. Rather than maintaining two separate files, we made the decision to incorporate all the place names found in historical sources in the gazetteer. Such place names can easily be removed by selecting all the entries where the 'Province' data is missing.
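
The filtering described above (dropping entries with no province, and spotting entries whose coordinates are "Unknown") might look like the sketch below. The sample rows are invented placeholders, and the exact header names are assumptions: Prov_Py, Prov_Zh, Lat and Long are mentioned in the description, but the real file may spell or order its columns differently.

```python
import csv
import io

# Invented placeholder rows; header names are assumed from the description.
SAMPLE = """NameID,Name,Prov_Py,Prov_Zh,Lat,Long
H0000001,Place A,Jiangsu,江蘇,31.2,121.5
H0000002,Place B,,,35.4,139.6
H0000003,Place C,Zhili,直隸,Unknown,Unknown
"""


def split_gazetteer(handle):
    """Separate entries with a province from those without one.

    Returns (inside, outside, no_coords), where no_coords is the subset of
    `inside` whose latitude or longitude is labelled "Unknown".
    """
    inside, outside, no_coords = [], [], []
    for row in csv.DictReader(handle):
        if not row["Prov_Zh"].strip():
            outside.append(row)  # missing province: treated as outside China
            continue
        inside.append(row)
        if "Unknown" in (row["Lat"], row["Long"]):
            no_coords.append(row)
    return inside, outside, no_coords


inside, outside, no_coords = split_gazetteer(io.StringIO(SAMPLE))
print(len(inside), len(outside), len(no_coords))
```

Keeping the "Unknown" rows in a separate list, rather than discarding them, preserves the place names for matching even when they cannot be mapped.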

