100+ datasets found

Data from: Mental Health Services Children & Young People
kaggle.com
Updated Jan 21, 2023
Cite
The Devastator (2023). Mental Health Services Children & Young People [Dataset]. https://www.kaggle.com/datasets/thedevastator/mental-health-services-children-young-people/discussion?sort=undefined
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 21, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
Mental Health Services Children & Young People

Monthly Statistics on Referrals, Contacts and Care

By data.world's Admin [source]

About this dataset

This dataset provides information on the mental health services provided to children and young people in England. The data in the Mental Health Services Data Set (MHSDS) - Children & Young People covers several categories for a given reporting period, including primary-level details, secondary-level descriptions, the number of open referrals for children's and young people's mental health services at the end of the reporting period, and the number of first attended contacts for referrals open in the reporting period for people aged 0-18. It also shows how many people aged 0 to 18 were in contact with mental health services at the time of reporting, how many referrals starting during this time were self-referrals, and more. The information can be used to track and understand trends in order to provide more effective care.

How to use the dataset

This guide will provide you with an overview of the data contained in this dataset as well as information on how to effectively use it for your own research or personal purposes. Let's get started!

Overview of Data Fields

REPORTING_PERIOD: The month and year of the reporting period (Date)

BREAKDOWN: The type of breakdown of the data (String)

PRIMARY_LEVEL: The primary level of the data (String)

PRIMARY_LEVEL_DESCRIPTION: A description at the primary level of the data (String)

SECONDARY_LEVEL: The secondary level of the data (String)

Research Ideas

Evaluating the efficacy of existing mental health services for children and young people by examining changes in relationships between different aspects of service delivery (e.g. referral activity, hospital spell activity, etc).

Analysing geographical trends in mental health services to inform investment decisions and policies across different regions.

Identifying areas of high need among vulnerable or marginalised citizens, such as those aged 0-18 or those with particular genetic makeup, to better target resources and support those most in need of help

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors

You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.

You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: mhsds-monthly-cyp-data-file-feb-fin-2017-1.csv

| Column name | Description |
|:------------|:------------|
| REPORTING_PERIOD | The period of time for which the data was collected. (String) |
| BREAKDOWN | The breakdown of the data by age group. (String) |
| PRIMARY_LEVEL | The primary level of the data. (String) |
| PRIMARY_LEVEL_DESCRIPTION ...
ORBIT: A real-world few-shot dataset for teachable object recognition...
city.figshare.com
bin
Updated May 31, 2023
Cite
Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann (2023). ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision [Dataset]. http://doi.org/10.25383/city.14294597.v3
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.25383/city.14294597.v3
Dataset updated
May 31, 2023
Dataset provided by
City, University of London
Authors
Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Object recognition predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset

This version comprises several zip files:
- train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS
- other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)
- *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels.
- *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format.
Traffic Crashes - People
catalog.data.gov
data.cityofchicago.org
Updated Aug 30, 2025
+ more versions
Cite
data.cityofchicago.org (2025). Traffic Crashes - People [Dataset]. https://catalog.data.gov/dataset/traffic-crashes-people
Explore at:
Dataset updated
Aug 30, 2025
Dataset provided by
data.cityofchicago.org
Description
This data contains information about people involved in a crash and whether any injuries were sustained. This dataset should be used in combination with the traffic Crash and Vehicle datasets. Each record corresponds to an occupant in a vehicle listed in the Crash dataset. Some people involved in a crash may not have been occupants of a motor vehicle, but may have been pedestrians, bicyclists, or users of another non-motor-vehicle mode of transportation. Injuries are reported by the responding police officer. Fatalities that occur after the initial reports are typically updated in these records up to 30 days after the date of the crash. Person data can be linked with the Crash and Vehicle datasets using the “CRASH_RECORD_ID” field. A vehicle can have multiple occupants, so there is a one-to-many relationship between the Vehicle and Person datasets. A pedestrian, however, is a “unit” by itself and has a one-to-one relationship between the Vehicle and Person tables. The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in the SR1050 form. The current version of the SR1050 instructions manual with detailed information on each data element is available here. Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.
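The linkage on “CRASH_RECORD_ID” described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library and made-up records: only the CRASH_RECORD_ID field name comes from the description, while the `note` and `role` fields and the `link_people_to_crashes` helper are hypothetical.

```python
from collections import defaultdict

# Made-up records; only CRASH_RECORD_ID is a field name taken from the description.
crashes = [
    {"CRASH_RECORD_ID": "c1", "note": "hypothetical crash A"},
    {"CRASH_RECORD_ID": "c2", "note": "hypothetical crash B"},
]
people = [
    {"CRASH_RECORD_ID": "c1", "role": "driver"},
    {"CRASH_RECORD_ID": "c1", "role": "passenger"},
    {"CRASH_RECORD_ID": "c2", "role": "pedestrian"},
]


def link_people_to_crashes(crashes, people):
    """Return {crash id: (crash record, [person records])} (one-to-many)."""
    people_by_crash = defaultdict(list)
    for person in people:
        people_by_crash[person["CRASH_RECORD_ID"]].append(person)
    return {
        crash["CRASH_RECORD_ID"]: (
            crash,
            people_by_crash.get(crash["CRASH_RECORD_ID"], []),
        )
        for crash in crashes
    }


linked = link_people_to_crashes(crashes, people)
for crash_id, (crash, persons) in linked.items():
    print(crash_id, [p["role"] for p in persons])
```

Grouping the person records first keeps the join to a single pass over each table, which matters if the real files are large.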
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+2more
csv, excel, geojson +1
Updated Mar 10, 2024
+ more versions
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
csv, json, geojson, excelAvailable download formats
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of adm div (ca 80.000)

Sources and Contributions
Sources: GeoNames is aggregating over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
How Common is Your Birthday?
kaggle.com
Updated Nov 23, 2022
Cite
The Devastator (2022). How Common is Your Birthday? [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-births-how-common-is-your-birthday
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 23, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
The Devastator
Description
US Births - How Common is Your Birthday?

How popular is your birthday?

By Andy Kriebel [source]

About this dataset

The file contains data on births in the United States from 1994 to 2014. The data includes the following columns:

year: The year of the observation. (Integer)
month: The month of the observation. (Integer)
date_of_month: The date of the observation. (Integer)
day_of_week: The day of the week of the observation. (Integer)
births: The number of births on the given day. (Integer)

How to use the dataset

The US Births dataset on Kaggle contains data on births in the United States from 1994 to 2014. The data is broken down by year, month, date of month, day of week, and births.

This dataset can be used to answer questions about when people are born, how common certain birthdays are, and any trends over time. For example, you could use this dataset to find out which day of the week has the most births or which month has the most births
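As a rough sketch of the day-of-week question above, the snippet below averages births per `day_of_week` value using only the standard library. The column names follow the description; the sample rows are invented for illustration and are not real figures, and the helper name `mean_births_by` is hypothetical. In practice you would read US_births_1994-2014.csv from disk instead of the inline string.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the documented column layout; not real figures.
SAMPLE_CSV = """year,month,date_of_month,day_of_week,births
1994,1,1,6,100
1994,1,2,7,90
1994,1,3,1,120
1994,1,8,6,110
"""


def mean_births_by(rows, key):
    """Average the 'births' column for each distinct value of `key`."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        group = int(row[key])
        totals[group] += int(row["births"])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_weekday = mean_births_by(rows, "day_of_week")
busiest = max(by_weekday, key=by_weekday.get)
print(by_weekday, busiest)
```

Passing `"month"` as the key gives the per-month average in the same way. The description does not say which integer maps to which weekday, so check that encoding before labelling the results.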

Research Ideas

Determining on which day of the year and at what time of day people are most often born, to help with staffing levels in maternity wards

Identifying trends in baby names over time

Predicting the number of births on a given day

Acknowledgements

This data set is a combined effort of the U.S. National Center for Health Statistics and the U.S. Social Security Administration, provided by FiveThirtyEight. It contains data on births in the United States from 1994 to 2014, with the following columns: year, month, date_of_month, day_of_week, births

Thank you to FiveThirtyEight for providing this dataset!

Data Source

License

License: Dataset copyright by authors

You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.

You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: US_births_1994-2014.csv

| Column name | Description |
|:------------------|:---------------------------------------------|
| year | Year of the data. (Integer) |
| month | Month of the data. (Integer) |
| date_of_month | Day of the month of the data. (Integer) |
| day_of_week | Day of the week of the data. (Integer) |
| births | Number of births on the given day. (Integer) |

Acknowledgements

If you use this dataset in your research, please credit Andy Kriebel.
2.02 Customer Service (detail)
catalog.data.gov
open.tempe.gov
+4more
Updated Aug 11, 2025
+ more versions
Cite
City of Tempe (2025). 2.02 Customer Service (detail) [Dataset]. https://catalog.data.gov/dataset/2-02-customer-service-detail-be51b
Explore at:
Dataset updated
Aug 11, 2025
Dataset provided by
City of Tempe
Description
This dataset provides Customer Service Satisfaction results from the Annual Community Survey. The survey questions assess satisfaction with overall customer service for individuals who had contacted the city in the past year. For years where there are multiple questions related to overall customer service and treatment, the average of those responses is provided in the summary dataset, and the values for each question are provided in the detailed dataset. For years 2010-2014, respondents were first asked, "Have you contacted the city in the past year?". If they answered that they had contacted the city, then they were asked additional questions about their experience. The "number of respondents" field represents the number of people who answered yes to the contact question. Responses of "don't know" are not included in this dataset, but can be found in the dataset for the entire Community Survey. A survey was not completed for 2015 (99999 indicates no recorded data). Due to changes in the survey questions, this dataset was last updated in 2017 and may not be updated again. The performance measure dashboard is available at 2.02 Customer Service Satisfaction.

Additional Information
Source: Community Attitude Survey
Contact: Wydale Holmes
Contact E-Mail: Wydale_Holmes@tempe.gov
Data Source Type: Excel and PDF
Preparation Method: Extracted from Annual Community Survey results
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
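Because 99999 marks "no recorded data", it should be treated as missing before any averaging. A minimal sketch, assuming the values have already been loaded into a plain list of numbers; the sample values and the helper names are hypothetical:

```python
NO_DATA = 99999  # sentinel described above for years with no recorded data


def clean_values(values):
    """Replace the 99999 sentinel with None so it is never averaged in."""
    return [None if value == NO_DATA else value for value in values]


def mean_ignoring_missing(values):
    """Mean of the non-missing values, or None if nothing is present."""
    present = [value for value in values if value is not None]
    return sum(present) / len(present) if present else None


raw = [80, 99999, 90]  # invented example values, not survey results
cleaned = clean_values(raw)
print(cleaned, mean_ignoring_missing(cleaned))
```

Leaving the sentinel in place would inflate any average dramatically, so the replacement step belongs at load time.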

Empathy dataset

zenodo.org
data.niaid.nih.gov

bin, csv, html

Updated Dec 18, 2024

Cite

Zenodo (2024). Empathy dataset [Dataset]. http://doi.org/10.5281/zenodo.7683907

Explore at:

bin, html, csvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.7683907

Dataset updated

Dec 18, 2024

Dataset provided by

Zenodohttp://zenodo.org/

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

The database for this study (Briganti et al. 2018; the same for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years); 57% were females and 43% were males. Even though the full dataset was composed of 1973 participants, only 1270 answered the full questionnaire: missing data are handled using pairwise complete observations in estimating a Gaussian Graphical Model, meaning that all available information from every subject is used.

The feature set is composed of 28 items meant to assess the four following components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed; reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is calculated afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.
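Reverse-scoring on a 0 to 4 scale is commonly done by subtracting the raw score from the scale maximum, so 0 becomes 4 and 4 becomes 0. The sketch below assumes that convention and assumes responses are held in a dict keyed by item number; the list of reversed items is taken from the description, while the function name and data layout are hypothetical.

```python
# Reversed items as listed in the description above.
REVERSED_ITEMS = {3, 4, 7, 12, 13, 14, 15, 18, 19}
MAX_SCORE = 4  # items are scored from 0 to 4


def reverse_score(responses):
    """Return a copy of `responses` with reversed items flipped.

    `responses` maps item number (1..28) to a raw score 0..4,
    or None where the participant gave no answer.
    """
    scored = {}
    for item, raw in responses.items():
        if raw is None:
            scored[item] = None  # keep missing answers missing
        elif item in REVERSED_ITEMS:
            scored[item] = MAX_SCORE - raw
        else:
            scored[item] = raw
    return scored


example = {1: 2, 3: 0, 4: None, 7: 3}
print(reverse_score(example))
```

Missing answers are passed through unchanged, which fits the pairwise-complete handling of missing data mentioned above.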

Size: A dataset of size 1973*28

Number of features: 28

Ground truth: No

Type of Graph: Mixed graph

The following gives the description of the variables:

Feature	FeatureLabel	Domain	Item meaning from Davis 1980
001	1FS	Green	I daydream and fantasize, with some regularity, about things that might happen to me.
002	2EC	Purple	I often have tender, concerned feelings for people less fortunate than me.
003	3PT_R	Yellow	I sometimes find it difficult to see things from the “other guy’s” point of view.
004	4EC_R	Purple	Sometimes I don’t feel very sorry for other people when they are having problems.
005	5FS	Green	I really get involved with the feelings of the characters in a novel.
006	6PD	Red	In emergency situations, I feel apprehensive and ill-at-ease.
007	7FS_R	Green	I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it. (Reversed)
008	8PT	Yellow	I try to look at everybody’s side of a disagreement before I make a decision.
009	9EC	Purple	When I see someone being taken advantage of, I feel kind of protective towards them.
010	10PD	Red	I sometimes feel helpless when I am in the middle of a very emotional situation.
011	11PT	Yellow	I sometimes try to understand my friends better by imagining how things look from their perspective.
012	12FS_R	Green	Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed)
013	13PD_R	Red	When I see someone get hurt, I tend to remain calm. (Reversed)
014	14EC_R	Purple	Other people’s misfortunes do not usually disturb me a great deal. (Reversed)
015	15PT_R	Yellow	If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed)
016	16FS	Green	After seeing a play or movie, I have felt as though I were one of the characters.
017	17PD	Red	Being in a tense emotional situation scares me.
018	18EC_R	Purple	When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed)
019	19PD_R	Red	I am usually pretty effective in dealing with emergencies. (Reversed)
020	20FS	Green	I am often quite touched by things that I see happen.
021	21PT	Yellow	I believe that there are two sides to every question and try to look at them both.
022	22EC	Purple	I would describe myself as a pretty soft-hearted person.
023	23FS	Green	When I watch a good movie, I can very easily put myself in the place of a leading character.
024	24PD	Red	I tend to lose control during emergencies.
025	25PT	Yellow	When I’m upset at someone, I usually try to “put myself in his shoes” for a while.
026	26FS	Green	When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me.
027	27PD	Red	When I see someone who badly needs help in an emergency, I go to pieces.
028	28PT	Yellow	Before criticizing somebody, I try to imagine how I would feel if I were in their place

More information about the dataset is contained in empathy_description.html file.

Ghana Number Dataset
listtodata.com
jw.listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Ghana Number Dataset [Dataset]. https://listtodata.com/ghana-dataset
Explore at:
.csv, .xls, .txtAvailable download formats
Dataset updated
Jul 17, 2025
Authors
List to Data
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Ghana
Variables measured
phone numbers, Email Address, full name, Address, City, State, gender, age, income, ip address
Description
The Ghana number dataset has accurate numbers verified by our team. These client contact data belong to active users only. In fact, these things make it a valuable marketing resource. Whether your business is new or old, you can boost your reach and connect to a large audience with this database. Again, you will find many people who have an interest in your products and will accept from you. Moreover, the Ghana number dataset will help you make your brand more renowned. In other words, by becoming a known brand in the market, you can increase your brand value greatly. Similarly, many people will show interest in your products and services. The contacts on this mobile number list are active and real, so you will benefit greatly if you purchase this cheap but valuable database. Ghana phone data can be a great solution for SMS and telemarketing. Anyone can use the contact leads here to reach different people in this area. Ghana phone data allows you to give product details with your messages to make them more appealing and reliable. Your product quality and content will catch the attention of the interested audience. This will create more traffic and you can reach sales from there. Likewise, the Ghana phone data is an opt-in and permission-based contact list. In addition, with an affordable yet fresh list like ours, your marketing will be more effective. People can now relate to your business more after you successfully use this tool. Thus, order the contact library now from List To Data to promote your goods and services everywhere inside the country. The Ghana phone number list is a massive database. Our team promises you sincere service and active support. In general, you can contact us anytime on our website if you face any problems with our list. Our support team will solve the problem for you, so you don’t have to worry about not obtaining the worth of your money.
Further, the Ghana phone number list will aid your business in many new ways. The benefits of SMS marketing are enormous, as we all know very well. Moreover, no one wants to miss out on such a huge and versatile audience in Ghana. Hence, purchasing this contact number package will be a gem for any business any day.
PERU MIGRANT Study | Baseline and 5yr follow-up dataset
figshare.com
datasetcatalog.nlm.nih.gov
bin
Updated May 30, 2023
Cite
J. Jaime Miranda; Antonio Bernabe-Ortiz; Rodrigo Carrillo Larco (2023). PERU MIGRANT Study | Baseline and 5yr follow-up dataset [Dataset]. http://doi.org/10.6084/m9.figshare.4832612.v4
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.4832612.v4
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
J. Jaime Miranda; Antonio Bernabe-Ortiz; Rodrigo Carrillo Larco
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Peru
Description
This is an update of a prior dataset publication containing baseline and 5-year follow-up data from the PERU MIGRANT Study (PEru's Rural to Urban MIGRANTs Study).

The PERU MIGRANT Study was designed to investigate the magnitude of differences between rural-to-urban migrant and non-migrant groups in specific cardiovascular risk factors. Three groups were selected: i) Rural, people who have always lived in a rural environment; ii) Rural-urban, people who migrated from rural to urban areas; and iii) Urban, people who have always lived in an urban environment.

PERU MIGRANT Study protocol, instruments and variables are described in full in:
Miranda JJ, Gilman RH, García HH, Smeeth L. The effect on cardiovascular risk factors of migration from rural to urban areas in Peru: PERU MIGRANT Study. BMC Cardiovasc Disord 2009;9:23.

PERU MIGRANT Study baseline dataset is available at:
https://figshare.com/articles/PERU_MIGRANT_Study_Baseline_dataset/3125005

Main findings of the baseline study:
Miranda JJ, Gilman RH, Smeeth L. Differences in cardiovascular risk factors in rural, urban and rural-to-urban migrants in Peru. Heart 2011;97(10):787-96.

Main findings of the 5-yr follow-up study:
Carrillo-Larco RM, Bernabé-Ortiz A, Pillay TD, Gilman RH, Sanchez JF, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Obesity risk in rural, urban and rural-to-urban migrants: prospective results of the PERU MIGRANT study. Int J Obes (Lond) 2016;40(1):181-5.
Bernabe-Ortiz A, Sanchez JF, Carrillo-Larco RM, Gilman RH, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Rural-to-urban migration and risk of hypertension: longitudinal results of the PERU MIGRANT study. J Hum Hypertens 2017;31(1):22-28.
Lazo-Porras M, Bernabe-Ortiz A, Málaga G, Gilman RH, Acuña-Villaorduña A, Cardenas-Montero D, Smeeth L, Miranda JJ. Low HDL cholesterol as a cardiovascular risk factor in rural, urban, and rural-urban migrants: PERU MIGRANT cohort study. Atherosclerosis 2016;246:36-43.
Burroughs Pena MS, Bernabé-Ortiz A, Carrillo-Larco RM, Sánchez JF, Quispe R, Pillay TD, Málaga G, Gilman RH, Smeeth L, Miranda JJ. Migration, urbanisation and mortality: 5-year longitudinal analysis of the PERU MIGRANT study. J Epidemiol Community Health 2015;69(7):715-8.
The dataset contains PII
catalog.data.gov
gimi9.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). The dataset contains PII [Dataset]. https://catalog.data.gov/dataset/the-dataset-contains-pii
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
These data are interview transcripts with individuals who are users of the Smoke Sense app. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: This data is available on request to approved individuals. Format: This data contains PII. These are interview transcripts. This dataset is associated with the following publication: Hano, M., L. Wei, B. Hubbell, and A. Rappold. Scaling Up: Citizen Science Engagement and Impacts Beyond the Individual. Citizen Science: Theory and Practice. Ubiquity Press, London, UK, 5(1): 1-13, (2020).
Dataset #2: Experimental study
figshare.com
docx
Updated Jul 19, 2023
Cite
Adam Baimel (2023). Dataset #2: Experimental study [Dataset]. http://doi.org/10.6084/m9.figshare.23708766.v1
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.23708766.v1
Dataset updated
Jul 19, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Adam Baimel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Project Title: Add title here

Project Team: Add contact information for research project team members

Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
Statewide Death Profiles
data.chhs.ca.gov
data.ca.gov
+3more
csv, zip
Updated Aug 22, 2025
Cite
California Department of Public Health (2025). Statewide Death Profiles [Dataset]. https://data.chhs.ca.gov/dataset/statewide-death-profiles
Explore at:
csv(4689434), csv(16301), csv(5034), csv(463460), csv(2026589), csv(5401561), csv(164006), csv(200270), csv(419332), csv(406971), zipAvailable download formats
Dataset updated
Aug 22, 2025
Dataset authored and provided by
California Department of Public Healthhttps://www.cdph.ca.gov/
Description
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
COVID-19 Post-Vaccination Infection Data (ARCHIVED)
data.chhs.ca.gov
data.ca.gov
+4 more
csv, xlsx, zip
Updated Aug 30, 2024
Cite
California Department of Public Health (2024). COVID-19 Post-Vaccination Infection Data (ARCHIVED) [Dataset]. https://data.chhs.ca.gov/dataset/covid-19-post-vaccination-infection-data
Explore at:
Available download formats: csv(38212), xlsx(11056), csv(90508), csv(78921), zip
Dataset updated
Aug 30, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: This dataset is no longer being updated due to the end of the COVID-19 Public Health Emergency.

The California Department of Public Health (CDPH) is identifying vaccination status of COVID-19 cases, hospitalizations, and deaths by analyzing the state immunization registry and registry of confirmed COVID-19 cases. Post-vaccination cases are individuals who have a positive SARS-CoV-2 molecular test (e.g. PCR) at least 14 days after they have completed their primary vaccination series.

Tracking cases of COVID-19 that occur after vaccination is important for monitoring the impact of immunization campaigns. While COVID-19 vaccines are safe and effective, some cases are still expected in persons who have been vaccinated, as no vaccine is 100% effective. For more information, please see https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/Post-Vaccine-COVID19-Cases.aspx

Post-vaccination infection data is updated monthly and includes data on cases, hospitalizations, and deaths among the unvaccinated and the vaccinated. Partially vaccinated individuals are excluded. To account for reporting and processing delays, there is at least a one-month lag in provided data (for example data published on 9/9/22 will include data through 7/31/22).

Notes:

On September 9, 2022, the post-vaccination data was changed to compare unvaccinated persons with those who have completed at least a primary series, for persons age 5+. These data will be updated monthly (first Thursday of the month) and include at least a one-month lag.

On February 2, 2022, the post-vaccination data was changed to distinguish between vaccination with a primary series only versus vaccinated and boosted. The previous dataset has been uploaded as an archived table. Additionally, the lag on this data has been extended to 14 days.

On November 29, 2021, the denominator for calculating vaccine coverage was changed from age 16+ to age 12+ to reflect new vaccine eligibility criteria. The previous dataset based on age 16+ denominators has been uploaded as an archived table.
Africa - Population and Internet users statistics
kaggle.com
Updated Dec 17, 2020
Cite
Ishmeet singh (2020). Africa - Population and Internet users statistics [Dataset]. https://www.kaggle.com/datasets/ishmeet/africa-population-and-internet-users-statistics
Dataset updated
Dec 17, 2020
Dataset provided by
Kaggle
Authors
Ishmeet singh
License
http://opendatacommons.org/licenses/dbcl/1.0/
Area covered
Africa
Description
Context

Africa - Population and Internet users statistics

Acknowledgements

Source: https://data.humdata.org/dataset/africa-population-and-internet-users-statistics
Last updated at https://data.humdata.org/organization/openafrica: 2019-09-11
PLACE OF BIRTH - DP02_DES_T - Dataset - CKAN
portal.tad3.org
Updated Nov 18, 2024
+ more versions
Cite
(2024). PLACE OF BIRTH - DP02_DES_T - Dataset - CKAN [Dataset]. https://portal.tad3.org/dataset/place-of-birth-dp02_des_t
Dataset updated
Nov 18, 2024
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES
PLACE OF BIRTH - DP02
Universe - Total population
Survey-Program - American Community Survey 5-year estimates
Years - 2020, 2021, 2022

People not reporting a place of birth were assigned the state or country of birth of another family member, or were allocated the response of another individual with similar characteristics. People born outside the United States were asked to report their place of birth according to current international boundaries. Since numerous changes in boundaries of foreign countries have occurred in the last century, some people may have reported their place of birth in terms of boundaries that existed at the time of their birth or emigration, or in accordance with their own national preference.
SceneFake
zenodo.org
zip
Updated Feb 23, 2023
Cite
Jiangyan Yi; Chenglong Wang (2023). SceneFake [Dataset]. http://doi.org/10.5281/zenodo.7663324
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7663324
Dataset updated
Feb 23, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jiangyan Yi; Chenglong Wang
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio. These datasets leave out a scenario in which the acoustic scene of the original audio is replaced with a forged one. It will pose a major threat to our society if some people misuse the manipulated audio for malicious purposes. This motivates us to fill the gap. This paper proposes such a dataset for scene fake audio detection named SceneFake, where a manipulated audio is generated by only tampering with the acoustic scene of a real utterance by using speech enhancement technologies. The results show that scene fake utterances cannot be detected reliably by the baseline models trained using the ASVspoof 2019 dataset. When the models are trained using the training set of SceneFake, they perform well when evaluated with the seen testing set, but still perform poorly when dealing with the unseen test set.

The SceneFake dataset is publicly available. The source code of baselines is available on GitHub https://github.com/ADDchallenge/SceneFake

This data set is licensed with a CC BY-NC-ND 4.0 license.
COVID-19 Vaccine Progress Dashboard Data
data.chhs.ca.gov
data.ca.gov
+5 more
csv, xlsx, zip
Updated Sep 1, 2025
Cite
California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data [Dataset]. https://data.chhs.ca.gov/dataset/vaccine-progress-dashboard
Explore at:
Available download formats: csv(82754), csv(675610), csv(2447143), csv(83128924), csv(12877811), csv(26828), csv(724860), csv(303068812), csv(503270), xlsx(11870), csv(110928434), xlsx(11731), csv(6772350), xlsx(11249), csv(148732), zip, csv(7777694), csv(54906), xlsx(7708), csv(2641927), csv(188895), csv(638738), csv(111682), csv(18403068), xlsx(11534)
Dataset updated
Sep 1, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

On 6/16/2023 CDPH replaced the booster measures with a new “Up to Date” measure based on CDC’s new recommendations, replacing the primary series, boosted, and bivalent booster metrics. The definition of “primary series complete” has not changed and is based on previous recommendations that CDC has since simplified. A person cannot complete their primary series with a single dose of an updated vaccine. Whereas the booster measures were calculated using the eligible population as the denominator, the new up to date measure uses the total estimated population. Please note that the rates for some groups may change since the up to date measure is calculated differently than the previous booster and bivalent measures.

This data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/ which summarizes vaccination data at the county level by county of residence. Where county of residence was not reported in a vaccination record, the county of provider that vaccinated the resident is included. This applies to less than 1% of vaccination records. The sum of county-level vaccinations does not equal statewide total vaccinations due to out-of-state residents vaccinated in California.

These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

Totals for the Vaccine Progress Dashboard and this dataset may not match, as the Dashboard totals doses by Report Date and this dataset totals doses by Administration Date. Dose numbers may also change for a particular Administration Date as data is updated.

Previous updates:

On March 3, 2023, following the release of HPI 3.0 in 2022, the previous equity scores were updated to reflect more recent community survey information. This change represents an improvement to the way CDPH monitors health equity by using the latest and most accurate community data available. The HPI uses a collection of data sources and indicators to calculate a measure of community conditions ranging from the most to the least healthy based on economic, housing, and environmental measures.

Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 16+ and age 5+ denominators have been uploaded as archived tables.

Starting on May 29, 2021, the methodology for calculating on-hand inventory in the shipped/delivered/on-hand dataset changed. Please see the accompanying data dictionary for details. In addition, this dataset is now available down to the ZIP code level.
Spanish Open Ended Question Answer Text Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Spanish Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/spanish-open-ended-question-answer-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
The Spanish Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Spanish language, advancing the field of artificial intelligence.
Dataset Content:
This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Spanish. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Spanish speakers, and references were taken from diverse sources like books, news articles, websites, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity:
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.
Answer Formats:
To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph answers. Answers contain text strings, numerical values, and date and time formats as well. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details:
This fully labeled Spanish Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
Quality and Accuracy:
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
Both the questions and answers are written in grammatically accurate Spanish, free of spelling and grammatical errors. No copyrighted, toxic, or harmful content is used while building this dataset.
Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Spanish Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
World cities database
kaggle.com
Updated May 25, 2025
Cite
Juanma Hernández (2025). World cities database [Dataset]. http://doi.org/10.34740/kaggle/dsv/11944536
Unique identifier
https://doi.org/10.34740/kaggle/dsv/11944536
Dataset updated
May 25, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Juanma Hernández
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data is from:

https://simplemaps.com/data/world-cities

We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.

Our database is:

Up-to-date: It was last refreshed on May 11, 2025.

Comprehensive: Over 4 million unique cities and towns from every country in the world (about 48 thousand in basic database).

Accurate: Cleaned and aggregated from official sources. Includes latitude and longitude coordinates.

Simple: A single CSV file, concise field names, only one entry per city.
Asthma Prevalence
data.ca.gov
data.chhs.ca.gov
+3more
csv, pdf, zip
Updated Aug 28, 2024
+ more versions
Cite
California Department of Public Health (2024). Asthma Prevalence [Dataset]. https://data.ca.gov/dataset/asthma-prevalence
Explore at:
Available download formats: pdf, csv, zip
Dataset updated
Aug 28, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the estimated percentage of Californians with asthma (asthma prevalence). Two types of asthma prevalence are included: 1) lifetime asthma prevalence describes the percentage of people who have ever been diagnosed with asthma by a health care provider, 2) current asthma prevalence describes the percentage of people who have ever been diagnosed with asthma by a health care provider AND report they still have asthma and/or had an asthma episode or attack within the past 12 months. The tables “Lifetime Asthma Prevalence by County” and “Current Asthma Prevalence by County” are derived from the California Health Interview Survey (CHIS) and include data stratified by county and age group (all ages, 0-17, 18+, 0-4, 5-17, 18-64, 65+) reported for 2-year periods. The table “Asthma Prevalence, Adults (18 and older)” is derived from the California Behavioral Risk Factor Surveillance System (BRFSS) and includes statewide data on adults reported by year.

Cite

The Devastator (2023). Mental Health Services Children & Young People [Dataset]. https://www.kaggle.com/datasets/thedevastator/mental-health-services-children-young-people/discussion?sort=undefined

Data from: Mental Health Services Children & Young People

Monthly Statistics on Referrals, Contacts and Care

Dataset updated

Jan 21, 2023

Dataset provided by

Kaggle

Authors

The Devastator

Description

Mental Health Services Children & Young People

Monthly Statistics on Referrals, Contacts and Care

By data.world's Admin [source]

About this dataset

This dataset provides essential information on the mental health services provided to children and young people in England. The data contained within the Mental Health Services Data Set (MHSDS) - Children & Young People covers a variety of different categories during a given reporting period, including primary level details, secondary level descriptions, number of open referrals for children's and young people's mental health services at the end of the reporting period, as well as number of first attended contacts for referrals open in the reporting period aged 0-18. It also provides insight into how many people aged 0 to 18 are in contact with mental health services at the time of reporting, how many referrals starting during this time were self-referrals, and more. This dataset includes valuable information that is necessary to better track and understand trends in order to provide more effective care.

How to use the dataset

This guide will provide you with an overview of the data contained in this dataset as well as information on how to effectively use it for your own research or personal purposes. Let's get started!

Overview of Data Fields

REPORTING_PERIOD: The month and year of the reporting period (Date)

BREAKDOWN: The type of breakdown of the data (String)

PRIMARY_LEVEL: The primary level of the data (String)

PRIMARY_LEVEL_DESCRIPTION: A description at the primary level of the data (String)

SECONDARY_LEVEL: The secondary level of the data (String)
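As a minimal sketch of working with these fields in Python, the snippet below reads a comma-separated file and groups its rows by BREAKDOWN. It assumes the file has a header row that uses the field names listed above. The sample rows are invented placeholders for illustration, not values from the dataset, and `group_by_breakdown` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Field names taken from the overview above.
FIELDS = [
    "REPORTING_PERIOD",
    "BREAKDOWN",
    "PRIMARY_LEVEL",
    "PRIMARY_LEVEL_DESCRIPTION",
    "SECONDARY_LEVEL",
]

# Placeholder rows for illustration only; these are NOT values from the dataset.
SAMPLE_CSV = """REPORTING_PERIOD,BREAKDOWN,PRIMARY_LEVEL,PRIMARY_LEVEL_DESCRIPTION,SECONDARY_LEVEL
period-1,breakdown-a,P1,Example area one,S1
period-1,breakdown-b,P2,Example area two,S2
period-1,breakdown-a,P3,Example area three,S1
"""


def group_by_breakdown(handle):
    """Read rows from a CSV file object and group them by BREAKDOWN."""
    reader = csv.DictReader(handle)
    # Fail early if the header does not contain the expected columns.
    missing = [name for name in FIELDS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    groups = defaultdict(list)
    for row in reader:
        groups[row["BREAKDOWN"]].append(row)
    return dict(groups)


groups = group_by_breakdown(io.StringIO(SAMPLE_CSV))
for breakdown, rows in groups.items():
    print(breakdown, len(rows))
```

To try this on a downloaded copy of the data, open the file with `open(path, newline="")` and pass the resulting handle to `group_by_breakdown` in place of the in-memory sample.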

Research Ideas

Evaluating the efficacy of existing mental health services for children and young people by examining changes in relationships between different aspects of service delivery (e.g. referral activity, hospital spell activity, etc).

Analysing geographical trends in mental health services to inform investment decisions and policies across different regions.

Identifying areas of high need among vulnerable or marginalised citizens, such as those aged 0-18 or those with particular genetic makeup, to better target resources and support those most in need of help

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors

You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.

You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: mhsds-monthly-cyp-data-file-feb-fin-2017-1.csv

| Column name | Description |
|:------------|:------------|
| REPORTING_PERIOD | The period of time for which the data was collected. (String) |
| BREAKDOWN | The breakdown of the data by age group. (String) |
| PRIMARY_LEVEL | The primary level of the data. (String) |
| PRIMARY_LEVEL_DESCRIPTION ...
