100+ datasets found

Dataset #1: Cross-sectional survey data
figshare.com
txt
Updated Jul 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.23708730.v1
Dataset updated
Jul 19, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Adam Baimel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
N.B. This is not real data. Only here for an example for project templates.

Project Title: Add title here

Project Team: Add contact information for research project team members

Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

Research Project Materials: What materials are necessary to fully reproduce your the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
w
Data Use in Academia Dataset
datacatalog.worldbank.org
csv, utf-8
Updated Nov 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semantic Scholar Open Research Corpus (S2ORC) (2023). Data Use in Academia Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0065200/data_use_in_academia_dataset
Explore at:
utf-8, csvAvailable download formats
Dataset updated
Nov 27, 2023
Dataset provided by
Semantic Scholar Open Research Corpus (S2ORC)
Brian William Stacy
License
https://datacatalog.worldbank.org/public-licenses?fragment=cchttps://datacatalog.worldbank.org/public-licenses?fragment=cc
Description
This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.

Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.

We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and parsed PDF or latex file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and latex file are important for extracting important information like the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical system: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.

Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.

The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.

To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.

The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.

The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:

Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.
There are two classification tasks in this exercise:
1. identifying whether an academic article is using data from any country
2. Identifying from which country that data came.
For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. As an example, a study that reports findings or analysis using a survey data, uses data. Some clues to indicate that a study does use data includes whether a survey or census is described, a statistical model estimated, or a table or means or summary statistics is reported.
After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]
For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.
We expect between 10 and 35 percent of all articles to use data.

The median amount of time that a worker spent on an article, measured as the time between when the article was accepted to be classified by the worker and when the classification was submitted was 25.4 minutes. If human raters were exclusively used rather than machine learning tools, then the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review at a cost of $3,113,244, which assumes a cost of $3 per article as was paid to MTurk workers.

A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (bidirectional Encoder Representations for transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. (2018)). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT and retains 97% of the language understanding capabilities and is 60% faster (Sanh, Debut, Chaumond, Wolf 2019). We use PyTorch to produce a model to classify articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model. 900 articles were selected because of computational limitations in training the NLP model. A classification of “uses data” was assigned if the model predicted an article used data with at least 90% confidence.

The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model perform the same kind of errors, then the performance reported here will be overestimated.

The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of
Job Dataset
kaggle.com
zip
Updated Sep 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ravender Singh Rana (2023). Job Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
Explore at:
zip(479575920 bytes)Available download formats
Dataset updated
Sep 17, 2023
Authors
Ravender Singh Rana
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Job Dataset

This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.

Descriptions for each of the columns in the dataset:

Job Id: A unique identifier for each job posting.

Experience: The required or preferred years of experience for the job.

Qualifications: The educational qualifications needed for the job.

Salary Range: The range of salaries or compensation offered for the position.

Location: The city or area where the job is located.

Country: The country where the job is located.

Latitude: The latitude coordinate of the job location.

Longitude: The longitude coordinate of the job location.

Work Type: The type of employment (e.g., full-time, part-time, contract).

Company Size: The approximate size or scale of the hiring company.

Job Posting Date: The date when the job posting was made public.

Preference: Special preferences or requirements for applicants (e.g., Only Male or Only Female, or Both)

Contact Person: The name of the contact person or recruiter for the job.

Contact: Contact information for job inquiries.

Job Title: The job title or position being advertised.

Role: The role or category of the job (e.g., software developer, marketing manager).

Job Portal: The platform or website where the job was posted.

Job Description: A detailed description of the job responsibilities and requirements.

Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).

Skills: The skills or qualifications required for the job.

Responsibilities: Specific responsibilities and duties associated with the job.

Company Name: The name of the hiring company.

Company Profile: A brief overview of the company's background and mission.

Potential Use Cases:

Building predictive models to forecast job market trends.

Enhancing job recommendation systems for job seekers.

Developing NLP models for resume parsing and job matching.

Analyzing regional job market disparities and opportunities.

Exploring salary prediction models for various job roles.

Acknowledgements:

We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.

Note:

Please note that the examples provided are fictional and for illustrative purposes. You can tailor the descriptions and examples to match the specifics of your dataset. It is not suitable for real-world applications and should only be used within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com
m
sample
data.mendeley.com
Updated Feb 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
kaavya kaavya (2024). sample [Dataset]. http://doi.org/10.17632/ft7ctmb7yh.1
Explore at:
Unique identifier
https://doi.org/10.17632/ft7ctmb7yh.1
Dataset updated
Feb 5, 2024
Authors
kaavya kaavya
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Describe your research hypothesis, what your data shows, any notable findings and how the data can be interpreted. Please add sufficient description to enable others to understand what the data is, how it was gathered and how to interpret and use it.

Massive Scholarly Dataset: 5M Papers 36M Citations

kaggle.com

zip

Updated Mar 16, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Agung Pambudi (2024). Massive Scholarly Dataset: 5M Papers 36M Citations [Dataset]. https://www.kaggle.com/datasets/agungpambudi/research-citation-network-5m-papers

Explore at:

zip(7308773733 bytes)Available download formats

Dataset updated

Mar 16, 2024

Authors

Agung Pambudi

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Explore a rich research dataset with 5.2M papers and 36.6M citations! Unleash your data science skills for clustering, influence analysis, topic modeling, and more. Dive into the world of research networks.

Field Name	Field Type	Description	Example
id	string	paper ID	53e997ddb7602d9701fd3ad7
title	string	paper title	Rewrite-Based Satisfiability Procedures for Recursive Data Structures
authors.name	string	author name	Maria Paola Bonacina
author.org	string	author affiliation	Dipartimento di Informatica
author.id	string	author ID	53f47275dabfaee43ed25965
venue.raw	string	paper venue name	Electronic Notes in Theoretical Computer Science(ENTCS)
year	int	published year	2007
keywords	list of strings	keywords	["theorem-proving strategy", "rewrite-based approach", ...]
fos.name	string	paper fields of study	Data structure
fos.w	float	fields of study weight	0.48341
references	list of strings	paper references	["53e9a31fb7602d9702c2c61e", "53e997f1b7602d9701fef4d1", ...]
n_citation	int	citation number	19
page_start	string	page start	55
page_end	string	page end	70
doc_type	string	paper type: journal, conference	Journal
lang	string	detected language	en
volume	string	volume	174
issue	string	issue	8
issn	string	issn	Electronic Notes in Theoretical Computer Science
isbn	string	isbn
doi	string	doi	10.1016/j.entcs.2006.11.039
url	list	external links	[https: ...]
abstract	string	abstract	Our ability to generate ...
indexed_abstract	dict	indexed abstract	{"IndexLength": 116, "InvertedIndex": {"data": [49], ...}
v12_id	int	v12 paper id	2027211529
v12_authors.name	string	v12 author name	Maria Paola Bonacina
v12_authors.org	string	v12 author affiliation	Dipartimento di Informatica,Università degli Studi di Verona,Italy#TAB#
v12_authors.id	int	v12 author ID	669130765

Ex-R Study Urine Data
catalog.data.gov
Updated Jan 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2022). Ex-R Study Urine Data [Dataset]. https://catalog.data.gov/dataset/ex-r-study-urine-data
Explore at:
Dataset updated
Jan 24, 2022
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
Emory University (analyzed the urine samples for pyrethroid metabolites). This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Contact Researcher. Format: Pyrethroid metabolite concentration data for 50 adults over six-weeks. This dataset is associated with the following publication: Morgan , M., J. Sobus , D.B. Barr, C. Croghan , F. Chen , R. Walker, L. Alston, E. Andersen, and M. Clifton. Temporal variability of pyrethroid metabolite levels in bedtime, morning, and 24-hr urine samples for 50 adults in North Carolina. ENVIRONMENT INTERNATIONAL. Elsevier Science Ltd, New York, NY, USA, 144: 81-91, (2015).
B
Data Management Plan Examples Database
borealisdata.ca
search.dataone.org
Updated Aug 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rebeca Gaston Jothyraj; Shrey Acharya; Isaac Pratt; Danica Evering; Sarthak Behal (2024). Data Management Plan Examples Database [Dataset]. http://doi.org/10.5683/SP3/SDITUG
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/SDITUG
Dataset updated
Aug 27, 2024
Dataset provided by
Borealis
Authors
Rebeca Gaston Jothyraj; Shrey Acharya; Isaac Pratt; Danica Evering; Sarthak Behal
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
2011 - 2024
Description
This dataset is comprised of a collection of example DMPs from a wide array of fields; obtained from a number of different sources outlined in the README. Data included/extracted from the examples included the discipline and field of study, author, institutional affiliation and funding information, location, date modified, title, research and data-type, description of project, link to the DMP, and where possible external links to related publications, grant pages, or French language versions. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Meta data and supporting documentation
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available on line here: https://github.com/warrenjl/SpGPCW. Format: Abstract The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. File format: R workspace file. Metadata (including data dictionary) • y: Vector of binary responses (1: preterm birth, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate). This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
o
University SET data, with faculty and courses characteristics
openicpsr.org
Updated Sep 12, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
Explore at:
Unique identifier
https://doi.org/10.3886/E149801V1
Dataset updated
Sep 12, 2021
Authors
Under blind review in refereed journal
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=Calculus, k=John Smith, n=2) is calculated as the average of all Likert-scale answers to question nr 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached filesection. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.Two attachments:- word file with variables description- Rdata file with the data set (for R language).Appendix 1. Appendix 1. The SET questionnaire was used for this paper. Evaluation survey of the teaching staff of [university name] Please, complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5- I strongly agree; 4- I agree; 3- Neutral; 2- I don’t agree; 1- I strongly don’t agree. Questions 1 2 3 4 5 I learnt a lot during the course. ○ ○ ○ ○ ○ I think that the knowledge acquired during the course is very useful. ○ ○ ○ ○ ○ The professor used activities to make the class more engaging. ○ ○ ○ ○ ○ If it was possible, I would enroll for the course conducted by this lecturer again. ○ ○ ○ ○ ○ The classes started on time. ○ ○ ○ ○ ○ The lecturer always used time efficiently. ○ ○ ○ ○ ○ The lecturer delivered the class content in an understandable and efficient way. ○ ○ ○ ○ ○ The lecturer was available when we had doubts. ○ ○ ○ ○ ○ The lecturer treated all students equally regardless of their race, background and ethnicity. ○ ○
Population Health (BRFSS: HRQOL)
kaggle.com
zip
Updated Dec 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Population Health (BRFSS: HRQOL) [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlock-population-health-needs-with-brfss-hrqol
Explore at:
zip(2247473 bytes)Available download formats
Dataset updated
Dec 14, 2022
Authors
The Devastator
Description
Population Health (BRFSS: HRQOL)

Examining Trends, Disparities and Determinants of Health in the US Population

By Health [source]

About this dataset

The Behavioral Risk Factor Surveillance System (BRFSS) offers an expansive collection of data on the health-related quality of life (HRQOL) from 1993 to 2010. Over this time period, the Health-Related Quality of Life dataset consists of a comprehensive survey reflecting the health and well-being of non-institutionalized US adults aged 18 years or older. The data collected can help track and identify unmet population health needs, recognize trends, identify disparities in healthcare, determine determinants of public health, inform decision making and policy development, as well as evaluate programs within public healthcare services.

The HRQOL surveillance system has developed a compact set of HRQOL measures such as a summary measure indicating unhealthy days which have been validated for population health surveillance purposes and have been widely implemented in practice since 1993. Within this study's dataset you will be able to access information such as year recorded, location abbreviations & descriptions, category & topic overviews, questions asked in surveys and much more detailed information including types & units regarding data values retrieved from respondents along with their sample sizes & geographical locations involved!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset tracks the Health-Related Quality of Life (HRQOL) from 1993 to 2010 using data from the Behavioral Risk Factor Surveillance System (BRFSS). This dataset includes information on the year, location abbreviation, location description, type and unit of data value, sample size, category and topic of survey questions.

Using this dataset on BRFSS: HRQOL data between 1993-2010 will allow for a variety of analyses related to population health needs. The compact set of HRQOL measures can be used to identify trends in population health needs as well as determine disparities among various locations. Additionally, responses to survey questions can be used to inform decision making and program and policy development in public health initiatives.

Research Ideas

Analyzing trends in HRQOL over the years by location to identify disparities in health outcomes between different populations and develop targeted policy interventions.

Developing new models for predicting HRQOL indicators at a regional level, and using this information to inform medical practice and public health implementation efforts.

Using the data to understand differences between states in terms of their HRQOL scores and establish best practices for healthcare provision based on that understanding, including areas such as access to care, preventative care services availability, etc

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: rows.csv | Column name | Description | |:-------------------------------|:----------------------------------------------------------| | Year | Year of survey. (Integer) | | LocationAbbr | Abbreviation of location. (String) | | LocationDesc | Description of location. (String) | | Category | Category of survey. (String) | | Topic | Topic of survey. (String) | | Question | Question asked in survey. (String) | | DataSource | Source of data. (String) | | Data_Value_Unit | Unit of data value. (String) | | Data_Value_Type | Type of data value. (String) | | Data_Value_Footnote_Symbol | Footnote symbol for data value. (String) | | Data_Value_Std_Err | Standard error of the data value. (Float) | | Sample_Size | Sample size used in sample. (Integer) | | Break_Out | Break out categories used. (String) | | Break_Out_Category | Type break out assessed. (String) | | **GeoLocation*...
ToolRM-train-data
huggingface.co
Updated Nov 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IBM Research (2025). ToolRM-train-data [Dataset]. https://huggingface.co/datasets/ibm-research/ToolRM-train-data
Explore at:
Dataset updated
Nov 1, 2025
Dataset provided by
IBMhttp://ibm.com/
IBM Research
Authors
IBM Research
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
ToolRM Training Dataset

📖 Dataset Description

This is a version of the training data utilized for ToolRM, a collection of outcome reward models specifically designed for evaluating and improving function-calling capabilities in large language models. It consists of ~459K examples, where each example includes a user-assistant conversation, available tool specifications, and a pair of correct and incorrect tool calls. The incorrect calls were generated by prompting 9… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/ToolRM-train-data.
Data supporting the Master thesis "Monitoring von Open Data Praktiken -...
zenodo.org
data.niaid.nih.gov
+1more
zip
Updated Nov 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14196539
Dataset updated
Nov 21, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Katharina Zinke; Katharina Zinke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Dresden
Description
Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

This ZIP-File contains the data the thesis is based on, interim exports of the results and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.

The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analyses approach, please refer to the master's thesis (publication following soon).

## Data sources

Folder 01_SourceData/

- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

## Automatic classification

Folder 02_AutomaticClassification/

- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

- oddpub_results_wDOIs.csv (results file of the ODDPub classification)

- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

## Manual coding

Folder 03_ManualCheck/

- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

- ManualCheck_2023-06-08.csv (Manual coding results file)

- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

## Explorative analysis for the discoverability of open data

Folder04_FurtherAnalyses

Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

## R-Script

Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
Examples of boilerplate text from PLOS ONE papers based on targeted n-gram...
plos.figshare.com
datasetcatalog.nlm.nih.gov
+1more
xls
Updated Jun 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicole M. White; Thirunavukarasu Balasubramaniam; Richi Nayak; Adrian G. Barnett (2023). Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level). [Dataset]. http://doi.org/10.1371/journal.pone.0264360.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0264360.t001
Dataset updated
Jun 14, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Nicole M. White; Thirunavukarasu Balasubramaniam; Richi Nayak; Adrian G. Barnett
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level).
N
Marin County, CA Age Group Population Dataset: A complete breakdown of Marin...
neilsberg.com
csv, json
Updated Sep 16, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2023). Marin County, CA Age Group Population Dataset: A complete breakdown of Marin County age demographics from 0 to 85 years, distributed across 18 age groups [Dataset]. https://www.neilsberg.com/research/datasets/70b4fc92-3d85-11ee-9abe-0aa64bf2eeb2/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Sep 16, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Marin County, California
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Marin County population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Marin County. The dataset can be utilized to understand the population distribution of Marin County by age. For example, using this dataset, we can identify the largest age group in Marin County.

Key observations

The largest age group in Marin County, CA was for the group of age 55-59 years with a population of 21,251 (8.10%), according to the 2021 American Community Survey. At the same time, the smallest age group in Marin County, CA was the 85+ years with a population of 7,092 (2.70%). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the Marin County is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of Marin County total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Marin County Population by Age. You can refer the same here
s
Analysis of CBCS publications for Open Access, data availability statements...
figshare.scilifelab.se
researchdata.se
+2more
txt
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Theresa Kieselbach (2025). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data [Dataset]. http://doi.org/10.17044/scilifelab.23641749.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.17044/scilifelab.23641749.v1
Dataset updated
Jan 15, 2025
Dataset provided by
Umeå University
Authors
Theresa Kieselbach
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General descriptionThis dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.1. Is the research article an Open Access publication?2. Does the research article have a Creative Common license or a similar license?3. Does the research article contain a data availability statement?4. Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?5. Does the research article contain supplementary data?6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?VariablesThe data were compiled in a Microsoft Excel 365 document that includes the following variables.1. DOI URL of research article2. Year of publication3. Research article published with Open Access4. License for research article5. Data availability statement in article6. Supplementary data added to article7. Persistent identifier for supplementary data8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDCVisualizationParts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications during a year, the number of publications that is published with open access and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publication sper year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).File formats and softwareThe file formats used in this dataset are:.csv (Text file).docx (Microsoft Word 365 file).jpg (JPEG image file).pdf/A (Portable Document Format for archiving).png (Portable Network Graphics image file).pptx (Microsoft Power Point 365 file).txt (Text file).xlsx (Microsoft Excel 365 file)All files can be opened with Microsoft Office 365 and work likely also with the older versions Office 2019 and 2016. MD5 checksumsHere is a list of all files of this dataset and of their MD5 checksums.1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2),4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b),5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a),6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c),7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b),8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5),9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b),10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793),11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e),12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e),13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe),14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7),15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698),16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a),17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72),18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d),19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
d
Data from: tableone: An open source Python package for producing summary...
datadryad.org
search.dataone.org
+1more
zip
Updated Apr 23, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark (2019). tableone: An open source Python package for producing summary statistics for research papers [Dataset]. http://doi.org/10.5061/dryad.26c4s35
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.26c4s35
Dataset updated
Apr 23, 2019
Dataset provided by
Dryad
Authors
Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark
Time period covered
Apr 19, 2018
Description
Objectives: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table (“Table 1”) of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.

Materials and Methods: The tableone package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

Results: The tableone software package automatically compiles summary statistics into publishable formats such...
H
Political Analysis Using R: Example Code and Data, Plus Data for Practice...
dataverse.harvard.edu
search.dataone.org
Updated Apr 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/ARKOTI
Dataset updated
Apr 28, 2020
Dataset provided by
Harvard Dataverse
Authors
Jamie Monogan
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Data from: Delta Produce Sources Study
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Delta Produce Sources Study [Dataset]. https://catalog.data.gov/dataset/delta-produce-sources-study-51a7a
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description
The Delta Produce Sources Study was an observational study designed to measure and compare food environments of farmers markets (n=3) and grocery stores (n=12) in 5 rural towns located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys from June 2019 to March 2020 using a modified version of the Nutrition Environment Measures Survey (NEMS) Farmers Market Audit tool. The tool was modified to collect information pertaining to source of fresh produce and also for use with both farmers markets and grocery stores. Availability, source, quality, and price information were collected and compared between farmers markets and grocery stores for 13 fresh fruits and 32 fresh vegetables via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Resources in this dataset:Resource Title: Delta Produce Sources Study dataset . File Name: DPS Data Public.csvResource Description: The dataset contains variables corresponding to availability, source (country, state and town if country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns.Resource Software Recommended: Microsoft Excel,url: https://www.microsoft.com/en-us/microsoft-365/excel Resource Title: Delta Produce Sources Study data dictionary. File Name: DPS Data Dictionary Public.csvResource Description: This file is the data dictionary corresponding to the Delta Produce Sources Study dataset.Resource Software Recommended: Microsoft Excel,url: https://www.microsoft.com/en-us/microsoft-365/excel
m
SPHERE: Students' performance dataset of conceptual understanding,...
data.mendeley.com
Updated Jan 15, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
Explore at:
Unique identifier
https://doi.org/10.17632/88d7m2fv7p.2
Dataset updated
Jan 15, 2025
Authors
Purwoko Haryadi Santoso
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The SPHERE is students' performance in physics education research dataset. It is presented as a multi-domain learning dataset of students’ performance on physics that has been collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students were involved from three large and a small public high school located in a suburban district of a high-populated province in Indonesia. Some variables related to demographics, accessibility to literature resources, and students’ physics identity are also investigated. Some RBAs utilized in this data were selected based on concepts learned by the students in the Indonesian physics curriculum. We commenced the survey of students’ understanding on Newtonian mechanics at the end of the first semester using Force Concept Inventory (FCI) and Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students’ scientific abilities and learning attitude through Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS) respectively. The conceptual assessments were continued at the second semester measured through Rotational and Rolling Motion Conceptual Survey (RRMCS), Fluid Mechanics Concept Inventory (FMCI), Mechanical Waves Conceptual Survey (MWCS), Thermal Concept Evaluation (TCE), and Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE could be a valuable dataset for supporting the advancement of the PER field particularly in quantitative studies. For example, there is a need to help advance research on using machine learning and data mining techniques in PER that might face challenges due to the unavailable dataset for the specific purpose of PER studies. SPHERE can be reused as a students’ performance dataset on physics specifically dedicated for PER scholars which might be willing to implement machine learning techniques in physics education.
d
Job Postings Dataset for Labour Market Research and Insights
datarade.ai
Updated Sep 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Oxylabs (2023). Job Postings Dataset for Labour Market Research and Insights [Dataset]. https://datarade.ai/data-products/job-postings-dataset-for-labour-market-research-and-insights-oxylabs
Explore at:
.json, .xml, .csv, .xlsAvailable download formats
Dataset updated
Sep 20, 2023
Dataset authored and provided by
Oxylabs
Area covered
Anguilla, British Indian Ocean Territory, Jamaica, Togo, Luxembourg, Switzerland, Tajikistan, Kyrgyzstan, Sierra Leone, Zambia
Description
Introducing Job Posting Datasets: Uncover labor market insights!

Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.

Job Posting Datasets Source:

Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.

Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.

StackShare: Access StackShare datasets to make data-driven technology decisions.

Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.

Choose your preferred dataset delivery options for convenience:

Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.

Why Choose Oxylabs Job Posting Datasets:

Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.

Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.

Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.

Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.

Facebook

Twitter

Click to copy link

Link copied

Cite

Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1

Dataset #1: Cross-sectional survey data

Explore at:

txtAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.23708730.v1

Dataset updated

Jul 19, 2023

Dataset provided by

Figsharehttp://figshare.com/

Authors

Adam Baimel

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

N.B. This is not real data. Only here for an example for project templates.

Project Title: Add title here

Project Team: Add contact information for research project team members

Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

Research Project Materials: What materials are necessary to fully reproduce your the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14

Clear search

Close search

Google apps

Main menu

Dataset #1: Cross-sectional survey data

Data Use in Academia Dataset

Job Dataset

Job Dataset

Descriptions for each of the columns in the dataset:

Potential Use Cases:

Acknowledgements:

Note:

sample

Massive Scholarly Dataset: 5M Papers 36M Citations

Ex-R Study Urine Data

Data Management Plan Examples Database

Meta data and supporting documentation

University SET data, with faculty and courses characteristics

Population Health (BRFSS: HRQOL)

Population Health (BRFSS: HRQOL)

Examining Trends, Disparities and Determinants of Health in the US Population

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

ToolRM-train-data

Data supporting the Master thesis "Monitoring von Open Data Praktiken -...

Examples of boilerplate text from PLOS ONE papers based on targeted n-gram...

Marin County, CA Age Group Population Dataset: A complete breakdown of Marin...

About this dataset

Content

Inspiration

Recommended for further research

Analysis of CBCS publications for Open Access, data availability statements...

Data from: tableone: An open source Python package for producing summary...

Political Analysis Using R: Example Code and Data, Plus Data for Practice...

Data from: Delta Produce Sources Study

SPHERE: Students' performance dataset of conceptual understanding,...

Job Postings Dataset for Labour Market Research and Insights

Dataset #1: Cross-sectional survey data