Dataset Card for instruction-dataset-mini-with-generations
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/vonewman/instruction-dataset-mini-with-generations/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info… See the full description on the dataset page: https://huggingface.co/datasets/vonewman/instruction-dataset-mini-with-generations.
https://www.icpsr.umich.edu/web/ICPSR/studies/38421/terms
This collection includes a combined dataset of the Generations study wave 1 (baseline) survey and the TransPop study transgender survey. The two studies have many overlapping variables, and they examined topics such as respondents' health outcomes and behaviors, experiences with discrimination, identity, and transition-related experiences. Data from these studies were merged to allow for analysis of the combined LGBT populations. This dataset has also been reweighted to be representative of these populations. The complete Generations study data (baseline, wave 2, and wave 3 survey data) can be found under study number 37166, and the complete TransPop study data (transgender and cisgender survey data) can be found under study number 37938. For detailed information on the Generations and TransPop studies, including related publications, please refer to their respective DSDR/ICPSR study pages.
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study consists of a secondary analysis of data from the Educational Longitudinal Study of 2002 (ELS) to investigate associations between immigration, misbehavior, victimization, disorder, and educational failure (i.e., dropping out). The six research questions addressed in this study are: First, do school social bonds vary across immigration generations? Second, is student violence (i.e., misbehavior and victimization) explained by school social bonds across generations? Third, are student violence and school disorder related to the likelihood that the children of immigrants drop out? Fourth, do strong school social bonds mitigate the likelihood of dropping out for the children of immigrants? Fifth, are immigrant school enclaves associated with increased school social bonds among adolescents, decreased student violence and school disorder, and lower levels of dropping out? Sixth, does the intersection of race, ethnicity, and gender moderate the relationship between student violence and school social bonds for the children of immigrants? There are no data files available with this study. Only the syntax file used by the researcher is provided.
According to the latest data gathered in the United States in 2024, teens and young adults spent most of their audio listening time with streaming music, that is, ** percent. Streaming music videos on YouTube is also a popular choice, with ** percent of audio time spent on the platform. AM/FM Radio closely followed with a share of ** percent of Gen Z audio time.
Dataset Details
This dataset contains a rich collection of popular slang terms and acronyms used primarily by Generation Z. It includes detailed descriptions of each term, its context of use, and practical examples that demonstrate how the slang is used in real-life conversations. The dataset is designed to capture the unique and evolving language patterns of GenZ, reflecting their communication style in digital spaces such as social media, text messaging, and online forums. Each… See the full description on the dataset page: https://huggingface.co/datasets/MLBtrio/genz-slang-dataset.
In 2023, the disposable income of a household led by a Millennial in the United States was 97,866 U.S. dollars per year. Households led by someone born in Generation X, however, had a disposable income of around 113,886 U.S. dollars in 2023.
According to data gathered in the United States in March 2023, Pop was the most popular genre among Generation Z: ** percent of Gen Z respondents named the genre among their favorites. Rap or Hip-Hop was second, mentioned by ** percent, while Rock rounded out the top three at ** percent.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for [GPT4All-J Prompt Generations]
Dataset Description
Dataset used to train GPT4All-J and GPT4All-J-LoRA. We release several versions of datasets:
v1.0: The original dataset we used to finetune GPT-J on
v1.1-breezy: A filtered dataset where we removed all instances of AI language model
v1.2-jazzy: A filtered dataset where we also removed instances like I'm sorry, I can't answer... and AI language model
v1.3-groovy: The v1.2 dataset with ShareGPT and Dolly… See the full description on the dataset page: https://huggingface.co/datasets/nomic-ai/gpt4all-j-prompt-generations.
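The phrase-based filtering described for v1.1-breezy and v1.2-jazzy could look roughly like the sketch below. It is only an illustration, not the pipeline the GPT4All-J authors used: the row layout (`prompt`/`response` keys), the function name `filter_responses`, the case-insensitive substring match, and the sample rows are all assumptions made for the example.

```python
# Marker phrases quoted in the version descriptions; how the original
# pipeline matched them is not stated, so substring matching is an assumption.
AI_MARKER = "AI language model"
REFUSAL_MARKER = "I'm sorry, I can't answer"


def filter_responses(rows, banned_phrases):
    """Keep rows whose 'response' contains none of the banned phrases.

    Matching is case-insensitive substring search (an assumption).
    """
    lowered = [phrase.lower() for phrase in banned_phrases]
    return [
        row
        for row in rows
        if not any(phrase in row["response"].lower() for phrase in lowered)
    ]


# Hypothetical sample rows, invented for illustration only.
rows = [
    {"prompt": "What is 2 + 2?", "response": "4"},
    {"prompt": "Who are you?", "response": "As an AI language model, I have no identity."},
    {"prompt": "Predict the lottery.", "response": "I'm sorry, I can't answer that."},
]

# v1.1-breezy style: drop responses mentioning "AI language model".
breezy_like = filter_responses(rows, [AI_MARKER])

# v1.2-jazzy style: additionally drop refusal-style responses.
jazzy_like = filter_responses(rows, [AI_MARKER, REFUSAL_MARKER])
```

With these sample rows, the first filter keeps two rows and the second keeps only the arithmetic answer.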
tdh87/AI-GEN dataset hosted on Hugging Face and contributed by the HF Datasets community
deepseek-r1-qwen-7b generations for deepscaler dataset
The original deepscaler dataset has been filtered:
We removed all synthetic data because their problem-answer pairs may not match.
Based on generations from qwen-7b (pre-o1), we removed problems that have 5/32 correct generations.
We then use deepseek-r1-qwen-7b to generate from this filtered dataset with num_generations=32.
We keep generations that finish. This translates to generations that have the second boxed.… See the full description on the dataset page: https://huggingface.co/datasets/drproduck/deepscaler-hard-r1-qwen7b-n32.
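The "keep generations that finish" step can be sketched as a simple string check. This is a minimal sketch under stated assumptions, not the dataset author's code: it reads "have the second boxed" as "contain at least two `\boxed{` occurrences" (for example, one inside the reasoning trace and one in the final answer), and the names `is_finished`, `keep_finished`, `min_boxed`, and the sample strings are hypothetical.

```python
BOXED = r"\boxed{"


def is_finished(generation: str, min_boxed: int = 2) -> bool:
    """Treat a generation as finished if it contains at least `min_boxed`
    occurrences of \\boxed{ (assumed reading of "the second boxed")."""
    return generation.count(BOXED) >= min_boxed


def keep_finished(generations, min_boxed: int = 2):
    """Return only the generations that pass the finished check."""
    return [g for g in generations if is_finished(g, min_boxed)]


# Hypothetical generations, invented for illustration only.
samples = [
    r"<think>Try x=2, so \boxed{2}.</think> The answer is \boxed{2}.",
    r"<think>Try x=2, so \boxed{2}. Let me check again",
    r"<think>Still working",
]

finished = keep_finished(samples)
```

Here only the first sample is kept, because the other two were cut off before a second `\boxed{` appeared. Lowering `min_boxed` to 1 would also keep the second sample.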