Dataset Card for "Llama-2-SQL-and-Code-Dataset"
This dataset is intended to provide LLaMA 2 with improved coding and instruction-following capabilities, with a specific focus on SQL generation. The dataset is in Alpaca Instruct format. Be sure to provide both the instruction and the input in the prompt to the model, along with any surrounding prompt text you would like to add. In the train split, ignore the table column. The eval split provides example tables so that the… See the full description on the dataset page: https://huggingface.co/datasets/ChrisHayduk/Llama-2-SQL-and-Code-Dataset.
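For concreteness, here is a minimal sketch of assembling a prompt from an Alpaca-format example. The template below is the commonly used Alpaca layout, assumed rather than confirmed by this card, and the column names `instruction` and `input` follow the Alpaca convention:

```python
# A minimal sketch, assuming the standard Alpaca Instruct fields
# ("instruction", "input"); adapt to the actual column names in the dataset.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_prompt(example: dict) -> str:
    """Assemble a single prompt from an Alpaca-format example."""
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"],
        input=example["input"],
    )

prompt = build_prompt({
    "instruction": "Write a SQL query that returns all rows from the users table.",
    "input": "Table: users(id, name, email)",
})
print(prompt)
```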
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Title: Add title here
Project Team: Add contact information for research project team members
Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.
Relevant publications/outputs: When available, add links to the related publications/outputs from this data.
Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (e.g., Open Science Framework, GitHub, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.
Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?
Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.
Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.
List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.
Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, the verbatim question associated with each response, response options, and details of any post-collection coding that has been done on the raw responses (and whether that is encoded in a separate column).
Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
People Data Set 2 is a dataset for object detection tasks - it contains People STRs annotations for 5,243 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All results of the primary interrupted time-series analyses evaluating targeted and total border closures that met the following criteria: 1) at least seven days of data are available before and after the intervention point, 2) for multiple-intervention time series, at least seven days have passed since the last intervention point, and 3) for multiple sequential targeted border closures, the second (or third) intervention indicates an increase of at least 20% in the share of the world's population being targeted by the new border closures.
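As an illustration only (not code from the study), the first two inclusion criteria could be checked along these lines; the function and field names here are hypothetical:

```python
# A hypothetical sketch of inclusion criteria 1 and 2; not from the study.
from datetime import date

def meets_window_criteria(series_start: date, series_end: date,
                          interventions: list[date]) -> bool:
    """Require >= 7 days of data around each intervention and >= 7 days
    between consecutive interventions."""
    interventions = sorted(interventions)
    for i, t in enumerate(interventions):
        if (t - series_start).days < 7 or (series_end - t).days < 7:
            return False  # criterion 1: seven days before and after
        if i > 0 and (t - interventions[i - 1]).days < 7:
            return False  # criterion 2: seven days since the last intervention
    return True

print(meets_window_criteria(date(2020, 1, 1), date(2020, 3, 1),
                            [date(2020, 1, 20), date(2020, 2, 10)]))
```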
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data presented here consist of three parts:

Dataset 1: In this set, we extracted 327,322 faces from our entire collection of 3,389 issues and automatically classified each face as male or female. We present these data as a single table with columns identifying the date, issue, page number, the coordinates identifying the position of the face on the page, and the classification (male or female). The coordinates identifying the position of the face on the page are based on the size and resolution of the pages found in the “Time Vault”.

Dataset 2: Dataset 2 consists of 8,789 classified faces from 100 selected issues. Human labor was used to identify and extract 3,299 face images from 39 issues, which were later classified by another set of workers. This selection of 39 issues contains one issue per decade spanned by the archive plus one issue per year between 1961 and 1991, and the extracted face images were used to train the face extraction algorithm. The remaining 5,490 faces from 61 issues were extracted via machine learning before being classified by human coders. These 61 issues were chosen to complement the first selection of 39 issues: one issue per year for all years in the archive excluding those between 1961 and 1991. Thus, Dataset 2 contains fully labelled faces from at least one issue per year.

Dataset 3: In the interest of transparency, Dataset 3 consists of the raw data collected to create Dataset 2 and comprises two tables. Before explaining these tables, we first briefly describe our data collection and verification procedures, which have been fully described elsewhere. A custom AMT interface was used to enable human workers to classify faces according to the categories in Table 4. Each worker was given a randomly selected batch of 25 pages, each with a clearly highlighted face to be categorized; three of these pages were verification pages with known features, used for quality control. Each face was labeled by two distinct human coders, chosen at random so that the pairing of coders varied with the image. A proficiency rating was calculated for each coder by considering all images they annotated and computing the average number of labels that matched those identified by the image's other coder. The tables in Dataset 2 were created by resolving inconsistencies between the two image coders in favor of the labels from the coder with the higher proficiency rating. Prior to calculating the proficiency score, all faces that were tagged as having 'Poor' or 'Error' image quality by either of the two coders were eliminated. Due to technical bugs when the AMT interface was first implemented, a small number of images were only labeled once; these were also eliminated from Datasets 2 and 3. In Dataset 3, we present the raw annotations from each coder who tagged each face, along with demographic data for each coder. Dataset 3 consists of two tables: the raw data from each of the two sets of coders, and the demographic information for each of the coders.
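The proficiency-based resolution described above could be sketched roughly as follows; this is an illustrative reconstruction, not the authors' code, and all names are hypothetical:

```python
# A hypothetical sketch of the proficiency-based label resolution;
# not the authors' implementation.
from collections import defaultdict

# annotations: image_id -> {coder_id: tuple_of_labels}, two coders per image
def coder_proficiency(annotations: dict) -> dict:
    """Average fraction of a coder's labels matching the other coder's."""
    totals, matches = defaultdict(int), defaultdict(int)
    for coders in annotations.values():
        (c1, l1), (c2, l2) = coders.items()
        agree = sum(a == b for a, b in zip(l1, l2))
        for coder, labels in ((c1, l1), (c2, l2)):
            totals[coder] += len(labels)
            matches[coder] += agree
    return {c: matches[c] / totals[c] for c in totals}

def resolve(annotations: dict) -> dict:
    """Keep each image's labels from the more proficient of its two coders."""
    prof = coder_proficiency(annotations)
    return {
        img: max(coders.items(), key=lambda kv: prof[kv[0]])[1]
        for img, coders in annotations.items()
    }
```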
Dataset Card for "dataset2"
More Information needed
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is about: Gridded data sets II. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.733691 for more information.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by dataset268
Released under MIT
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.
Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation, and numbers, all of which are removed in PTB. As it is composed of full articles, the dataset is well suited to models that can take advantage of long-term dependencies.
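For reference, a minimal sketch of loading WikiText-2 with the Hugging Face datasets library, assuming the public "wikitext" dataset and its "wikitext-2-raw-v1" configuration are available on the Hub:

```python
# A minimal sketch, assuming the "wikitext" dataset and its
# "wikitext-2-raw-v1" configuration on the Hugging Face Hub.
from datasets import load_dataset

wikitext2 = load_dataset("wikitext", "wikitext-2-raw-v1")
print(wikitext2)                       # train / validation / test splits
print(wikitext2["train"][10]["text"])  # one raw line of article text
```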
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Search_2 dataset consists of (1) a set of 44 high-resolution digital color images of different complex natural scenes, (2) the ground truth corresponding to each of these scenes, and (3) the results of psychophysical experiments on each of these images. The images in the Search_2 dataset are a subset of a larger set that was used in a visual search and detection experiment. Each scene (image) contains a single military vehicle that serves as a search target. A report describes the images in detail and presents the corresponding ground truth and observer data. The image dataset, an Excel file with the ground truth and observer data, and a copy of this report are included in the dataset. The complete dataset can be used to validate (1) digital metrics that compute the visual distinctness (contrast, conspicuity, or saliency) of targets in complex scenes, and (2) models of human visual search and detection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: Find data at source. BDG2 is an open data set made up of 3,053 energy meters from 1,636 buildings. The time range of the time-series data is two full years (2016 and 2017), and the frequency is hourly measurements of electricity, heating and cooling water, steam, and irrigation meters.
nachors/dataset2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Industrial CT is used for industrial applications. Computational fluid dynamics assists in the flow simulation of fluid channels.
141: Areas with vegetation within or partly embraced by urban fabric. This class is assigned to urban greenery, which usually has a recreational or ornamental character and is usually accessible to the public.
🚢 Stanford Human Preferences Dataset v2 (SHP-2)
Summary
SHP-2 is a dataset of 4.8M collective human preferences over responses to questions/instructions in 129 different subject areas, from cooking to legal advice. It is an extended version of the original 385K SHP dataset. The preferences are meant to reflect the helpfulness of one response over another, and are intended to be used for training RLHF reward models and NLG evaluation models (e.g., SteamSHP). Each example… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/SHP-2.
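A minimal sketch for loading the dataset with the Hugging Face datasets library, assuming the repository ID from the dataset page above resolves directly; the split and field names are not confirmed here:

```python
# A minimal sketch, assuming the Hub repository ID from the dataset page.
from datasets import load_dataset

shp2 = load_dataset("stanfordnlp/SHP-2")
print(shp2)              # available splits and their sizes
print(shp2["train"][0])  # inspect one preference example's fields
```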
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during the study "Towards High-Value Datasets determination for data-driven development: a systematic literature review", conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun and Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean), and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the paper (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.
The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets, with the aim of gathering information on how the topic of high-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, including the indicators used, the stakeholders involved, data-related aspects, and frameworks. The data in this dataset were collected as the result of the SLR over Scopus, Web of Science, and the Digital Government Research Library (DGRL) in 2023.
Methodology
To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic was studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research Library (DGRL).
These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), applied to the article title, keywords, and abstract to limit the results to papers in which these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and further checked for relevance. As a result, a total of 9 articles were examined in depth. Each study was independently examined by at least two authors.
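For illustration, the keyword filter above would look roughly like this when expressed with Scopus's TITLE-ABS-KEY field code; the authors' exact query string is not given here:

```python
# A hypothetical rendering of the search query using Scopus's
# TITLE-ABS-KEY field code; not the authors' exact query string.
SCOPUS_QUERY = (
    'TITLE-ABS-KEY(("open data" OR "open government data") '
    'AND ("high-value data*" OR "high value data*"))'
)
print(SCOPUS_QUERY)
```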
To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.
Test procedure: Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the protocol is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.
Description of the data in this data set
Protocol_HVD_SLR provides the structure of the protocol. Spreadsheet #1 provides the filled protocol for the relevant studies. Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e., before filtering out irrelevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper: {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of the article in Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information
10) Objective / RQ - the research objective / aim and the established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use cases, scope of the SLR, etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed-methods approach
14) Availability of the underlying research data - whether there is a reference to publicly available underlying research data, e.g., transcriptions of interviews or collected data, or an explanation of why these data are not shared
15) Period under investigation - the period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is it used in the study?
Quality- and relevance-related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused on HVD determination; secondary - mentioned but not studied (e.g., as part of discussion, future work, etc.))
HVD determination-related information
19) HVD definition and type of value - how is HVD (or any equivalent term) defined in the article?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of, and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
Format of the files: .xls, .csv (for the first spreadsheet only), .odt, .docx
Licenses or restrictions: CC-BY
For more info, see README.txt
This dataset was created by Roxana
Overview: 142: Areas used for sports, leisure, and recreation purposes.

Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3).

Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3). The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layer represents the class with the highest probability as predicted by the ensemble.

Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level of the CLC hierarchy: 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when classes are aggregated to a higher level of the hierarchy (e.g., 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). Some single-class probabilities may more closely represent actual patterns for classes that were overshadowed by unequal sample point distributions. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case.

Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.

Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation, as detailed in the accompanying publication.

Completeness: The dataset has chunks of empty predictions in regions with complex coastlines (e.g., the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product.

Consistency: The accuracy of the predictions was compared per year and per 30 km × 30 km tile across Europe to derive temporal and spatial consistency by calculating the standard deviation. The standard deviation of the annual weighted F1-score was 0.135, while the standard deviation of the weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations.

Positional accuracy: The raster layers have a resolution of 30 m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted them.

Temporal accuracy: The dataset contains prediction and uncertainty layers for each year between 2000 and 2019.

Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523 (Oceans) was omitted due to computational constraints.
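The per-pixel uncertainty and hard-class derivation described above amount to a standard deviation and an argmax over the ensemble members' class probabilities. A minimal illustrative sketch follows; the array shapes are hypothetical, and this is not the authors' code:

```python
# An illustrative sketch of the ensemble-derived layers; array shapes
# are hypothetical and this is not the authors' implementation.
import numpy as np

# probs: (n_members=3, n_classes=43, height, width) class probabilities,
# here filled with random Dirichlet draws as stand-in data.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(43), size=(3, 4, 4)).transpose(0, 3, 1, 2)

mean_probs = probs.mean(axis=0)          # per-class probability layer
uncertainty = probs.std(axis=0)          # per-class uncertainty layer
hard_class = mean_probs.argmax(axis=0)   # HCL: most probable class per pixel

print(uncertainty.shape, hard_class.shape)
```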
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
People 2 is a dataset for object detection tasks - it contains People Y9S6 annotations for 479 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset package includes four datasets.