Guidelines
These guidelines are based on the paper Training Language Models to Follow Instructions with Human Feedback. You are given a text-based description of a task, submitted by a user. This task description may be in the form of an explicit instruction (e.g. "Write a story about a wise frog."). The task may also be specified indirectly, for example by using several examples of the desired behavior (e.g. given a sequence of movie reviews followed by their sentiment, followed by… See the full description on the dataset page: https://huggingface.co/datasets/argilla/comparison-dataset-dolly-curated-falcon.
Comparison of selected metrics across R packages using an example dataset.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comparison of various Large Language Models (LLMs) based on their performance, cost, and efficiency. It includes important details like speed, latency, benchmarks, and pricing, helping users understand how different models stack up against each other.
Key metrics in the dataset include speed, latency, benchmark scores, and pricing.
This dataset is useful for researchers, developers, and AI enthusiasts who want to compare LLMs and choose the best one based on their needs.
📌If you find this dataset useful, do give an upvote :)
Dataset Card for example-dataset-2
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2.
This research effort is a modeling study using the HYDRUS (2D/3D) computer program (www.pc-progress.com), described in the manuscript/journal article entitled "Comparison of recharge from drywells and infiltration basins: a modeling study." All the tables and figures in the journal article will be documented within an Excel spreadsheet that will include worksheet tabs with the data associated with each table and figure. The tabs, columns, and rows will be clearly labeled to identify tables/figures, variables, and units. The information supporting the model runs will be provided in the example library of HYDRUS (2D/3D) maintained by PC-Progress. Non-standard HYDRUS subroutines for the drywell and for the infiltration pond simulations that were funded by this research will be added and made available for viewing and download. After the 1-year embargo period the site will include a link to the PubMed Central manuscript. For example, the HYDRUS library for the transient head drywell associated with the Sasidharan et al. (2018) paper is now active (https://www.pcprogress.com/en/Default.aspx?h3d2-lib-Drywell). This dataset is associated with the following publication: Sasidharan, S., S. Bradford, J. Simunek, and S. Kraemer. Comparison of recharge from drywells and infiltration basins: A modeling study. JOURNAL OF HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 594: 125720, (2021).
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset provides a detailed look into the world of competitive video gaming in universities. It covers a wide range of topics, from performance rankings and results across multiple esports platforms to individual team and university rankings within each tournament. With this wealth of data, fans can look up statistics on their favorite teams or explore the challenges university gamers face as they battle to be the best. Dive into the information provided for an inside view of collegiate esports tournaments, covering everything from Match ID, Team 1, and university affiliations to the points earned or lost in each match and the special Seeds or UniSeeds given to exceptional teams. And don't forget to explore the Team Names, along with their corresponding websites, for further details on stats across tournaments!
For more datasets, click here.
Download Files: First, make sure you have downloaded the CS_week1, CS_week2, CS_week3, and seeds datasets from Kaggle. You will also need to download the currentRankings file for each week of competition. All files should be saved under their originally assigned names so that your analysis tools can read them properly (ie: CS_week1.csv).
Understand the File Structure: Once all data has been collected and organized into separate files, familiarize yourself with the type of information included in each one. The main folder contains two kinds of data files: the weekly files (week1-3) and the seedings. The week1-3 files contain teams matched against one another, along with each team's university, the point score from the match result, and the team name and website URL associated with the university entry. The seedings file provides a ranking system for university entries, accompanied by team names, website URLs, etc. There is also an additional currentRankings file containing scores for each individual player/team for a given period of competition (ie: the first week).
Analyze the Data: Now that everything is set up, it's time to explore! You can dive deep into trends among universities or individual players, looking at specific match performances or overall standings across the weeks of competition. You can also jumpstart insights by creating graphs from the data compiled in this BUECTracker dataset. For example, to compare two universities, say Harvard University vs. Cornell University, from the beginning of the event, you could extract their respective points and dates (found under the results), regions (North America vs. Europe, etc.), and general stats such as maps played, along with any other custom ideas that come up when working with similar datasets! A pandas sketch of this comparison appears after the research ideas below.
- Analyze the performance of teams and identify areas for improvement for better performance in future competitions.
- Assess which esports platforms are the most popular among gamers.
- Gain a better understanding of player rankings across different regions, based on the ranking system, to create targeted strategies that could boost individual players' scoring potential or overall team success in competitive gaming events.
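As a quick sketch of the university comparison described above (a sketch only: the column names "University" and "Points" are assumptions based on the column descriptions in this card, so adjust them to the actual CSV headers):

```python
# Compare total points per week for two universities across the weekly files.
# Column names ("University", "Points") are assumptions from this card's
# column descriptions; adjust to the real headers in the CS_weekN.csv files.
import pandas as pd

weeks = ["CS_week1.csv", "CS_week2.csv", "CS_week3.csv"]
frames = [pd.read_csv(path).assign(week=i + 1) for i, path in enumerate(weeks)]
matches = pd.concat(frames, ignore_index=True)

subset = matches[matches["University"].isin(["Harvard University", "Cornell University"])]
points_by_week = (
    subset.groupby(["week", "University"])["Points"]
    .sum()
    .unstack(fill_value=0)
)
print(points_by_week)
```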
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: CS_week1.csv

| Column name | Description |
|:------------|:-----------------------------------------------|
| Match ID    | Unique identifier for each match. (Integer)    |
| Team 1      | Name of the first team in the match. (String)  |
| University  | University associated with the team. (String)  |
File: CS_week1_currentRankings.csv

| Column name | Description |
|:------------|:------------|
...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
jchook/example-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset of example filament cutouts for use in the demo notebook. Filaments were extracted from GONG H-alpha solar observations based on annotations from the MAGFiLO dataset. Of the extracted filaments, 102 cutouts (34 of each class) were selected to create a dataset of examples.
Dataset Card for cot-example-dataset
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dvilasuero/cot-example-dataset/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/cot-example-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains match data from the top football leagues and cup competitions in France and Germany. The dataset provides comprehensive information about home and away teams, their scores, match dates, and seasons. It is a valuable resource for football enthusiasts, data scientists, and analysts interested in exploring football statistics and trends across two of Europe's biggest football nations.
Country: The country where the match took place (France or Germany).
Example values: 'France', 'Germany'
Lig: The specific league or cup in which the match was played. This column captures whether the match was part of Ligue 1, Ligue 2, Coupe de France, Bundesliga, 2. Bundesliga, or DFB-Pokal.
Example values: 'Ligue 1', 'Bundesliga', 'DFB-Pokal'
home_team: The name of the home team in the match.
Example values: 'Paris Saint-Germain', 'Bayern Munich'
away_team: The name of the away team in the match.
Example values: 'Olympique Lyonnais', 'Borussia Dortmund'
home_score: The number of goals scored by the home team in the match.
Example values: '3', '0'
away_score: The number of goals scored by the away team in the match.
Example values: '1', '2'
season_year: The season in which the match took place. Typically, football seasons run from one year to the next (e.g., 2022-2023 season).
Example values: '2022/2023', '2021/2022'
Date_day: The specific day on which the match was played, formatted as day and month (dd.mm).
Example values: '05.01', '29.09'
Date_hour: The hour and minute the match kicked off, formatted as hh:mm.
Example values: '20:45', '18:30'
This dataset can be used for various purposes, including:
- Analyzing team performance trends over different seasons.
- Comparing goal-scoring patterns in home vs. away matches.
- Building predictive models to forecast match outcomes based on historical data.
- Understanding football dynamics in France and Germany through data visualizations.
Feel free to explore and use this dataset to draw your own insights and conclusions!
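As a small illustration of the first two use cases above, here is a minimal pandas sketch (the file name france_germany_matches.csv is an assumption; the column names follow the descriptions in this card):

```python
# Average goals per match, home vs. away, per league and season.
import pandas as pd

df = pd.read_csv("france_germany_matches.csv")  # hypothetical file name
df["home_score"] = pd.to_numeric(df["home_score"], errors="coerce")
df["away_score"] = pd.to_numeric(df["away_score"], errors="coerce")

goals = (
    df.groupby(["Lig", "season_year"])[["home_score", "away_score"]]
    .mean()
    .round(2)
)
print(goals)
```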
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All of the data required for replicating the analyses are located in /DataFiles, with appropriately named subfolders. MainTextRep.R replicates all figures, tables, and numbers reported in the main text with the data provided. SIRep.R does the same for the Supplementary Information. The data folders also contain code that generates both the MTurk responses and the Stan fits. However, some of the functionality is deprecated, and the new package sentimentIt operates differently from the earlier functionality, for example by requiring login credentials. The code HITapi.R was used for some of the data collection, but is based on this earlier functionality. For an example using the newest functionality, refer to the SI or to the file packageImm_anon.R in /DataFiles/ImmigrationSurvey, which is a fully functional use of the package with the exception of the password, which has been anonymized.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"
DOI: https://doi.org/10.11583/DTU.c.6287841
Using deep learning for the detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.
The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.
We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in search-and-rescue activities, law enforcement, and commercial maritime traffic forced to deviate from the normal course.
These datasets consist of unlabelled trajectories for the purpose of training unsupervised models. For labelled datasets for evaluation, please refer to the collection (link in Related publications).
The data is saved using the pickle format for Python. Each dataset is split into two files with the naming convention:
datasetInfo_XXX
data_XXX
Files named "data_XXX" contains the extracted trajectories serialized sequentially one at a time and must be read as such. Please refer to provided utility functions for examples. Files named "datasetInfo" contains Metadata related to the dataset and indecies at which trajectories begin in "data_XXX" files.
The data are sequences of maritime trajectories defined by their; timestamp, latitude/longitude position, speed, course, and unique ship identifer MMSI. In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, moving AIS navigational statuses, and filtered within an region of interest (ROI). Trajectories were split if exceeding an upper limit and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.
Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl
See the datasheet for more detailed information, and refer to the provided utility functions for examples of how to read and plot the data. A minimal reading sketch follows below.
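As a minimal sketch (assumptions: the file names are simplified placeholders, and repeated pickle.load calls walk the sequentially serialized trajectories; the provided utility functions remain the authoritative readers):

```python
# Read the metadata file, then load trajectories one at a time until EOF.
import pickle

with open("datasetInfo_AIS_Custom.pkl", "rb") as f:  # hypothetical file name
    info = pickle.load(f)  # metadata, including indices where trajectories begin

trajectories = []
with open("data_AIS_Custom.pkl", "rb") as f:  # hypothetical file name
    while True:
        try:
            # Trajectories were serialized sequentially, one pickle per trajectory.
            trajectories.append(pickle.load(f))
        except EOFError:
            break

print(f"Read {len(trajectories)} trajectories")
```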
harpreetsahota/ragas-example-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset contains Hospital General Information from the U.S. Department of Health & Human Services. This is the BigQuery COVID-19 public dataset. This data contains a list of all hospitals that have been registered with Medicare. This list includes addresses, phone numbers, hospital types and quality of care information. The quality of care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.
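For instance, a minimal sketch of a Kernel query with the Python client might look like this (assuming credentials are already configured in the Kernel; the query itself is illustrative):

```python
# Count Medicare-registered hospitals per California city in this table.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT city, COUNT(*) AS n_hospitals
    FROM `bigquery-public-data.cms_medicare.hospital_general_info`
    WHERE state = 'CA'
    GROUP BY city
    ORDER BY n_hospitals DESC
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.city, row.n_hospitals)
```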
How do the hospitals in Mountain View, CA compare to the average hospital in the US? With the hospital compare data you can quickly understand how hospitals in one geographic location compare to another location. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.
#standardSQL
SELECT
  MTV_AVG_HOSPITAL_RATING,
  US_AVG_HOSPITAL_RATING
FROM (
  SELECT
    ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS MTV_AVG_HOSPITAL_RATING
  FROM
    `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE
    city = 'MOUNTAIN VIEW'
    AND state = 'CA'
    AND hospital_overall_rating <> 'Not Available') MTV
JOIN (
  SELECT
    ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS US_AVG_HOSPITAL_RATING
  FROM
    `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE
    hospital_overall_rating <> 'Not Available')
ON
  1 = 1
What are the most common diseases treated at hospitals that do well in the category of patient readmissions?
For hospitals that achieved "Above the national average" in the category of patient readmissions, it might be interesting to review the types of diagnoses treated at those inpatient facilities. While this query won't provide the granular detail that went into the readmission calculation, it gives a quick glimpse into the top diagnosis-related groups (DRGs), or classifications of inpatient stays, found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you can quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis-related groups for hospital metrics you might be interested in.
#standardSQL
SELECT
  drg_definition,
  SUM(total_discharges) AS total_discharge_per_drg
FROM
  `bigquery-public-data.cms_medicare.hospital_general_info` gi
INNER JOIN
  `bigquery-public-data.cms_medicare.inpatient_charges_2015` ic
ON
  gi.provider_id = ic.provider_id
WHERE
  readmission_national_comparison = 'Above the national average'
GROUP BY
  drg_definition
ORDER BY
  total_discharge_per_drg DESC
LIMIT
  10;
Non-small cell lung cancer (NSCLC) accounts for the majority of lung cancer cases, making it one of the most fatal diseases worldwide. Accurately predicting NSCLC patients' survival outcomes remains a significant challenge despite advancements in treatment. The difficulties in developing effective drug therapies, which are frequently hampered by severe side effects, drug resistance, and limited effectiveness across diverse patient populations, highlight the complexity of NSCLC. Machine learning (ML) and deep learning (DL) models are starting to reshape the field of NSCLC drug discovery. These methodologies enable the identification of drug targets and the development of personalized treatment strategies that may improve survival outcomes for NSCLC patients. Using state-of-the-art methods of feature extraction and transfer learning, we present a drug discovery model for the identification of therapeutic targets in this paper. To extract features from drug and protein sequences, we make use of a hybrid UNet transformer. This makes it possible to extract deep features that address the issue of false alarms. For dimensionality reduction, the modified Rime optimization (MRO) algorithm is used to select the best subset of features. In addition, we design the deep transfer learning (DTransL) model to boost drug discovery accuracy for NSCLC patients' therapeutic targets. The proposed model is validated on the Davis, KIBA, and Binding-DB benchmark datasets. Results show that the MRO+DTransL model outperforms existing state-of-the-art models. On the Davis dataset, it achieved an accuracy of 98.398%, outperforming the LSTM model by 9.742%. It reached 98.264% and 97.344% on the KIBA and Binding-DB datasets, respectively, indicating improvements of 8.608% and 8.957% over baseline models.
The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. There are currently various models aiming to predict TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches are still missing. In this work we provide a general method for data collection, preprocessing, splitting, and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data, to assess model generalization, and 2) different data versions that vary in size and peptide imbalance, to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high-quality data and novel algorithmic approaches.
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
These are the data summarising the modelled Hydrological Response Variable (HRV) variability versus climate interannual variability which has been used as an indicator of risk. For example, to understand the significance of the modelled increases in low-flow days, it is useful to look at them in the context of the interannual variability in low-flow days due to climate. In other words, are the modelled increases due to additional coal resource development within the natural range of variability of the longer-term flow regime, or are they potentially moving the system outside the range of hydrological variability it experiences under the current climate? The maximum increase in the number of low-flow days due to additional coal resource development relative to the interannual variability in low-flow days under the baseline has been adopted to put some context around the modelled changes. If the maximum change is small relative to the interannual variability due to climate (e.g. an increase of 3 days relative to a baseline range of 20 to 50 days), then the risk of impacts from the changes in low-flow days is likely to be low. If the maximum change is comparable to or greater than the interannual variability due to climate (e.g. an increase of 200 days relative to a baseline range of 20 to 50 days), then there is a greater risk of impact on the landscape classes and assets that rely on this water source. Here changes comparable to or greater than interannual variability are interpreted as presenting a risk. However, the change due to the additional coal resource development is additive, so even a 'less than interannual variability' change is not free from risk. Results of the interannual variability comparison should be viewed as indicators of risk.
This dataset was generated using 1000 HRV simulations together with climate inputs. Ratios between the variability in HRVs and the variability attributable to interannual climate variability were calculated for the HRVs. Results of the interannual variability comparison should be viewed as indicators of risk.
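As a toy illustration of this risk indicator (not Programme code; the numbers mirror the worked example in the description above):

```python
# Compare the maximum modelled increase in low-flow days against the
# interannual (climate-driven) range of the baseline.
def risk_indicator(max_increase_days, baseline_min_days, baseline_max_days):
    interannual_range = baseline_max_days - baseline_min_days
    ratio = max_increase_days / interannual_range
    # Changes comparable to or greater than interannual variability
    # (ratio >= 1) are interpreted as presenting a risk.
    return ratio, "risk" if ratio >= 1 else "likely low risk"

print(risk_indicator(3, 20, 50))    # small change vs. a 20-50 day baseline range
print(risk_indicator(200, 20, 50))  # large change vs. the same baseline range
```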
Bioregional Assessment Programme (2017) HUN Comparison of model variability and interannual variability. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/1c0a19f9-98c2-4d92-956d-dd764aaa10f9.
Derived From River Styles Spatial Layer for New South Wales
Derived From SYD ALL climate data statistics summary
Derived From HUN AWRA-R Observed storage volumes Glenbawn Dam and Glennies Creek Dam
Derived From Hunter River Salinity Scheme Discharge NSW EPA 2006-2012
Derived From HUN AWRA-R simulation nodes v01
Derived From Bioregional Assessment areas v06
Derived From Hunter AWRA Hydrological Response Variables (HRV)
Derived From GEODATA 9 second DEM and D8: Digital Elevation Model Version 3 and Flow Direction Grid 2008
Derived From HUN AWRA-L simulation nodes_v01
Derived From Bioregional Assessment areas v04
Derived From HUN AWRA-R Gauge Station Cross Sections v01
Derived From Gippsland Project boundary
Derived From Natural Resource Management (NRM) Regions 2010
Derived From BA All Regions BILO cells in subregions shapefile
Derived From Hunter Surface Water data v2 20140724
Derived From HUN AWRA-R River Reaches Simulation v01
Derived From HUN AWRA-L simulation nodes v02
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From HUN AWRA-R Irrigation Area Extents and Crop Types v01
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From BA SYD selected GA TOPO 250K data plus added map features
Derived From HUN gridded daily PET from 1973-2102 v01
Derived From Bioregional_Assessment_Programme_Catchment Scale Land Use of Australia - 2014
Derived From Bioregional Assessment areas v03
Derived From IQQM Model Simulation Regulated Rivers NSW DPI HUN 20150615
Derived From HUN AWRA-R calibration catchments v01
Derived From Bioregional Assessment areas v05
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From National Surface Water sites Hydstra
Derived From Selected streamflow gauges within and near the Hunter subregion
Derived From ASRIS Continental-scale soil property predictions 2001
Derived From Hunter Surface Water data extracted 20140718
Derived From Mean Annual Climate Data of Australia 1981 to 2012
Derived From HUN AWRA-R calibration nodes v01
Derived From HUN future climate rainfall v01
Derived From HUN AWRA-LR Model v01
Derived From HUN AWRA-L ASRIS soil properties v01
Derived From HUN AWRAR restricted input 01
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From Victoria - Seamless Geology 2014
Derived From HUN AWRA-L Site Station Cross Sections v01
Derived From HUN AWRA-R simulation catchments v01
Derived From HUN AWRA-R Simulation Node Cross Sections v01
Derived From Climate model 0.05x0.05 cells and cell centroids
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
example dataset
Data standardization is an important part of effective data management. However, sometimes people have data that doesn't match. This dataset includes the different ways that county names might be written by different people. It can be used as a lookup table when you need County to be your unique identifier. For example, it allows you to match St. Mary's, St Marys, and Saint Mary's so that you can use it with disparate data from other data sets.
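As a minimal sketch of the lookup-table join described above (the canonical value and column names are assumptions for illustration):

```python
# Map variant county spellings to one canonical name before joining datasets.
import pandas as pd

lookup = pd.DataFrame({
    "variant":   ["St. Mary's", "St Marys", "Saint Mary's"],
    "canonical": ["St. Mary's"] * 3,  # hypothetical canonical form
})

data = pd.DataFrame({"county": ["St Marys", "Saint Mary's"], "value": [10, 20]})

normalized = data.merge(lookup, left_on="county", right_on="variant", how="left")
print(normalized[["canonical", "value"]])
```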
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Brevard population over the last 20 plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Brevard across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing, and if there is a change, when the population peaked or whether it is still growing and has not reached its peak. We can also compare the trend with the overall trend of the United States population over the same period of time.
Key observations
In 2023, the population of Brevard was 7,847, a 0.35% increase year-over-year from 2022. Previously, in 2022, Brevard's population was 7,820, an increase of 0.01% compared to a population of 7,819 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of Brevard increased by 884. In this period, the peak population was 7,879, in the year 2018. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
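The year-over-year figure can be checked directly from the quoted populations; for example:

```python
# Verify the 2022 -> 2023 change reported above.
pop_2022, pop_2023 = 7820, 7847
pct_change = (pop_2023 - pop_2022) / pop_2022 * 100
print(f"{pct_change:.2f}%")  # -> 0.35%
```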
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you do need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Brevard Population by Year. You can refer to it here.