100+ datasets found
  1. comparison-dataset-dolly-curated-falcon

    • huggingface.co
    Updated Jun 7, 2023
    Cite
    Argilla (2023). comparison-dataset-dolly-curated-falcon [Dataset]. https://huggingface.co/datasets/argilla/comparison-dataset-dolly-curated-falcon
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2023
    Dataset authored and provided by
    Argilla
    Description

    Guidelines

    These guidelines are based on the paper Training Language Models to Follow Instructions with Human Feedback You are given a text-based description of a task, submitted by a user. This task description may be in the form of an explicit instruction (e.g. "Write a story about a wise frog."). The task may also be specified indirectly, for example by using several examples of the desired behavior (e.g. given a sequence of movie reviews followed by their sentiment, followed by… See the full description on the dataset page: https://huggingface.co/datasets/argilla/comparison-dataset-dolly-curated-falcon.

  2. Comparison of selected metrics across R packages using example dataset.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 1, 2021
    Cite
    Buchanan, David; Urbanek, Jacek; Broll, Steven; Punjabi, Naresh M.; Chun, Elizabeth; Gaynanova, Irina; Muschelli, John (2021). Comparison of selected metrics across R packages using example dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000799416
    Explore at:
    Dataset updated
    Apr 1, 2021
    Authors
    Buchanan, David; Urbanek, Jacek; Broll, Steven; Punjabi, Naresh M.; Chun, Elizabeth; Gaynanova, Irina; Muschelli, John
    Description

    Comparison of selected metrics across R packages using example dataset.

  3. Large Language Models Comparison Dataset

    • kaggle.com
    zip
    Updated Feb 24, 2025
    Cite
    Samay Ashar (2025). Large Language Models Comparison Dataset [Dataset]. https://www.kaggle.com/datasets/samayashar/large-language-models-comparison-dataset
    Explore at:
    zip (5894 bytes)
    Dataset updated
    Feb 24, 2025
    Authors
    Samay Ashar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comparison of various Large Language Models (LLMs) based on their performance, cost, and efficiency. It includes important details like speed, latency, benchmarks, and pricing, helping users understand how different models stack up against each other.

    Key Details:

    • File Name: llm_comparison_dataset.csv
    • Size: 14.57 kB
    • Total Columns: 15
    • License: CC0 (Public Domain)

    What’s Inside?

    Here are some of the key metrics included in the dataset:

    1. Context Window: Maximum number of tokens the model can process at once.
    2. Speed (tokens/sec): How fast the model generates responses.
    3. Latency (sec): Time delay before the model responds.
    4. Benchmark Scores: Performance ratings from MMLU (academic tasks) and Chatbot Arena (real-world chatbot performance).
    5. Open-Source: Indicates if the model is publicly available or proprietary.
    6. Price per Million Tokens: The cost of using the model for one million tokens.
    7. Training Dataset Size: Amount of data used to train the model.
    8. Compute Power: Resources needed to run the model.
    9. Energy Efficiency: How much power the model consumes.

    This dataset is useful for researchers, developers, and AI enthusiasts who want to compare LLMs and choose the best one based on their needs.
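
    A minimal pandas sketch of one such comparison is shown below. The file name matches the key details above, but the exact column headers are assumptions based on the metric list and should be adjusted to the real CSV.

    import pandas as pd

    llms = pd.read_csv("llm_comparison_dataset.csv")

    # Hypothetical column names -- adjust to the actual headers in the CSV.
    llms["mmlu_per_dollar"] = llms["MMLU Score"] / llms["Price per Million Tokens"]
    print(llms.sort_values("mmlu_per_dollar", ascending=False).head(10))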

    📌If you find this dataset useful, do give an upvote :)

  4. example-dataset-2

    • huggingface.co
    + more versions
    Cite
    Dan Jackson, example-dataset-2 [Dataset]. https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2
    Explore at:
    Authors
    Dan Jackson
    Description

    Dataset Card for example-dataset-2

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2.

  5. Data associated with comparison of recharge from drywells and infiltration basins: a modeling study

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Jun 29, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Data associated with comparison of recharge from drywells and infiltration basins: a modeling study [Dataset]. https://catalog.data.gov/dataset/data-associated-with-comparison-of-recharge-from-drywells-and-infiltration-basins-a-modeli
    Explore at:
    Dataset updated
    Jun 29, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This research effort is a modeling study using the HYDRUS (2D/3D) computer program (www.pc-progress.com) and described in the manuscript/journal article entitled “Comparison of recharge from drywells and infiltration basins: a modeling study.” All the tables and figures in the journal article will be documented within an Excel spreadsheet that will include worksheet tabs with data associated with each table and figure. The tabs, columns, and rows will be clearly labeled to identify table/figures, variables, and units. The information supporting the model runs will be supported in the example library of HYDRUS (2D/3D) maintained by PC-Progress. Non-standard HYDRUS subroutines for the drywell and for the infiltration pond simulations that were funded by this research will be added and made available for viewing and download. After the 1 year embargo period the site will include a link to the PubMed Central manuscript. For example, the HYDRUS library for the transient head drywell associated with the Sasidharan et al. (2018) paper is now active (https://www.pcprogress.com/en/Default.aspx?h3d2-lib-Drywell ). This dataset is associated with the following publication: Sasidharan, S., S. Bradford, J. Simunek, and S. Kraemer. Comparison of recharge from drywells and infiltration basins: A modeling study. JOURNAL OF HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 594: 125720, (2021).

  6. Esports Performance Rankings and Results

    • kaggle.com
    zip
    Updated Dec 12, 2022
    Cite
    The Devastator (2022). Esports Performance Rankings and Results [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-collegiate-esports-performance-with-bu
    Explore at:
    zip (110148 bytes)
    Dataset updated
    Dec 12, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Esports Performance Rankings and Results

    Performance Rankings and Results from Multiple Esports Platforms


    About this dataset

    This dataset provides a detailed look into the world of competitive video gaming in universities. It covers a wide range of topics, from performance rankings and results across multiple esports platforms to the individual team and university rankings within each tournament. With an incredible wealth of data, fans can discover statistics on their favorite teams or explore the challenges placed upon university gamers as they battle it out to be the best. Dive into the information provided and get an inside view into the world of collegiate esports tournaments as you assess all things from Match ID, Team 1, University affiliations, Points earned or lost in each match and special Seeds or UniSeeds for exceptional teams. Of course don't forget about exploring all the great Team Names along with their corresponding websites for further details on stats across tournaments!


    How to use the dataset

    Download files. First, download the CS_week1, CS_week2, CS_week3, and seeds datasets from Kaggle, along with the currentRankings file for each week of competition. Save all files under their originally assigned names so that your analysis tools can read them properly (e.g., CS_week1.csv).

    Understand the file structure. Once all data has been collected and organized into separate files, familiarize yourself with what each file contains. The main folder holds the week 1-3 match files and the seedings file. The weekly files pair teams against one another by university and record the point score from each match result, along with the team name and website URL associated with each university entry; the seedings file provides a ranking of the university entries, again accompanied by team names, website URLs, and so on. An additional currentRankings file holds scores for individual players and teams for a given period of competition (e.g., the first week).

    Analyze the data. Once everything is set up, you can explore trends among universities or individual players, looking at specific match performances or overall standings across the weeks of competition, and build graphs from data compiled from the BUECTracker dataset. For example, to compare two universities, say Harvard University and Cornell University, since the beginning of the event, you could extract their respective points and dates (found under the results), regions (North America vs. Europe, etc.), and general statistics such as maps played, as shown in the sketch below.
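
    As a rough illustration (not part of the original dataset documentation), the sketch below loads the weekly match files with pandas and totals points for two universities. Only the Match ID, Team 1, and University columns are documented below; the "Points" column name is an assumption.

    import glob

    import pandas as pd

    # Read only the weekly match files (CS_week1.csv ... CS_week3.csv),
    # not the currentRankings files.
    frames = [pd.read_csv(path).assign(week=path) for path in sorted(glob.glob("CS_week?.csv"))]
    matches = pd.concat(frames, ignore_index=True)

    # "Points" is a hypothetical column name for the match point score.
    subset = matches[matches["University"].isin(["Harvard University", "Cornell University"])]
    print(subset.groupby(["week", "University"])["Points"].sum().unstack())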

    Research Ideas

    • Analyze the performance of teams and identify areas for improvement for better performance in future competitions.
    • Assess which esports platforms are the most popular among gamers.
    • Gain a better understanding of player rankings across different regions, based on rankings system, to create targeted strategies that could boost individual players' scoring potential or team overall success in competitive gaming events

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: CS_week1.csv
    • Match ID: Unique identifier for each match. (Integer)
    • Team 1: Name of the first team in the match. (String)
    • University: University associated with the team. (String)

    File: CS_week1_currentRankings.csv (Column name | Description | ...)

  7. Data from: example-dataset

    • huggingface.co
    Updated Oct 22, 2024
    + more versions
    Cite
    Wes Roberts (2024). example-dataset [Dataset]. https://huggingface.co/datasets/jchook/example-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 22, 2024
    Authors
    Wes Roberts
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    jchook/example-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. Chirality Classification Comparison Demo Examples

    • dataverse.harvard.edu
    • dataone.org
    • +1more
    Updated Oct 15, 2025
    Cite
    Alex Chalmers; Azim Ahmadzadeh (2025). Chirality Classification Comparison Demo Examples [Dataset]. http://doi.org/10.7910/DVN/52SOS7
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Alex Chalmers; Azim Ahmadzadeh
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset of example filament cutouts for use in the demo notebook. Filaments were extracted from GONG H-Alpha solar observations based on annotations from the MAGFiLO dataset. Of the extracted filaments, 102 cutouts (34 of each class) were selected to create a dataset of examples.

  9. cot-example-dataset

    • huggingface.co
    Updated Nov 24, 2024
    + more versions
    Cite
    Daniel Vila (2024). cot-example-dataset [Dataset]. https://huggingface.co/datasets/dvilasuero/cot-example-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 24, 2024
    Authors
    Daniel Vila
    Description

    Dataset Card for cot-example-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dvilasuero/cot-example-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/cot-example-dataset.

  10. France and Germany Football Leagues Dataset

    • kaggle.com
    zip
    Updated Oct 6, 2024
    Cite
    Gökhan Ergül (2024). France and Germany Football Leagues Dataset [Dataset]. https://www.kaggle.com/datasets/gokhanergul/france-and-germany-football-leagues-dataset
    Explore at:
    zip (363605 bytes)
    Dataset updated
    Oct 6, 2024
    Authors
    Gökhan Ergül
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Germany, France
    Description

    France and Germany Football Leagues Dataset

    This dataset contains match data from the top football leagues and cup competitions in France and Germany. The dataset provides comprehensive information about home and away teams, their scores, match dates, and seasons. It is a valuable resource for football enthusiasts, data scientists, and analysts interested in exploring football statistics and trends across two of Europe's biggest football nations.

    Dataset Summary:

    • Total Matches: 38,596
    • Countries Covered: France, Germany
    • Leagues Included:
      • Ligue 1 (France)
      • Ligue 2 (France)
      • Coupe de France
      • Bundesliga (Germany)
      • 2. Bundesliga (Germany)
      • DFB-Pokal (Germany)

    Column Descriptions:

    • Country: The country where the match took place (France or Germany).
      Example values: 'France', 'Germany'

    • Lig: The specific league or cup in which the match was played. This column captures whether the match was part of Ligue 1, Ligue 2, Coupe de France, Bundesliga, 2. Bundesliga, or DFB-Pokal.
      Example values: 'Ligue 1', 'Bundesliga', 'DFB-Pokal'

    • home_team: The name of the home team in the match.
      Example values: 'Paris Saint-Germain', 'Bayern Munich'

    • away_team: The name of the away team in the match.
      Example values: 'Olympique Lyonnais', 'Borussia Dortmund'

    • home_score: The number of goals scored by the home team in the match.
      Example values: '3', '0'

    • away_score: The number of goals scored by the away team in the match.
      Example values: '1', '2'

    • season_year: The season in which the match took place. Typically, football seasons run from one year to the next (e.g., 2022-2023 season).
      Example values: '2022/2023', '2021/2022'

    • Date_day: The specific day on which the match was played, formatted as day and month (dd.mm).
      Example values: '05.01', '29.09'

    • Date_hour: The hour and minute the match kicked off, formatted as hh:mm.
      Example values: '20:45', '18:30'

    Use Cases:

    This dataset can be used for various purposes, including:

    • Analyzing team performance trends over different seasons.
    • Comparing goal-scoring patterns in home vs. away matches.
    • Building predictive models to forecast match outcomes based on historical data.
    • Understanding football dynamics in France and Germany through data visualizations.
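
    As a quick illustration of the first two use cases, the sketch below assumes the matches are loaded from a single CSV (the file name is hypothetical) and uses the column names described above.

    import pandas as pd

    matches = pd.read_csv("france_germany_football.csv")  # hypothetical file name

    # Scores may be read as strings; coerce to numbers before averaging.
    matches[["home_score", "away_score"]] = matches[["home_score", "away_score"]].apply(
        pd.to_numeric, errors="coerce"
    )

    # Average goals scored by home and away teams, per league or cup.
    summary = matches.groupby("Lig")[["home_score", "away_score"]].mean()
    print(summary.sort_values("home_score", ascending=False))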

    Feel free to explore and use this dataset to draw your own insights and conclusions!

  11. Replication Data for: A pairwise comparison framework for fast, flexible, and reliable human coding of political texts

    • dataverse.harvard.edu
    Updated Mar 30, 2018
    Cite
    David Carlson; Jacob M. Montgomery (2018). Replication Data for: A pairwise comparison framework for fast, flexible, and reliable human coding of political texts [Dataset]. http://doi.org/10.7910/DVN/0ZRGEE
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 30, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    David Carlson; Jacob M. Montgomery
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All of the data required for replicating the analyses are located in /DataFiles, with appropriately named subfolders. MainTextRep.R replicates all figures, tables, and numbers reported in the main text with the data provided. SIRep.R does the same for the Supplementary Information. The data folders also contain code that generate both the MTurk responses and the Stan fits. However, some of the functionality is deprecated, and the new package sentimentIt operates differently than earlier functionality, for example requiring login credentials. The code HITapi.R was used for some of the data collection, but is based on this earlier functionality. For an example using the newest functionality, refer to the SI or to the file packageImm_anon.R in /DataFiles/ImmigrationSurvey, which is a fully functional use of the package with the exception of the password being anonymous.

  12. Unlabelled training datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection

    • data.dtu.dk
    bin
    Updated Jul 10, 2023
    + more versions
    Cite
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen (2023). Unlabelled training datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection [Dataset]. http://doi.org/10.11583/DTU.21511842.v1
    Explore at:
    bin
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"

    DOI: https://doi.org/10.11583/DTU.c.6287841

    Using deep learning for detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.

    The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.

    We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others, colliding vessels, vessels engaged in search-and-rescue activities, law enforcement, and commercial maritime traffic forced to deviate from its normal course.

    These datasets consists of unlabelled trajectories for the purpose of training unsupervised models. For labelled datasets for evaluation please refer to the collection. Link in Related publications.

    The data is saved using the pickle format for Python. Each dataset is split into two files with the naming convention:

    datasetInfo_XXX
    data_XXX

    Files named "data_XXX" contain the extracted trajectories, serialized sequentially one at a time, and must be read as such. Please refer to the provided utility functions for examples. Files named "datasetInfo_XXX" contain metadata related to the dataset and the indices at which trajectories begin in the "data_XXX" files.

    The data are sequences of maritime trajectories defined by their timestamp, latitude/longitude position, speed, course, and unique ship identifier (MMSI). In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, and moving AIS navigational statuses, and filtered within a region of interest (ROI). Trajectories were split if they exceeded an upper length limit, and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.

    Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl

    See the datasheet for more detailed information; we refer to the provided utility functions for examples of how to read and plot the data.
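
    As a minimal sketch (not the dataset's own utility functions), sequentially pickled trajectories in a "data_XXX" file can be read back one object at a time until the end of the file; the file name below is a placeholder following the documented naming syntax.

    import pickle

    def read_sequential_pickles(path):
        """Yield each trajectory object stored back-to-back in a data_XXX file."""
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    for trajectory in read_sequential_pickles("data_XXX.pkl"):  # placeholder file name
        print(type(trajectory))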

  13. ragas-example-dataset

    • huggingface.co
    Cite
    Harpreet Sahota, ragas-example-dataset [Dataset]. https://huggingface.co/datasets/harpreetsahota/ragas-example-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Harpreet Sahota
    Description

    harpreetsahota/ragas-example-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. cms-medicare

    • kaggle.com
    zip
    Updated Apr 21, 2020
    Cite
    Google BigQuery (2020). cms-medicare [Dataset]. https://www.kaggle.com/datasets/bigquery/cms-medicare
    Explore at:
    zip (0 bytes)
    Dataset updated
    Apr 21, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    Description

    Context

    This dataset contains Hospital General Information from the U.S. Department of Health & Human Services, hosted as a BigQuery public dataset. The data contains a list of all hospitals that have been registered with Medicare, including addresses, phone numbers, hospital types, and quality-of-care information. The quality-of-care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.
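
    For example, a minimal query from a Kernel with the BigQuery Python client library might look like the sketch below (the selected columns follow the sample query in the next section; pandas and db-dtypes are assumed to be installed for the DataFrame conversion).

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
        SELECT city, state, hospital_overall_rating
        FROM `bigquery-public-data.cms_medicare.hospital_general_info`
        WHERE hospital_overall_rating <> 'Not Available'
        LIMIT 10
    """
    df = client.query(sql).to_dataframe()
    print(df)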

    Sample Query

    How do the hospitals in Mountain View, CA compare to the average hospital in the US? With the hospital compare data you can quickly understand how hospitals in one geographic location compare to another location. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.

    #standardSQL
    SELECT MTV_AVG_HOSPITAL_RATING, US_AVG_HOSPITAL_RATING
    FROM (
      SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS MTV_AVG_HOSPITAL_RATING
      FROM `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE city = 'MOUNTAIN VIEW'
        AND state = 'CA'
        AND hospital_overall_rating <> 'Not Available'
    ) MTV
    JOIN (
      SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS US_AVG_HOSPITAL_RATING
      FROM `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE hospital_overall_rating <> 'Not Available'
    ) US
    ON 1 = 1

    What are the most common diseases treated at hospitals that do well in the category of patient readmissions? For hospitals that achieved "Above the national average" in the category of patient readmissions, it might be interesting to review the types of diagnoses that are treated at those inpatient facilities. While this query won't provide the granular detail that went into the readmission calculation, it gives us a quick glimpse into the top diagnosis-related groups (DRG), or classifications of inpatient stays, found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you can quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis-related groups for hospital metrics you might be interested in.

    #standardSQL
    SELECT drg_definition, SUM(total_discharges) AS total_discharge_per_drg
    FROM `bigquery-public-data.cms_medicare.hospital_general_info` gi
    INNER JOIN `bigquery-public-data.cms_medicare.inpatient_charges_2015` ic
      ON gi.provider_id = ic.provider_id
    WHERE readmission_national_comparison = 'Above the national average'
    GROUP BY drg_definition
    ORDER BY total_discharge_per_drg DESC
    LIMIT 10;

  15. Error measure comparison for Binding-DB dataset.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 29, 2025
    + more versions
    Cite
    Mittal, Ruchi; Kumari, Swati; Malik, Varun; Juneja, Sapna; Mohiuddin, Khalid; Gupta, Deepali (2025). Error measure comparison for Binding-DB dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002068808
    Explore at:
    Dataset updated
    Apr 29, 2025
    Authors
    Mittal, Ruchi; Kumari, Swati; Malik, Varun; Juneja, Sapna; Mohiuddin, Khalid; Gupta, Deepali
    Description

    Non-small cell lung cancer (NSCLC) accounts for the majority of lung cancer cases, making it one of the most fatal diseases worldwide. Predicting NSCLC patients’ survival outcomes accurately remains a significant challenge despite advancements in treatment. The difficulties in developing effective drug therapies, which are frequently hampered by severe side effects, drug resistance, and limited effectiveness across diverse patient populations, highlight the complexity of NSCLC. Machine learning (ML) and deep learning (DL) models are starting to reshape the field of NSCLC drug discovery. These methodologies enable the identification of drug targets and the development of customized treatment strategies that may improve survival outcomes for NSCLC patients. Using cutting-edge methods of feature extraction and transfer learning, we present a drug discovery model for the identification of therapeutic targets in this paper. For the purpose of extracting features from drug and protein sequences, we make use of a hybrid UNet transformer. This makes it possible to extract deep features that address the issue of false alarms. For dimensionality reduction, the modified Rime optimization (MRO) algorithm is used to select the best features from among many candidates. In addition, we design the deep transfer learning (DTransL) model to boost the drug discovery accuracy for NSCLC patients’ therapeutic targets. Davis, KIBA, and Binding-DB are examples of benchmark datasets that are used to validate the proposed model. Results show that the MRO+DTransL model outperforms existing state-of-the-art models. On the Davis dataset, the MRO+DTransL model performed better than the LSTM model by 9.742%, achieving an accuracy of 98.398%. It reached 98.264% and 97.344% on the KIBA and Binding-DB datasets, respectively, indicating improvements of 8.608% and 8.957% over baseline models.

  16. DataSheet_1_Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency.pdf

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Apr 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdollahi, Sina; Ly, Cedric; Bonn, Stefan; Deng, Lihua; Zhao, Yu; Prinz, Immo (2023). DataSheet_1_Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000937010
    Explore at:
    Dataset updated
    Apr 18, 2023
    Authors
    Abdollahi, Sina; Ly, Cedric; Bonn, Stefan; Deng, Lihua; Zhao, Yu; Prinz, Immo
    Description

    The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.

  17. HUN Comparison of model variability and interannual variability

    • researchdata.edu.au
    • data.gov.au
    Updated Mar 13, 2019
    Cite
    Bioregional Assessment Program (2019). HUN Comparison of model variability and interannual variability [Dataset]. https://researchdata.edu.au/hun-comparison-model-interannual-variability/2987743
    Explore at:
    Dataset updated
    Mar 13, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bioregional Assessment Program
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    These are the data summarising the modelled Hydrological Response Variable (HRV) variability versus climate interannual variability which has been used as an indicator of risk. For example, to understand the significance of the modelled increases in low-flow days, it is useful to look at them in the context of the interannual variability in low-flow days due to climate. In other words, are the modelled increases due to additional coal resource development within the natural range of variability of the longer-term flow regime, or are they potentially moving the system outside the range of hydrological variability it experiences under the current climate? The maximum increase in the number of low-flow days due to additional coal resource development relative to the interannual variability in low-flow days under the baseline has been adopted to put some context around the modelled changes. If the maximum change is small relative to the interannual variability due to climate (e.g. an increase of 3 days relative to a baseline range of 20 to 50 days), then the risk of impacts from the changes in low-flow days is likely to be low. If the maximum change is comparable to or greater than the interannual variability due to climate (e.g. an increase of 200 days relative to a baseline range of 20 to 50 days), then there is a greater risk of impact on the landscape classes and assets that rely on this water source. Here changes comparable to or greater than interannual variability are interpreted as presenting a risk. However, the change due to the additional coal resource development is additive, so even a 'less than interannual variability' change is not free from risk. Results of the interannual variability comparison should be viewed as indicators of risk.

    Dataset History

    This dataset was generated using 1000 HRV simulations together with climate inputs. Ratios between the variability in HRVs and the variability attributable to interannual variability due to climate were calculated for the HRVs. Results of the interannual variability comparison should be viewed as indicators of risk.
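
    As a purely illustrative sketch (not the Programme's actual code, and the exact definition of the indicator is an assumption), the comparison described above can be thought of as a ratio of the modelled change to the interannual range:

    def variability_ratio(max_change_days, baseline_low_days, baseline_high_days):
        """Modelled change relative to the interannual (climate-driven) range."""
        return max_change_days / (baseline_high_days - baseline_low_days)

    # Using the illustrative numbers from the description above:
    print(variability_ratio(3, 20, 50))    # ~0.1 -> well within interannual variability, low risk
    print(variability_ratio(200, 20, 50))  # ~6.7 -> exceeds interannual variability, indicates risk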

    Dataset Citation

    Bioregional Assessment Programme (2017) HUN Comparison of model variability and interannual variability. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/1c0a19f9-98c2-4d92-956d-dd764aaa10f9.

    Dataset Ancestors

  18. example datasets

    • figshare.com
    zip
    Updated Jan 20, 2025
    Cite
    soda-inria (2025). example datasets [Dataset]. http://doi.org/10.6084/m9.figshare.28241549.v1
    Explore at:
    zip
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    soda-inria
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    example dataset

  19. Maryland Counties Match Tool for Data Quality

    • catalog.data.gov
    • opendata.maryland.gov
    • +1more
    Updated Oct 25, 2025
    Cite
    opendata.maryland.gov (2025). Maryland Counties Match Tool for Data Quality [Dataset]. https://catalog.data.gov/dataset/maryland-counties-match-tool-for-data-quality
    Explore at:
    Dataset updated
    Oct 25, 2025
    Dataset provided by
    opendata.maryland.gov
    Area covered
    Maryland
    Description

    Data standardization is an important part of effective management. However, sometimes people have data that doesn't match. This dataset includes different ways that counties might get written by different people. It can be used as a lookup table when you need County to be your unique identifier. For example, it allows you to match St. Mary's, St Marys, and Saint Mary's so that you can use it with disparate data from other data sets.
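
    A minimal sketch of that lookup pattern is shown below; the column names ("variant", "county") and the tiny inline table are illustrative assumptions, not the dataset's actual schema.

    import pandas as pd

    # Illustrative stand-in for the Maryland counties match tool.
    lookup = pd.DataFrame({
        "variant": ["St. Mary's", "St Marys", "Saint Mary's"],
        "county": ["St. Mary's", "St. Mary's", "St. Mary's"],
    })

    incoming = pd.DataFrame({"county_raw": ["St Marys", "Saint Mary's"], "value": [10, 20]})

    # Standardize the spelling, then use the county as the unique identifier.
    merged = incoming.merge(lookup, left_on="county_raw", right_on="variant", how="left")
    print(merged.groupby("county")["value"].sum())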

  20. Brevard, NC Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Brevard from 2000 to 2023 // 2024 Edition

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Brevard, NC Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Brevard from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/brevard-nc-population-by-year/
    Explore at:
    json, csv
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brevard, North Carolina
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset are derived from the 2000 - 2023 U.S. Census Bureau Population Estimates Program (PEP). To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Brevard population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Brevard across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.

    Key observations

    In 2023, the population of Brevard was 7,847, a 0.35% year-over-year increase from 2022. Previously, in 2022, Brevard's population was 7,820, an increase of 0.01% compared to a population of 7,819 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of Brevard increased by 884. In this period, the peak population was 7,879, in the year 2018. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2023)
    • Population: The population of Brevard for the specific year is shown in this column.
    • Year on Year Change: This column displays the change in Brevard population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
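
    As a small sketch of how the derived columns relate to the population figures (the file name and exact headers are assumptions for illustration):

    import pandas as pd

    df = pd.read_csv("brevard-nc-population-by-year.csv")  # hypothetical file name
    df = df.sort_values("Year")

    # Year-on-year change in absolute terms and as a percentage of the prior year.
    df["Year on Year Change"] = df["Population"].diff()
    df["Change in Percent"] = df["Population"].pct_change() * 100
    print(df.tail())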

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Brevard Population by Year. You can refer to the same here
