100+ datasets found
  1. comparison-dataset-dolly-curated-falcon

    • huggingface.co
    Updated Jun 7, 2023
    Cite
    Argilla (2023). comparison-dataset-dolly-curated-falcon [Dataset]. https://huggingface.co/datasets/argilla/comparison-dataset-dolly-curated-falcon
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2023
    Dataset authored and provided by
    Argilla
    Description

    Guidelines

    These guidelines are based on the paper Training Language Models to Follow Instructions with Human Feedback You are given a text-based description of a task, submitted by a user. This task description may be in the form of an explicit instruction (e.g. "Write a story about a wise frog."). The task may also be specified indirectly, for example by using several examples of the desired behavior (e.g. given a sequence of movie reviews followed by their sentiment, followed by… See the full description on the dataset page: https://huggingface.co/datasets/argilla/comparison-dataset-dolly-curated-falcon.

  2. Comparison of selected metrics across R packages using example dataset.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 1, 2021
    Cite
    Buchanan, David; Urbanek, Jacek; Broll, Steven; Punjabi, Naresh M.; Chun, Elizabeth; Gaynanova, Irina; Muschelli, John (2021). Comparison of selected metrics across R packages using example dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000799416
    Explore at:
    Dataset updated
    Apr 1, 2021
    Authors
    Buchanan, David; Urbanek, Jacek; Broll, Steven; Punjabi, Naresh M.; Chun, Elizabeth; Gaynanova, Irina; Muschelli, John
    Description

    Comparison of selected metrics across R packages using example dataset.

  3. Large Language Models Comparison Dataset

    • kaggle.com
    zip
    Updated Feb 24, 2025
    Cite
    Samay Ashar (2025). Large Language Models Comparison Dataset [Dataset]. https://www.kaggle.com/datasets/samayashar/large-language-models-comparison-dataset
    Explore at:
    zip (5894 bytes)
    Dataset updated
    Feb 24, 2025
    Authors
    Samay Ashar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comparison of various Large Language Models (LLMs) based on their performance, cost, and efficiency. It includes important details like speed, latency, benchmarks, and pricing, helping users understand how different models stack up against each other.

    Key Details:

    • File Name: llm_comparison_dataset.csv
    • Size: 14.57 kB
    • Total Columns: 15
    • License: CC0 (Public Domain)

    What’s Inside?

    Here are some of the key metrics included in the dataset:

    1. Context Window: Maximum number of tokens the model can process at once.
    2. Speed (tokens/sec): How fast the model generates responses.
    3. Latency (sec): Time delay before the model responds.
    4. Benchmark Scores: Performance ratings from MMLU (academic tasks) and Chatbot Arena (real-world chatbot performance).
    5. Open-Source: Indicates if the model is publicly available or proprietary.
    6. Price per Million Tokens: The cost of using the model for one million tokens.
    7. Training Dataset Size: Amount of data used to train the model.
    8. Compute Power: Resources needed to run the model.
    9. Energy Efficiency: How much power the model consumes.

    This dataset is useful for researchers, developers, and AI enthusiasts who want to compare LLMs and choose the best one based on their needs.
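
    A minimal pandas sketch of one such comparison is shown below. The file name matches the key details above, but the exact column headers are assumptions based on the metric list and should be adjusted to the real CSV.

    import pandas as pd

    llms = pd.read_csv("llm_comparison_dataset.csv")

    # Hypothetical column names -- adjust to the actual headers in the CSV.
    llms["mmlu_per_dollar"] = llms["MMLU Score"] / llms["Price per Million Tokens"]
    print(llms.sort_values("mmlu_per_dollar", ascending=False).head(10))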

    📌If you find this dataset useful, do give an upvote :)

  4. example-dataset-2

    • huggingface.co
    + more versions
    Cite
    Dan Jackson, example-dataset-2 [Dataset]. https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2
    Explore at:
    Authors
    Dan Jackson
    Description

    Dataset Card for example-dataset-2

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/djackson-proofpoint/example-dataset-2.

  5. Data associated with comparison of recharge from drywells and infiltration basins: a modeling study

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Jun 29, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Data associated with comparison of recharge from drywells and infiltration basins: a modeling study [Dataset]. https://catalog.data.gov/dataset/data-associated-with-comparison-of-recharge-from-drywells-and-infiltration-basins-a-modeli
    Explore at:
    Dataset updated
    Jun 29, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This research effort is a modeling study using the HYDRUS (2D/3D) computer program (www.pc-progress.com) and described in the manuscript/journal article entitled “Comparison of recharge from drywells and infiltration basins: a modeling study.” All the tables and figures in the journal article will be documented within an Excel spreadsheet that will include worksheet tabs with data associated with each table and figure. The tabs, columns, and rows will be clearly labeled to identify table/figures, variables, and units. The information supporting the model runs will be supported in the example library of HYDRUS (2D/3D) maintained by PC-Progress. Non-standard HYDRUS subroutines for the drywell and for the infiltration pond simulations that were funded by this research will be added and made available for viewing and download. After the 1 year embargo period the site will include a link to the PubMed Central manuscript. For example, the HYDRUS library for the transient head drywell associated with the Sasidharan et al. (2018) paper is now active (https://www.pcprogress.com/en/Default.aspx?h3d2-lib-Drywell ). This dataset is associated with the following publication: Sasidharan, S., S. Bradford, J. Simunek, and S. Kraemer. Comparison of recharge from drywells and infiltration basins: A modeling study. JOURNAL OF HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 594: 125720, (2021).

  6. Esports Performance Rankings and Results

    • kaggle.com
    zip
    Updated Dec 12, 2022
    Cite
    The Devastator (2022). Esports Performance Rankings and Results [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-collegiate-esports-performance-with-bu
    Explore at:
    zip (110148 bytes)
    Dataset updated
    Dec 12, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Esports Performance Rankings and Results

    Performance Rankings and Results from Multiple Esports Platforms


    About this dataset

    This dataset provides a detailed look into the world of competitive video gaming in universities. It covers a wide range of topics, from performance rankings and results across multiple esports platforms to the individual team and university rankings within each tournament. With an incredible wealth of data, fans can discover statistics on their favorite teams or explore the challenges placed upon university gamers as they battle it out to be the best. Dive into the information provided and get an inside view into the world of collegiate esports tournaments as you assess all things from Match ID, Team 1, University affiliations, Points earned or lost in each match and special Seeds or UniSeeds for exceptional teams. Of course don't forget about exploring all the great Team Names along with their corresponding websites for further details on stats across tournaments!


    How to use the dataset

    Download files. First, download the CS_week1, CS_week2, CS_week3, and seeds datasets from Kaggle, along with the currentRankings file for each week of competition. Save all files under their originally assigned names so that your analysis tools can read them properly (e.g., CS_week1.csv).

    Understand the file structure. Once all data has been collected and organized into separate files, familiarize yourself with what each file contains. The main folder holds the week 1-3 match files and the seedings file. The weekly files pair teams against one another by university and record the point score from each match result, along with the team name and website URL associated with each university entry; the seedings file provides a ranking of the university entries, again accompanied by team names, website URLs, and so on. An additional currentRankings file holds scores for individual players and teams for a given period of competition (e.g., the first week).

    Analyze the data. Once everything is set up, you can explore trends among universities or individual players, looking at specific match performances or overall standings across the weeks of competition, and build graphs from data compiled from the BUECTracker dataset. For example, to compare two universities, say Harvard University and Cornell University, since the beginning of the event, you could extract their respective points and dates (found under the results), regions (North America vs. Europe, etc.), and general statistics such as maps played, as shown in the sketch below.
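
    As a rough illustration (not part of the original dataset documentation), the sketch below loads the weekly match files with pandas and totals points for two universities. Only the Match ID, Team 1, and University columns are documented below; the "Points" column name is an assumption.

    import glob

    import pandas as pd

    # Read only the weekly match files (CS_week1.csv ... CS_week3.csv),
    # not the currentRankings files.
    frames = [pd.read_csv(path).assign(week=path) for path in sorted(glob.glob("CS_week?.csv"))]
    matches = pd.concat(frames, ignore_index=True)

    # "Points" is a hypothetical column name for the match point score.
    subset = matches[matches["University"].isin(["Harvard University", "Cornell University"])]
    print(subset.groupby(["week", "University"])["Points"].sum().unstack())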

    Research Ideas

    • Analyze the performance of teams and identify areas for improvement for better performance in future competitions.
    • Assess which esports platforms are the most popular among gamers.
    • Gain a better understanding of player rankings across different regions, based on rankings system, to create targeted strategies that could boost individual players' scoring potential or team overall success in competitive gaming events

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: CS_week1.csv
    • Match ID: Unique identifier for each match. (Integer)
    • Team 1: Name of the first team in the match. (String)
    • University: University associated with the team. (String)

    File: CS_week1_currentRankings.csv (Column name | Description | ...)

  7. Data from: example-dataset

    • huggingface.co
    Updated Oct 22, 2024
    + more versions
    Cite
    Wes Roberts (2024). example-dataset [Dataset]. https://huggingface.co/datasets/jchook/example-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 22, 2024
    Authors
    Wes Roberts
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    jchook/example-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. Chirality Classification Comparison Demo Examples

    • dataverse.harvard.edu
    • dataone.org
    • +1more
    Updated Oct 15, 2025
    Cite
    Alex Chalmers; Azim Ahmadzadeh (2025). Chirality Classification Comparison Demo Examples [Dataset]. http://doi.org/10.7910/DVN/52SOS7
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Alex Chalmers; Azim Ahmadzadeh
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset of example filament cutouts for use in the demo notebook. Filaments were extracted from GONG H-Alpha solar observations based on annotations from the MAGFiLO dataset. Of the extracted filaments, 102 cutouts (34 of each class) were selected to create a dataset of examples.

  9. cot-example-dataset

    • huggingface.co
    Updated Nov 24, 2024
    + more versions
    Cite
    Daniel Vila (2024). cot-example-dataset [Dataset]. https://huggingface.co/datasets/dvilasuero/cot-example-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 24, 2024
    Authors
    Daniel Vila
    Description

    Dataset Card for cot-example-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dvilasuero/cot-example-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/cot-example-dataset.

  10. France and Germany Football Leagues Dataset

    • kaggle.com
    zip
    Updated Oct 6, 2024
    Cite
    Gökhan Ergül (2024). France and Germany Football Leagues Dataset [Dataset]. https://www.kaggle.com/datasets/gokhanergul/france-and-germany-football-leagues-dataset
    Explore at:
    zip (363605 bytes)
    Dataset updated
    Oct 6, 2024
    Authors
    Gökhan Ergül
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Germany, France
    Description

    France and Germany Football Leagues Dataset

    This dataset contains match data from the top football leagues and cup competitions in France and Germany. The dataset provides comprehensive information about home and away teams, their scores, match dates, and seasons. It is a valuable resource for football enthusiasts, data scientists, and analysts interested in exploring football statistics and trends across two of Europe's biggest football nations.

    Dataset Summary:

    • Total Matches: 38,596
    • Countries Covered: France, Germany
    • Leagues Included:
      • Ligue 1 (France)
      • Ligue 2 (France)
      • Coupe de France
      • Bundesliga (Germany)
      • 2. Bundesliga (Germany)
      • DFB-Pokal (Germany)

    Column Descriptions:

    • Country: The country where the match took place (France or Germany).
      Example values: 'France', 'Germany'

    • Lig: The specific league or cup in which the match was played. This column captures whether the match was part of Ligue 1, Ligue 2, Coupe de France, Bundesliga, 2. Bundesliga, or DFB-Pokal.
      Example values: 'Ligue 1', 'Bundesliga', 'DFB-Pokal'

    • home_team: The name of the home team in the match.
      Example values: 'Paris Saint-Germain', 'Bayern Munich'

    • away_team: The name of the away team in the match.
      Example values: 'Olympique Lyonnais', 'Borussia Dortmund'

    • home_score: The number of goals scored by the home team in the match.
      Example values: '3', '0'

    • away_score: The number of goals scored by the away team in the match.
      Example values: '1', '2'

    • season_year: The season in which the match took place. Typically, football seasons run from one year to the next (e.g., 2022-2023 season).
      Example values: '2022/2023', '2021/2022'

    • Date_day: The specific day on which the match was played, formatted as day and month (dd.mm).
      Example values: '05.01', '29.09'

    • Date_hour: The hour and minute the match kicked off, formatted as hh:mm.
      Example values: '20:45', '18:30'

    Use Cases:

    This dataset can be used for various purposes, including:

    • Analyzing team performance trends over different seasons.
    • Comparing goal-scoring patterns in home vs. away matches.
    • Building predictive models to forecast match outcomes based on historical data.
    • Understanding football dynamics in France and Germany through data visualizations.
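
    As a quick illustration of the first two use cases, the sketch below assumes the matches are loaded from a single CSV (the file name is hypothetical) and uses the column names described above.

    import pandas as pd

    matches = pd.read_csv("france_germany_football.csv")  # hypothetical file name

    # Scores may be read as strings; coerce to numbers before averaging.
    matches[["home_score", "away_score"]] = matches[["home_score", "away_score"]].apply(
        pd.to_numeric, errors="coerce"
    )

    # Average goals scored by home and away teams, per league or cup.
    summary = matches.groupby("Lig")[["home_score", "away_score"]].mean()
    print(summary.sort_values("home_score", ascending=False))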

    Feel free to explore and use this dataset to draw your own insights and conclusions!

  11. Replication Data for: A pairwise comparison framework for fast, flexible, and reliable human coding of political texts

    • dataverse.harvard.edu
    Updated Mar 30, 2018
    Cite
    David Carlson; Jacob M. Montgomery (2018). Replication Data for: A pairwise comparison framework for fast, flexible, and reliable human coding of political texts [Dataset]. http://doi.org/10.7910/DVN/0ZRGEE
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 30, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    David Carlson; Jacob M. Montgomery
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All of the data required for replicating the analyses are located in /DataFiles, with appropriately named subfolders. MainTextRep.R replicates all figures, tables, and numbers reported in the main text with the data provided. SIRep.R does the same for the Supplementary Information. The data folders also contain code that generate both the MTurk responses and the Stan fits. However, some of the functionality is deprecated, and the new package sentimentIt operates differently than earlier functionality, for example requiring login credentials. The code HITapi.R was used for some of the data collection, but is based on this earlier functionality. For an example using the newest functionality, refer to the SI or to the file packageImm_anon.R in /DataFiles/ImmigrationSurvey, which is a fully functional use of the package with the exception of the password being anonymous.

  12. Unlabelled training datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection

    • data.dtu.dk
    bin
    Updated Jul 10, 2023
    + more versions
    Cite
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen (2023). Unlabelled training datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection [Dataset]. http://doi.org/10.11583/DTU.21511842.v1
    Explore at:
    bin
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"

    DOI: https://doi.org/10.11583/DTU.c.6287841

    Using deep learning for detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.

    The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.

    We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others, colliding vessels, vessels engaged in search-and-rescue activities, law enforcement, and commercial maritime traffic forced to deviate from its normal course.

    These datasets consists of unlabelled trajectories for the purpose of training unsupervised models. For labelled datasets for evaluation please refer to the collection. Link in Related publications.

    The data is saved using the pickle format for Python. Each dataset is split into two files with the naming convention:

    datasetInfo_XXX
    data_XXX

    Files named "data_XXX" contain the extracted trajectories, serialized sequentially one at a time, and must be read as such. Please refer to the provided utility functions for examples. Files named "datasetInfo_XXX" contain metadata related to the dataset and the indices at which trajectories begin in the "data_XXX" files.

    The data are sequences of maritime trajectories defined by their timestamp, latitude/longitude position, speed, course, and unique ship identifier (MMSI). In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, and moving AIS navigational statuses, and filtered within a region of interest (ROI). Trajectories were split if they exceeded an upper length limit, and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.

    Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl

    See the datasheet for more detailed information; we refer to the provided utility functions for examples of how to read and plot the data.
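
    As a minimal sketch (not the dataset's own utility functions), sequentially pickled trajectories in a "data_XXX" file can be read back one object at a time until the end of the file; the file name below is a placeholder following the documented naming syntax.

    import pickle

    def read_sequential_pickles(path):
        """Yield each trajectory object stored back-to-back in a data_XXX file."""
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    for trajectory in read_sequential_pickles("data_XXX.pkl"):  # placeholder file name
        print(type(trajectory))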

  13. ragas-example-dataset

    • huggingface.co
    Cite
    Harpreet Sahota, ragas-example-dataset [Dataset]. https://huggingface.co/datasets/harpreetsahota/ragas-example-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Harpreet Sahota
    Description

    harpreetsahota/ragas-example-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. cms-medicare

    • kaggle.com
    zip
    Updated Apr 21, 2020
    Cite
    Google BigQuery (2020). cms-medicare [Dataset]. https://www.kaggle.com/datasets/bigquery/cms-medicare
    Explore at:
    zip (0 bytes)
    Dataset updated
    Apr 21, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    Description

    Context

    This dataset contains Hospital General Information from the U.S. Department of Health & Human Services, hosted as a BigQuery public dataset. The data contains a list of all hospitals that have been registered with Medicare, including addresses, phone numbers, hospital types, and quality-of-care information. The quality-of-care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.
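
    For example, a minimal query from a Kernel with the BigQuery Python client library might look like the sketch below (the selected columns follow the sample query in the next section; pandas and db-dtypes are assumed to be installed for the DataFrame conversion).

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
        SELECT city, state, hospital_overall_rating
        FROM `bigquery-public-data.cms_medicare.hospital_general_info`
        WHERE hospital_overall_rating <> 'Not Available'
        LIMIT 10
    """
    df = client.query(sql).to_dataframe()
    print(df)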

    Sample Query

    How do the hospitals in Mountain View, CA compare to the average hospital in the US? With the hospital compare data you can quickly understand how hospitals in one geographic location compare to another location. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.

    #standardSQL
    SELECT MTV_AVG_HOSPITAL_RATING, US_AVG_HOSPITAL_RATING
    FROM (
      SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS MTV_AVG_HOSPITAL_RATING
      FROM `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE city = 'MOUNTAIN VIEW'
        AND state = 'CA'
        AND hospital_overall_rating <> 'Not Available'
    ) MTV
    JOIN (
      SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS US_AVG_HOSPITAL_RATING
      FROM `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE hospital_overall_rating <> 'Not Available'
    ) US
    ON 1 = 1

    What are the most common diseases treated at hospitals that do well in the category of patient readmissions? For hospitals that achieved "Above the national average" in the category of patient readmissions, it might be interesting to review the types of diagnoses that are treated at those inpatient facilities. While this query won't provide the granular detail that went into the readmission calculation, it gives us a quick glimpse into the top diagnosis-related groups (DRG), or classifications of inpatient stays, found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you can quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis-related groups for hospital metrics you might be interested in.

    #standardSQL
    SELECT drg_definition, SUM(total_discharges) AS total_discharge_per_drg
    FROM `bigquery-public-data.cms_medicare.hospital_general_info` gi
    INNER JOIN `bigquery-public-data.cms_medicare.inpatient_charges_2015` ic
      ON gi.provider_id = ic.provider_id
    WHERE readmission_national_comparison = 'Above the national average'
    GROUP BY drg_definition
    ORDER BY total_discharge_per_drg DESC
    LIMIT 10;

  15. Error measure comparison for Binding-DB dataset.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 29, 2025
    + more versions
    Cite
    Mittal, Ruchi; Kumari, Swati; Malik, Varun; Juneja, Sapna; Mohiuddin, Khalid; Gupta, Deepali (2025). Error measure comparison for Binding-DB dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002068808
    Explore at:
    Dataset updated
    Apr 29, 2025
    Authors
    Mittal, Ruchi; Kumari, Swati; Malik, Varun; Juneja, Sapna; Mohiuddin, Khalid; Gupta, Deepali
    Description

    Non-small cell lung cancer (NSCLC) accounts for the majority of lung cancer cases, making it one of the most fatal diseases worldwide. Predicting NSCLC patients’ survival outcomes accurately remains a significant challenge despite advancements in treatment. The difficulties in developing effective drug therapies, which are frequently hampered by severe side effects, drug resistance, and limited effectiveness across diverse patient populations, highlight the complexity of NSCLC. Machine learning (ML) and deep learning (DL) models are starting to reshape the field of NSCLC drug discovery. These methodologies enable the identification of drug targets and the development of customized treatment strategies that may improve survival outcomes for NSCLC patients. Using cutting-edge methods of feature extraction and transfer learning, we present a drug discovery model for the identification of therapeutic targets in this paper. For the purpose of extracting features from drug and protein sequences, we make use of a hybrid UNet transformer. This makes it possible to extract deep features that address the issue of false alarms. For dimensionality reduction, the modified Rime optimization (MRO) algorithm is used to select the best features from among many candidates. In addition, we design the deep transfer learning (DTransL) model to boost the drug discovery accuracy for NSCLC patients’ therapeutic targets. Davis, KIBA, and Binding-DB are examples of benchmark datasets that are used to validate the proposed model. Results show that the MRO+DTransL model outperforms existing state-of-the-art models. On the Davis dataset, the MRO+DTransL model performed better than the LSTM model by 9.742%, achieving an accuracy of 98.398%. It reached 98.264% and 97.344% on the KIBA and Binding-DB datasets, respectively, indicating improvements of 8.608% and 8.957% over baseline models.

  16. DataSheet_1_Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency.pdf

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Apr 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdollahi, Sina; Ly, Cedric; Bonn, Stefan; Deng, Lihua; Zhao, Yu; Prinz, Immo (2023). DataSheet_1_Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000937010
    Explore at:
    Dataset updated
    Apr 18, 2023
    Authors
    Abdollahi, Sina; Ly, Cedric; Bonn, Stefan; Deng, Lihua; Zhao, Yu; Prinz, Immo
    Description

    The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.

  17. HUN Comparison of model variability and interannual variability

    • researchdata.edu.au
    • data.gov.au
    Updated Mar 13, 2019
    Cite
    Bioregional Assessment Program (2019). HUN Comparison of model variability and interannual variability [Dataset]. https://researchdata.edu.au/hun-comparison-model-interannual-variability/2987743
    Explore at:
    Dataset updated
    Mar 13, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bioregional Assessment Program
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    These are the data summarising the modelled Hydrological Response Variable (HRV) variability versus climate interannual variability which has been used as an indicator of risk. For example, to understand the significance of the modelled increases in low-flow days, it is useful to look at them in the context of the interannual variability in low-flow days due to climate. In other words, are the modelled increases due to additional coal resource development within the natural range of variability of the longer-term flow regime, or are they potentially moving the system outside the range of hydrological variability it experiences under the current climate? The maximum increase in the number of low-flow days due to additional coal resource development relative to the interannual variability in low-flow days under the baseline has been adopted to put some context around the modelled changes. If the maximum change is small relative to the interannual variability due to climate (e.g. an increase of 3 days relative to a baseline range of 20 to 50 days), then the risk of impacts from the changes in low-flow days is likely to be low. If the maximum change is comparable to or greater than the interannual variability due to climate (e.g. an increase of 200 days relative to a baseline range of 20 to 50 days), then there is a greater risk of impact on the landscape classes and assets that rely on this water source. Here changes comparable to or greater than interannual variability are interpreted as presenting a risk. However, the change due to the additional coal resource development is additive, so even a 'less than interannual variability' change is not free from risk. Results of the interannual variability comparison should be viewed as indicators of risk.

    Dataset History

    This dataset was generated using 1000 HRV simulations together with climate inputs. Ratios between the variability in HRVs and the variability attributable to interannual variability due to climate were calculated for the HRVs. Results of the interannual variability comparison should be viewed as indicators of risk.
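
    As a purely illustrative sketch (not the Programme's actual code, and the exact definition of the indicator is an assumption), the comparison described above can be thought of as a ratio of the modelled change to the interannual range:

    def variability_ratio(max_change_days, baseline_low_days, baseline_high_days):
        """Modelled change relative to the interannual (climate-driven) range."""
        return max_change_days / (baseline_high_days - baseline_low_days)

    # Using the illustrative numbers from the description above:
    print(variability_ratio(3, 20, 50))    # ~0.1 -> well within interannual variability, low risk
    print(variability_ratio(200, 20, 50))  # ~6.7 -> exceeds interannual variability, indicates risk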

    Dataset Citation

    Bioregional Assessment Programme (2017) HUN Comparison of model variability and interannual variability. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/1c0a19f9-98c2-4d92-956d-dd764aaa10f9.

    Dataset Ancestors

  18. example datasets

    • figshare.com
    zip
    Updated Jan 20, 2025
    Cite
    soda-inria (2025). example datasets [Dataset]. http://doi.org/10.6084/m9.figshare.28241549.v1
    Explore at:
    zip
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    soda-inria
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    example dataset

  19. Maryland Counties Match Tool for Data Quality

    • catalog.data.gov
    • opendata.maryland.gov
    • +1more
    Updated Oct 25, 2025
    Cite
    opendata.maryland.gov (2025). Maryland Counties Match Tool for Data Quality [Dataset]. https://catalog.data.gov/dataset/maryland-counties-match-tool-for-data-quality
    Explore at:
    Dataset updated
    Oct 25, 2025
    Dataset provided by
    opendata.maryland.gov
    Area covered
    Maryland
    Description

    Data standardization is an important part of effective management. However, sometimes people have data that doesn't match. This dataset includes different ways that counties might get written by different people. It can be used as a lookup table when you need County to be your unique identifier. For example, it allows you to match St. Mary's, St Marys, and Saint Mary's so that you can use it with disparate data from other data sets.
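
    A minimal sketch of that lookup pattern is shown below; the column names ("variant", "county") and the tiny inline table are illustrative assumptions, not the dataset's actual schema.

    import pandas as pd

    # Illustrative stand-in for the Maryland counties match tool.
    lookup = pd.DataFrame({
        "variant": ["St. Mary's", "St Marys", "Saint Mary's"],
        "county": ["St. Mary's", "St. Mary's", "St. Mary's"],
    })

    incoming = pd.DataFrame({"county_raw": ["St Marys", "Saint Mary's"], "value": [10, 20]})

    # Standardize the spelling, then use the county as the unique identifier.
    merged = incoming.merge(lookup, left_on="county_raw", right_on="variant", how="left")
    print(merged.groupby("county")["value"].sum())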

  20. Brevard, NC Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Brevard from 2000 to 2023 // 2024 Edition

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Brevard, NC Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Brevard from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/brevard-nc-population-by-year/
    Explore at:
    json, csv
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brevard, North Carolina
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset are derived from the 2000 - 2023 U.S. Census Bureau Population Estimates Program (PEP). To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Brevard population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Brevard across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.

    Key observations

    In 2023, the population of Brevard was 7,847, a 0.35% year-over-year increase from 2022. Previously, in 2022, Brevard's population was 7,820, an increase of 0.01% compared to a population of 7,819 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of Brevard increased by 884. In this period, the peak population was 7,879, in the year 2018. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2023)
    • Population: The population of Brevard for the specific year is shown in this column.
    • Year on Year Change: This column displays the change in Brevard population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
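
    As a small sketch of how the derived columns relate to the population figures (the file name and exact headers are assumptions for illustration):

    import pandas as pd

    df = pd.read_csv("brevard-nc-population-by-year.csv")  # hypothetical file name
    df = df.sort_values("Year")

    # Year-on-year change in absolute terms and as a percentage of the prior year.
    df["Year on Year Change"] = df["Population"].diff()
    df["Change in Percent"] = df["Population"].pct_change() * 100
    print(df.tail())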

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Brevard Population by Year. You can refer to the same here
