Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models given the lack of effective treatment. Predictive models based on machine learning algorithms show promising potential, yet it remains debated whether machine learning methods should be preferred over traditional statistical models.
Methods
We employed both logistic regression and six machine learning methods as binary predictive models for a dataset containing 733 women diagnosed with preeclampsia. Participants were grouped by four different pregnancy outcomes. After imputation of missing values, statistical description and comparison were conducted to explore the characteristics of the 73 documented variables. Correlation analysis and feature selection were then performed as preprocessing steps to select contributing variables for model development. The models were evaluated against multiple criteria.
Results
First, the influential variables selected by the preprocessing steps did not overlap with those identified through statistical differences. Second, K-Nearest Neighbors was the most accurate imputation method, and the imputation process had little effect on the performance of the developed models. Finally, model performance was compared: the random forest classifier, multi-layer perceptron, and support vector machine demonstrated better discriminative power, as evaluated by the area under the receiver operating characteristic curve, while the decision tree classifier, random forest, and logistic regression showed better calibration, as verified by the calibration curve.
Conclusion
Machine learning algorithms can support prediction modeling and offer superior discrimination, while logistic regression calibrates well. Statistical analysis and machine learning are two scientific domains that share similar themes. The predictive ability of such models varies with the characteristics of the dataset; larger sample sizes and more influential predictors are still needed to accumulate evidence.
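For illustration, here is a minimal scikit-learn sketch of the kind of pipeline described above (KNN imputation, univariate feature selection, two of the classifiers mentioned, and evaluation by ROC AUC and a calibration curve); the synthetic data, number of selected features, and hyperparameters are placeholders rather than the study's actual setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve
from sklearn.pipeline import Pipeline

# Synthetic stand-in: 733 samples, 73 variables, binary outcome, with missing values injected
X, y = make_classification(n_samples=733, n_features=73, random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, clf in models.items():
    pipe = Pipeline([
        ("impute", KNNImputer(n_neighbors=5)),     # KNN imputation of missing values
        ("select", SelectKBest(f_classif, k=20)),  # simple univariate feature selection
        ("model", clf),
    ])
    pipe.fit(X_train, y_train)
    prob = pipe.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, prob)                                 # discrimination
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=5)   # calibration
    print(f"{name}: AUC={auc:.3f}")
```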
**Inventory of pre-trained machine learning models of the Etalab AI Lab**
The publication of this inventory of pre-trained machine learning models is part of the roadmap of the Ministry of Transformation and Public Service (see p. 25 of the document downloadable here). This dataset lists the different algorithms trained to date by the Lab IA as part of the development of its shared tools (more information on the dedicated page of the Lab IA). Details of what the inventory contains, for each algorithm:
- The column “link_model_card” provides a link to a description of the algorithm. We followed the description framework presented in Margaret Mitchell et al.'s paper “Model Cards for Model Reporting” (downloadable here).
- The column “link_depot_github” points to the GitHub repository containing the code that produced the algorithm.
- The column “model_entraine_open” has the value “no” if the trained model is not open and “yes” if it is. In the latter case, the link to the trained model is given in the column “link_modele_entraine_si_pertinent”.
- The column “date_last_mise_a_day” indicates the date of the last update of the model.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Process Modeling Dataset
This dataset contains pairs of natural language process descriptions and their corresponding POWL (Partially Ordered Workflow Language) code implementations. It is designed for fine-tuning language models to translate informal process descriptions into formal process models.
Dataset Structure
The dataset consists of two splits:
train: Training examples for model fine-tuning validation: Validation examples for monitoring training progress
Each… See the full description on the dataset page: https://huggingface.co/datasets/maghwa/process-modeling-dataset.
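A quick way to get oriented is to load both splits with the Hugging Face `datasets` library; since the field names are not described here, the sketch below simply prints the first training record to show the schema.

```python
import datasets

# Load both splits and inspect the first training record.
ds = datasets.load_dataset("maghwa/process-modeling-dataset")
print(ds)              # DatasetDict with "train" and "validation" splits
print(ds["train"][0])  # one description / POWL-code pair
```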
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Model Library DB
Dataset Summary
The Model Library is a project that maps the risks associated with modern machine learning systems. Here, we assess some of the most recent and capable AI systems ever created. This is the database for the Model Library.
Supported Tasks and Leaderboards
This dataset serves as a catalog of machine learning models, all displayed in the Model Library.
Languages
English.
Dataset Structure
Data Instances… See the full description on the dataset page: https://huggingface.co/datasets/nicholasKluge/model-library.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains ecore metamodels from the MAR dataset transformed into tree representations. The original dataset can be found here: http://mar-search.org/experiments/models20/
The data contained in this repository were used to conduct the experiments in the paper: Recommending Metamodel Concepts during Modeling Activities with Pre-Trained Language Models. Link to the paper: https://arxiv.org/abs/2104.01642
The data are organized as follows:
This data repository is linked with the following Github repository containing our code: https://github.com/mweyssow/ecore-bert
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for "nepalitext-language-model-dataset"
Dataset Summary
"NepaliText" language modeling dataset is a collection of over 13 million Nepali text sequences (phrases/sentences/paragraphs) extracted by combining the datasets: OSCAR , cc100 and a set of scraped Nepali articles on Wikipedia.
Supported Tasks and Leaderboards
This dataset is intended for pre-training language models and word representations on the Nepali language.
Languages
The data is… See the full description on the dataset page: https://huggingface.co/datasets/Sakonii/nepalitext-language-model-dataset.
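Because the corpus is large, a first look is easiest in streaming mode with the `datasets` library; this is a hedged sketch, and the field layout is whatever the dataset page defines rather than assumed here.

```python
import itertools
import datasets

# Stream the corpus instead of downloading all 13M+ sequences up front.
ds = datasets.load_dataset(
    "Sakonii/nepalitext-language-model-dataset", split="train", streaming=True
)
for example in itertools.islice(ds, 3):
    print(example)  # inspect a few raw records
```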
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This replication package contains datasets and scripts related to the paper: "*How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study*"
## Root directory
- `statistics.r`: R script used to compute the correlation between usage and downloads, and the RQ1/RQ2 inter-rater agreements
- `modelsInfo.zip`: zip file containing all the downloaded model cards (in JSON format)
- `script`: directory containing all the scripts used to collect and process data. For further details, see README file inside the script directory.
## Dataset
- `Dataset/Dataset_HF-models-list.csv`: list of HF models analyzed
- `Dataset/Dataset_github-prj-list.txt`: list of GitHub projects using the *transformers* library
- `Dataset/Dataset_github-Prj_model-Used.csv`: contains usage pairs: project, model
- `Dataset/Dataset_prj-num-models-reused.csv`: number of models used by each GitHub project
- `Dataset/Dataset_model-download_num-prj_correlation.csv` contains, for each model used by GitHub projects: the name, the task, the number of reusing projects, and the number of downloads
## RQ1
- `RQ1/RQ1_dataset-list.txt`: list of HF datasets
- `RQ1/RQ1_datasetSample.csv`: sample set of models used for the manual analysis of datasets
- `RQ1/RQ1_analyzeDatasetTags.py`: Python script to analyze model tags for the presence of datasets. It requires unzipping `modelsInfo.zip` into a directory with the same name (`modelsInfo`) at the root of the replication package folder. The script writes its output to stdout; redirect it to a file to be analyzed by the `RQ1/RQ1_countDataset.py` script
- `RQ1/RQ1_countDataset.py`: given the output of `RQ1/RQ1_analyzeDatasetTags.py` (passed as argument), produces, for each model, a list of Booleans indicating whether (i) the model only declares HF datasets, (ii) the model only declares external datasets, (iii) the model declares both, and (iv) the model is part of the sample for the manual analysis
- `RQ1/RQ1_datasetTags.csv`: output of `RQ1/RQ1_analyzeDatasetTags.py`
- `RQ1/RQ1_dataset_usage_count.csv`: output of `RQ1/RQ1_countDataset.py`
## RQ2
- `RQ2/tableBias.pdf`: table detailing the number of occurrences of different types of bias by model task
- `RQ2/RQ2_bias_classification_sheet.csv`: results of the manual labeling
- `RQ2/RQ2_isBiased.csv`: file to compute the inter-rater agreement on whether or not a model documents bias
- `RQ2/RQ2_biasAgrLabels.csv`: file to compute the inter-rater agreement related to bias categories
- `RQ2/RQ2_final_bias_categories_with_levels.csv`: for each model in the sample, this file lists (i) the bias leaf category, (ii) the first-level category, and (iii) the intermediate category
## RQ3
- `RQ3/RQ3_LicenseValidation.csv`: manual validation of a sample of licenses
- `RQ3/RQ3_{NETWORK-RESTRICTIVE|RESTRICTIVE|WEAK-RESTRICTIVE|PERMISSIVE}-license-list.txt`: lists of licenses with different permissiveness
- `RQ3/RQ3_prjs_license.csv`: for each project linked to models, among other fields it indicates the license tag and name
- `RQ3/RQ3_models_license.csv`: for each model, indicates, among other information, whether the model has a license and, if so, what kind of license
- `RQ3/RQ3_model-prj-license_contingency_table.csv`: usage contingency table between projects' licenses (columns) and models' licenses (rows)
- `RQ3/RQ3_models_prjs_licenses_with_type.csv`: project-model pairs, with their respective licenses and permissiveness levels
## scripts
Contains the scripts used to mine Hugging Face and GitHub. Details are in the enclosed README
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the code of the three pre-processing tools pyGRETA, pyPRIMA and pyCLARA, and an exemplary database for the scope of Austria.
To run the code with full functionality, additional data is needed. Check the documentation of the tools for further information.
Sources for data can be found here:
pyGRETA: https://pygreta.readthedocs.io/en/stable/user_manual.html#recommended-input-sources
pyPRIMA: https://pyprima.readthedocs.io/en/stable/user_manual.html#recommended-input-sources
pyCLARA: https://pyclara.readthedocs.io/en/stable/user_manual.html#recommended-input-sources
Dataset Card for TARA
Dataset Summary
TARA is a novel Tool-Augmented Reward modeling datAset that includes comprehensive comparison data of human preferences and detailed tool invocation processes. It was introduced in this paper and was used to train Themis-7b.
Supported Tools
TARA supports multiple tools including Calculator, Code, Translator, Google Search, Calendar, Weather, WikiSearch and Multi-tools.
Dataset Structure
calculator: preference… See the full description on the dataset page: https://huggingface.co/datasets/ernie-research/TARA.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS.
Different types of abusive content, such as offensive language, hate speech, and aggression, have become prevalent in social media, and many efforts have been dedicated to automatically detecting this phenomenon in resource-rich languages such as English. This focus is mainly due to the comparative lack of annotated data related to offensive language in low-resource languages, especially those spoken in Asian countries. To reduce the vulnerability of social media users from these regions, it is crucial to address the problem of offensive language in such low-resource languages. Hence, we present a new corpus of Persian offensive language, consisting of 6,000 posts randomly sampled from 520,000 micro-blog posts from X (Twitter), to support offensive language detection in Persian as a low-resource language in this area. We introduce a method for creating and annotating the corpus, following the annotation practices of recent benchmark datasets in other languages, which categorizes both the offensive language and the target of the offense. We perform extensive experiments with three classifiers at different levels of annotation, using a number of classical Machine Learning (ML) models, Deep Learning (DL) models, and transformer-based neural networks, including monolingual and multilingual pre-trained language models. Furthermore, we propose an ensemble model integrating these models to boost the performance of the offensive language detection task. Initial results on single models indicate that SVMs trained on character or word n-grams are the best-performing models, alongside the monolingual transformer-based pre-trained language model ParsBERT, in identifying offensive vs. non-offensive content, targeted vs. untargeted offense, and offense towards an individual or a group. In addition, the stacking ensemble model outperforms the single models by a substantial margin, obtaining a 5% macro F1-score improvement at each of the three levels of annotation.
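As an illustration of the character n-gram SVM baseline mentioned above, here is a minimal scikit-learn sketch; the example posts and labels are placeholders (the real corpus consists of Persian tweets), and the hyperparameters are not the ones reported in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Placeholder posts and labels; the real data are Persian micro-blog posts.
texts = ["sample post one", "sample post two"]
labels = [0, 1]  # 0 = non-offensive, 1 = offensive

clf = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),  # character n-grams
    ("svm", LinearSVC()),
])
clf.fit(texts, labels)
print(clf.predict(["another new post"]))
```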
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains product reviews along with corresponding product names, prices, review summaries, and sentiment labels. The sentiment labels indicate whether a review expresses a positive, negative, or neutral sentiment towards the product. A possible application of this dataset is sentiment analysis of product reviews: machine learning algorithms could automatically classify reviews as positive, negative, or neutral based on the textual content of the review and associated metadata such as the product name and price. Such a system could be used by businesses to track customer sentiment towards their products and identify areas for improvement, and by consumers to make more informed purchasing decisions based on the experiences of others.
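To illustrate the suggested application, below is a small scikit-learn sketch that combines the review text with the price metadata; the column names (`review`, `price`, `sentiment`) and the toy rows are assumptions, not the dataset's actual schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the real review table (assumed column names).
df = pd.DataFrame({
    "review": ["works great", "broke after a week", "it is okay"],
    "price": [19.99, 49.99, 9.99],
    "sentiment": ["positive", "negative", "neutral"],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "review"),   # bag-of-words for the review text
    ("num", StandardScaler(), ["price"]),    # scaled numeric metadata
])
model = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])
model.fit(df[["review", "price"]], df["sentiment"])
print(model.predict(pd.DataFrame({"review": ["stopped working"], "price": [29.99]})))
```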
CC0
Original Data Source: 171k product review with Sentiment Dataset
Point-BERT is a new paradigm for learning point cloud Transformers. It pre-trains standard point cloud Transformers with a Masked Point Modeling (MPM) task.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Post-marketing reports of suspected adverse drug reactions are important for establishing the safety profile of a medicinal product. However, a high influx of reports poses a challenge for regulatory authorities, as a delay in identifying previously unknown adverse drug reactions can potentially harm patients. In this study, we use natural language processing (NLP) to predict whether a report is of a serious nature based solely on the free-text fields and adverse event terms in the report, potentially allowing reports mislabelled at the time of reporting to be detected and prioritized for assessment. We consider four NLP models at various levels of complexity, bootstrap their train-validation data split to eliminate random effects in the performance estimates, and conduct prospective testing to avoid the risk of data leakage. Using a Swedish BERT-based language model, continued language pre-training, and final classification training, we achieve close to human-level performance on this task. Model architectures built on less complex technical foundations, such as bag-of-words approaches and LSTM neural networks trained with random initialization of weights, appear to perform less well, likely because they lack the robustness that a base of general language training provides.
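For orientation, the sketch below shows only the final classification-training stage with the Hugging Face transformers Trainer; `KB/bert-base-swedish-cased` is one publicly available Swedish BERT used here as a stand-in (the study's exact model and its continued language pre-training step are not reproduced), and the report texts and labels are placeholders.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "KB/bert-base-swedish-cased"  # stand-in Swedish BERT, not necessarily the study's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder free-text reports; 1 = serious, 0 = non-serious
data = Dataset.from_dict({
    "text": ["Patienten fick en allvarlig reaktion efter dos två.",
             "Lätt huvudvärk som gick över samma dag."],
    "label": [1, 0],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=128))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adr-seriousness", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```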
This dataset was created by lmyybh
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The replication package of the paper "An Empirical Comparison of Pre-Trained Models of Source Code". For the source code, please refer to https://github.com/NougatCA/FineTuner.
Dataset Card for "LeNER-Br language modeling"
Dataset Summary
The LeNER-Br language modeling dataset is a collection of legal texts in Portuguese from the LeNER-Br dataset (official site). The legal texts were downloaded from this link (93.6MB) and processed to create a DatasetDict with train and validation splits (20% validation). The LeNER-Br language modeling dataset allows fine-tuning of language models such as BERTimbau base and large.
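As a hedged sketch of the fine-tuning just described, the snippet below runs masked-language-model training with BERTimbau base (`neuralmind/bert-base-portuguese-cased`); the `text` column name is an assumption about the dataset's schema.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

dataset = load_dataset("pierreguillou/lener_br_finetuning_language_model")
tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-base-portuguese-cased")
model = AutoModelForMaskedLM.from_pretrained("neuralmind/bert-base-portuguese-cased")

# Tokenize the legal texts; "text" is an assumed column name.
tokenized = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=128),
    batched=True, remove_columns=dataset["train"].column_names,
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lener-br-mlm", num_train_epochs=1),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
)
trainer.train()
```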
Language
Portuguese from… See the full description on the dataset page: https://huggingface.co/datasets/pierreguillou/lener_br_finetuning_language_model.
https://choosealicense.com/licenses/odc-by/https://choosealicense.com/licenses/odc-by/
PixMo-AskModelAnything
PixMo-AskModelAnything is an instruction-tuning dataset for vision-language models. It contains human-authored question-answer pairs about diverse images with long-form answers. PixMo-AskModelAnything is part of the PixMo dataset collection and was used to train the Molmo family of models.
Quick links:
📃 Paper 🎥 Blog with Videos
Loading
import datasets
data = datasets.load_dataset("allenai/pixmo-ask-model-anything", split="train")
Data Format… See the full description on the dataset page: https://huggingface.co/datasets/allenai/pixmo-ask-model-anything.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was composed within the NCN OPUS'21 research project "Source-code-representations for machine-learning-based identification of defective code fragments" (2021/41/B/ST6/02510) (https://ml4code.cs.put.poznan.pl/).
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Model Card Dataset Mentions
Dataset Summary
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
Data Instances
[More Information Needed]
Data Fields
[More Information Needed]
Data Splits
[More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/librarian-bots/model_card_dataset_mentions.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
FIRE MODELS is a dataset for object detection tasks - it contains FIRE annotations for 201 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).