100+ datasets found

Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning...
data.niaid.nih.gov
dataverse.harvard.edu
Updated Aug 10, 2022
Cite
Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6624080
Dataset updated
Aug 10, 2022
Dataset authored and provided by
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset:

N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109

Abstract

The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description

The dataset comprises a total of 52,984 Tweet IDs (that correspond to the same number of Tweets) about online learning that were posted on Twitter from 9th November 2021 to 13th July 2022. The earliest date was selected as 9th November 2021, as the Omicron variant was detected for the first time in a sample that was collected on this date. 13th July 2022 was the most recent date as per the time of data collection and publication of this dataset.

The dataset consists of 9 .txt files. An overview of these dataset files along with the number of Tweet IDs and the date range of the associated tweets is as follows. Table 1 shows the list of all the synonyms or terms that were used for the dataset development.

Filename: TweetIDs_November_2021.txt (No. of Tweet IDs: 1283, Date Range of the associated Tweet IDs: November 1, 2021 to November 30, 2021)

Filename: TweetIDs_December_2021.txt (No. of Tweet IDs: 10545, Date Range of the associated Tweet IDs: December 1, 2021 to December 31, 2021)

Filename: TweetIDs_January_2022.txt (No. of Tweet IDs: 23078, Date Range of the associated Tweet IDs: January 1, 2022 to January 31, 2022)

Filename: TweetIDs_February_2022.txt (No. of Tweet IDs: 4751, Date Range of the associated Tweet IDs: February 1, 2022 to February 28, 2022)

Filename: TweetIDs_March_2022.txt (No. of Tweet IDs: 3434, Date Range of the associated Tweet IDs: March 1, 2022 to March 31, 2022)

Filename: TweetIDs_April_2022.txt (No. of Tweet IDs: 3355, Date Range of the associated Tweet IDs: April 1, 2022 to April 30, 2022)

Filename: TweetIDs_May_2022.txt (No. of Tweet IDs: 3120, Date Range of the associated Tweet IDs: May 1, 2022 to May 31, 2022)

Filename: TweetIDs_June_2022.txt (No. of Tweet IDs: 2361, Date Range of the associated Tweet IDs: June 1, 2022 to June 30, 2022)

Filename: TweetIDs_July_2022.txt (No. of Tweet IDs: 1057, Date Range of the associated Tweet IDs: July 1, 2022 to July 13, 2022)

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.
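Before hydration, the ID files can be loaded and sanity-checked with a few lines of Python. This is a minimal sketch, assuming each .txt file holds one Tweet ID per line (the description does not state the file layout). The helper name `load_tweet_ids` is hypothetical. The per-file counts are copied from the list above and add up to the stated total of 52,984.

```python
from pathlib import Path

# Per-file Tweet ID counts, copied from the dataset description above.
EXPECTED_COUNTS = {
    "TweetIDs_November_2021.txt": 1283,
    "TweetIDs_December_2021.txt": 10545,
    "TweetIDs_January_2022.txt": 23078,
    "TweetIDs_February_2022.txt": 4751,
    "TweetIDs_March_2022.txt": 3434,
    "TweetIDs_April_2022.txt": 3355,
    "TweetIDs_May_2022.txt": 3120,
    "TweetIDs_June_2022.txt": 2361,
    "TweetIDs_July_2022.txt": 1057,
}


def load_tweet_ids(path):
    """Return the Tweet IDs in a file, assuming one ID per line (an assumption)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Keep IDs as strings: they are identifiers, not numbers to compute with.
    return [line.strip() for line in lines if line.strip()]


# The listed counts should add up to the total given in the description.
total_ids = sum(EXPECTED_COUNTS.values())
print(total_ids)  # 52984
```

After downloading the files, comparing `len(load_tweet_ids(name))` against `EXPECTED_COUNTS[name]` is a quick way to confirm a complete download before hydrating.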

Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development

Terminology | List of synonyms and terms
COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus
online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures
Data from: Uncover-ML: a machine learning pipeline for geoscience data...
ecat.ga.gov.au
Updated Nov 29, 2021
Cite
(2021). Uncover-ML: a machine learning pipeline for geoscience data analysis. [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/search?keyword=data%20analytics
Dataset updated
Nov 29, 2021
Description
The geosciences are a data-rich domain where Earth materials and processes are analysed from local to global scales. However, often we only have discrete measurements at specific locations, and a limited understanding of how these features vary across the landscape. Earth system processes are inherently complex, and trans-disciplinary science will likely become increasingly important in finding solutions to future challenges associated with the environment, mineral/petroleum resources and food security. Machine learning is an important approach to synthesise the increasing complexity and sheer volume of Earth science data, and is now widely used in prediction across many scientific disciplines. In this context, we have built a machine learning pipeline, called Uncover-ML, for both supervised and unsupervised learning, prediction and classification. The Uncover-ML pipeline was developed from a partnership between CSIRO and Geoscience Australia, and is largely built around the Python scikit-learn machine learning libraries. In this paper, we briefly describe the architecture and components of Uncover-ML for feature extraction, data scaling, sample selection, predictive mapping, estimating model performance, model optimisation and estimating model uncertainties. Links to download the source code and information on how to implement the algorithms are also provided.

Citation: Wilford, J., Basak, S., Hassan, R., Moushall, B., McCalman, L., Steinberg, D. and Zhang, F., 2020. Uncover-ML: a machine learning pipeline for geoscience data analysis. In: Czarnota, K., Roach, I., Abbott, S., Haynes, M., Kositcin, N., Ray, A. and Slatter, E. (eds.) Exploring for the Future: Extended Abstracts, Geoscience Australia, Canberra, 1–4.
Data from: Occupations on the map: Using a super learner algorithm to...
zenodo.org
data.niaid.nih.gov
zip
Updated Dec 8, 2022
Cite
Michiel van Dijk; Thijs de Lange; Paul van Leeuwen; Philippe Debie (2022). Occupations on the map: Using a super learner algorithm to downscale labor statistics, data [Dataset]. http://doi.org/10.5281/zenodo.6419273
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6419273
Dataset updated
Dec 8, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michiel van Dijk; Thijs de Lange; Paul van Leeuwen; Philippe Debie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains all the input and output data (including maps) related to Van Dijk et al. (2022), Occupations on the map: Using a super learner algorithm to downscale labor statistics. It does not contain several large (> 4GB) intermediate files, which summarize the results of the large number of machine learning models that were trained and tuned as part of the super learner algorithm. These files can be created by running the scripts in the supplementary GitHub repository: https://github.com/michielvandijk/occupations_on_the_map. All input and output maps produced as part of this study can also be accessed by means of an interactive web application: https://shiny.wur.nl/occupation-map-vnm.

In this paper, we demonstrated an approach to create fine-scale gridded occupation maps by means of downscaling district-level labor statistics informed by remote sensing and other spatial information. We applied a super-learner algorithm that combined the results of different machine learning models to predict the shares of six major occupation categories and the labor force participation rate at a resolution of 30 arc seconds (~1x1 km) in Vietnam. The results were subsequently combined with gridded information on the working-age population to produce maps of the number of workers per occupation. The proposed approach can also be applied to produce maps of other (labor) statistics, which are only available at aggregated levels.
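The "~1x1 km" figure for a 30 arc second grid can be checked with simple arithmetic. The sketch below assumes a spherical Earth with an equatorial circumference of about 40,075 km (an assumed constant, not taken from the dataset description) and a hypothetical helper name, `cell_size_km`. The east-west extent shrinks with the cosine of latitude, so cells are slightly narrower than 1 km away from the equator.

```python
import math

# Assumed constant: approximate equatorial circumference of the Earth in km.
EARTH_CIRCUMFERENCE_KM = 40075.0


def cell_size_km(arc_seconds, latitude_deg=0.0):
    """Approximate (north-south, east-west) cell extent in km on a spherical Earth."""
    degrees = arc_seconds / 3600.0                  # 3600 arc seconds per degree
    km_per_degree = EARTH_CIRCUMFERENCE_KM / 360.0  # roughly 111 km per degree
    north_south = degrees * km_per_degree
    # Meridians converge towards the poles, so the east-west extent shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = cell_size_km(30, latitude_deg=16.0)  # 16 degrees N is an illustrative latitude
print(round(ns_km, 3), round(ew_km, 3))
```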
Dataset used in Design Analytics for Mobile Learning: Scaling up...
data.niaid.nih.gov
zenodo.org
Updated Mar 1, 2022
Cite
Gerti (2022). Dataset used in Design Analytics for Mobile Learning: Scaling up the Classification of Learning Designs based on Cognitive and Contextual Elements [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6320367
Dataset updated
Mar 1, 2022
Dataset authored and provided by
Gerti
Description
The following dataset has been used for the paper entitled "Design Analytics for Mobile Learning: Scaling up the Classification of Learning Designs based on Cognitive and Contextual Elements".

Abstract

This research was triggered by the identified need in literature for large-scale studies about the kind of designs that teachers create for Mobile Learning (m-learning). These studies require analyses of large datasets of learning designs. The common approach followed by researchers when analysing designs has been to manually classify them following high-level pedagogically-guided coding strategies, which demands extensive work. Therefore, the first goal of this paper is to explore the use of Supervised Machine Learning (SML) to automatically classify the textual content of m-learning designs, through pedagogically-relevant classifications, such as the cognitive level demanded by students to carry out specific designed tasks, the phases of inquiry learning represented in the designs, or the role that the situated environment has in them. As not all the SML models are transparent, while often researchers need to understand the behaviour behind them, the second goal of this paper considers the trade-off between models’ performance and interpretability in the context of design analytics for m-learning. To achieve these goals we compiled a dataset of designs deployed through two tools, Avastusrada and Smartzoos. With it, we trained and compared different models and feature extraction techniques. We further optimized and compared the best-performing and most interpretable algorithms (EstBERT and Logistic Regression) to consider the second goal through an illustrative case. We found that SML can reliably classify designs, with accuracy > 0.86 and Cohen’s kappa > 0.69.
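Cohen's kappa, reported alongside accuracy above, corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula in plain Python. It is not the authors' evaluation code, and the function name and toy labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of items where the two label sets agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[label] * pred_counts[label] for label in true_counts) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Toy example: 3 of 4 labels agree, chance agreement is 0.5.
kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])
print(kappa)  # 0.5
```

Accuracy alone would report 0.75 here. Kappa is lower because half of that agreement is expected by chance with these label frequencies.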
Data from: MNIST Large Scale dataset Dataset
paperswithcode.com
Updated Jun 10, 2021
Cite
Ylva Jansson; Tony Lindeberg (2021). MNIST Large Scale dataset Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-large-scale-dataset
Dataset updated
Jun 10, 2021
Authors
Ylva Jansson; Tony Lindeberg
Description
The MNIST Large Scale dataset is based on the classic MNIST dataset, but contains large scale variations up to a factor of 16. The motivation behind creating this dataset was to enable testing the ability of different algorithms to learn in the presence of large scale variability and specifically the ability to generalise to new scales not present in the training set over wide scale ranges.

The dataset contains training data for each one of the relative size factors 1, 2 and 4 relative to the original MNIST dataset and testing data for relative scaling factors between 1/2 and 8, with a ratio of $\sqrt[4]{2}$ between adjacent scales.
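The set of test scales described above can be enumerated directly. This sketch assumes both endpoints (1/2 and 8) are included, which the description implies but does not state outright. Under that assumption the scales are the powers 2^(k/4) for k = -4, ..., 12.

```python
# Adjacent test scales differ by a factor of 2**(1/4), from 1/2 up to 8.
# Assumption: both endpoints are included in the test set.
RATIO = 2 ** 0.25

test_scales = [2 ** (k / 4) for k in range(-4, 13)]

# Training scales listed in the description.
train_scales = [1, 2, 4]

print(len(test_scales))  # 17
print([round(s, 3) for s in test_scales])
```

Under this assumption, each training scale (1, 2 and 4) also appears among the test scales, and the remaining test scales probe generalisation to sizes not seen in training.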
Data from: Industry-scale Application and Evaluation of Deep Learning for...
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Apr 21, 2020
Cite
Noé Sturm; Andreas Mayr; Thanh Le Van; Vladimir Chupakhin; Hugo Ceulemans; Joerg Wegner; Jose-Felipe Golib-Dzib; Nina Jeliazkova; Yves Vandriessche; Stanislav Bohm; Vojtech Cima; Jan Martinovic; Nigel Greene; Tom Vander Aa; Thomas J. Ashby; Sepp Hochreiter; Ola Engkvist; Günter Klambauer; Hongming Chen (2020). Industry-scale Application and Evaluation of Deep Learning for Drug Target Prediction [Dataset]. http://doi.org/10.5281/zenodo.3239499
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3239499
Dataset updated
Apr 21, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Noé Sturm; Andreas Mayr; Thanh Le Van; Vladimir Chupakhin; Hugo Ceulemans; Joerg Wegner; Jose-Felipe Golib-Dzib; Nina Jeliazkova; Yves Vandriessche; Stanislav Bohm; Vojtech Cima; Jan Martinovic; Nigel Greene; Tom Vander Aa; Thomas J. Ashby; Sepp Hochreiter; Ola Engkvist; Günter Klambauer; Hongming Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Artificial intelligence (AI) is undergoing a revolution thanks to the breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent works on publicly available pharmaceutical data showed that AI methods are highly promising for Drug Target prediction. However, the quality of public data might be different than that of industry data due to different labs reporting measurements, different measurement techniques, fewer samples and less diverse and specialized assays. As part of a European funded project (ExCAPE), that brought together expertise from pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that deep learning derived machine learning models outperformed comparable models, which were trained by other machine learning algorithms, when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning and especially deep learning directly at the level of industry-scale settings and moreover investigating the transferability of publicly learned target prediction models towards industrial bioactivity prediction pipelines.
Dataset of factors impacting second language learning from teachers'...
data.niaid.nih.gov
produccioncientifica.usal.es
zip
Updated Mar 22, 2021
Cite
Roberto Sánchez-Cabrero; Amaya Arigita-García (2021). Dataset of factors impacting second language learning from teachers' experience [Dataset]. http://doi.org/10.5061/dryad.zcrjdfnb4
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.zcrjdfnb4
Dataset updated
Mar 22, 2021
Dataset provided by
Universidad Alfonso X el Sabio
Authors
Roberto Sánchez-Cabrero; Amaya Arigita-García
License
https://spdx.org/licenses/CC0-1.0.html
Description
Competence in a second language, distinct from the mother tongue, has become an essential skill in today's globalized world, and it is therefore highly valued and in demand among the main educational institutions and models. However, acquiring it is a complex process influenced by numerous and, in many cases, unknown factors. These factors are not usually taken into consideration when designing second language learning processes, which tends to lead to inadequate teaching and may result in school failure that could have been avoided.

216 in-service teachers from all non-university educational stages of the Community of Madrid, Spain, evaluate the significance of 44 factors traditionally associated with second language learning, which are grouped into four general categories (factors linked to students; factors linked to teachers; learning structure and organisation; and learning environment) through a five-point Likert scale.

The data were collected using a Google Forms questionnaire through the research described in Arigita-García et al. (2021). The sample is heterogeneous concerning different attribute variables such as age, teaching experience, gender, school ownership, and the language in which classes are taught. The sample was obtained through social networks and teacher forums.

The data collection offers essential information to better understand the process of second language learning, as it gathers the experience and learning accumulated by the teachers who took part in this work, which implies direct information from the educational reality that they are intended to improve.

Methods: Computerized questionnaire through the Google Forms private server
Accuracy (%) of different algorithms for overall nutritional status.
plos.figshare.com
xls
Updated May 31, 2024
Cite
Najma Begum; Mohd. Muzibur Rahman; Mohammad Omar Faruk (2024). Accuracy (%) of different algorithms for overall nutritional status. [Dataset]. http://doi.org/10.1371/journal.pone.0304389.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304389.t004
Dataset updated
May 31, 2024
Dataset provided by
PLOS ONE
Authors
Najma Begum; Mohd. Muzibur Rahman; Mohammad Omar Faruk
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Accuracy (%) of different algorithms for overall nutritional status.
Specifications of machine learning algorithms.
plos.figshare.com
xls
Updated May 31, 2024
Cite
Najma Begum; Mohd. Muzibur Rahman; Mohammad Omar Faruk (2024). Specifications of machine learning algorithms. [Dataset]. http://doi.org/10.1371/journal.pone.0304389.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304389.t003
Dataset updated
May 31, 2024
Dataset provided by
PLOS ONE
Authors
Najma Begum; Mohd. Muzibur Rahman; Mohammad Omar Faruk
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Aim

Malnutrition in pregnant women significantly affects both mother and child health. This research aims to identify the best machine learning (ML) techniques for predicting the nutritional status of pregnant women in Bangladesh and detect the most essential features based on the best-performing algorithm.

Methods

This study used retrospective cross-sectional data from the Bangladeshi Demographic and Health Survey 2017–18. Different feature transformations and machine learning classifiers were applied to find the best transformation and classification model.

Results

This investigation found that robust scaling outperformed all other feature transformation methods. The result shows that the Random Forest algorithm with robust scaling outperforms all other machine learning algorithms with 74.75% accuracy, 57.91% kappa statistics, 73.36% precision, 73.08% recall, and 73.09% f1 score. In addition, the Random Forest algorithm had the highest precision (76.76%) and f1 score (71.71%) for predicting the underweight class, as well as an expected precision of 82.01% and f1 score of 83.78% for the overweight/obese class when compared to other algorithms with a robust scaling method. The respondent’s age, wealth index, region, husband’s education level, husband’s age, and occupation were crucial features for predicting the nutritional status of pregnant women in Bangladesh.

Conclusion

The proposed classifier could help predict the expected outcome and reduce the burden of malnutrition among pregnant women in Bangladesh.
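The abstract does not define "robust scaling". A common definition, assumed here, centers each feature on its median and divides by its interquartile range (IQR), so that outliers have less influence than under mean/standard-deviation scaling. The sketch below uses only the Python standard library. The function name and the sample values are illustrative and are not taken from the study.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center values on the median and divide by the interquartile range."""
    # 'inclusive' treats the data as the full population when interpolating quartiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    center = median(values)
    return [(v - center) / iqr for v in values]


# Illustrative feature with one extreme value: the bulk of the data stays
# within about one IQR of zero, and only the outlier lands far away.
ages = [18, 22, 25, 31, 95]
scaled_ages = robust_scale(ages)
print([round(v, 2) for v in scaled_ages])
```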
Informativeness, contingency and time scale invariance in associative...
data.niaid.nih.gov
datadryad.org
zip
Updated Jul 16, 2024
Cite
Peter Balsam; Eleanor H. Simpson; Charles R. Gallistel (2024). Informativeness, contingency and time scale invariance in associative learning [Dataset]. http://doi.org/10.5061/dryad.3xsj3txq8
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.3xsj3txq8
Dataset updated
Jul 16, 2024
Dataset provided by
Columbia University Irving Medical Center
Barnard College
Rutgers, The State University of New Jersey
Authors
Peter Balsam; Eleanor H. Simpson; Charles R. Gallistel
License
https://spdx.org/licenses/CC0-1.0.html
Description
Contemporary theories guiding the search for neural mechanisms of learning and memory assume that associative learning results from the temporal pairing of cues and reinforcers resulting in coincident activation of associated neurons, strengthening their synaptic connection. While enduring, this framework has limitations: Temporal-pairing-based models of learning do not fit with many experimental observations and cannot be used to make quantitative predictions about behavior. Here we present behavioral data that supports an alternative, information-theoretic conception: The amount of information that cues provide about the timing of reward delivery predicts behavior. Furthermore, this approach accounts for the rate and depth of both inhibitory and excitatory learning across paradigms and species. We also show that dopamine release in the ventral striatum reflects cue–predicted changes in reinforcement rates consistent with subjects understanding temporal relationships between task events. Our results reshape the conceptual and biological framework for understanding associative learning.

Methods

Complete Methods are provided in the article: Learning Depends on the Information Conveyed by Temporal Relationships Between Events and is Reflected in the Dopamine Response to Cues. Briefly: Male, Sprague-Dawley rats were housed in groups of two in a colony room on a 12:12 hour light:dark cycle. Water was available ad-lib in the home cages. They were fed in their home cages for one hour after experimental sessions, which occurred 5 days per week. On weekends, they had ad-lib access to food until approximately 22 hours before the first weekday session. They were approximately two months old at the start of training and had been handled for one week before that. They were trained in eight identical experimental chambers (30.5 cm x 24.1 cm x 21.0 cm) located in ventilated and soundproof boxes. Each chamber was equipped with a speaker, a house light, and a pellet dispenser (Model ENV-203, Med Associates), which delivered 20 mg pellets into a head-entry-detecting trough (Models ENV-200R7 and ENV-254-CB, Med Associates). They initially received 2 sessions of magazine training, during which 40 pellets were delivered at random times during a 20-minute session (random time 30 s schedule), followed by daily sessions with one of the experimental protocols. The time of occurrence of each head entry and the time of onset and termination of all stimulus events were recorded with 0.1 s resolution.
Data from: Incorporating long-range physics in atomic-scale machine learning...
archive.materialscloud.org
materialscloud-archive-failover.cineca.it
Updated Dec 18, 2019
Cite
Materials Cloud (2019). Incorporating long-range physics in atomic-scale machine learning [Dataset]. http://doi.org/10.24435/materialscloud:2019.0090/v1
Unique identifier
https://doi.org/10.24435/materialscloud:2019.0090/v1
Dataset updated
Dec 18, 2019
Dataset provided by
Materials Cloud
Description
The most successful and popular machine learning models of atomic-scale properties derive their transferability from a locality ansatz. The properties of a large molecule or a bulk material are written as a sum over contributions that depend on the configurations within finite atom-centered environments. The obvious downside of this approach is that it cannot capture nonlocal, nonadditive effects such as those arising due to long-range electrostatics or quantum interference. We propose a solution to this problem by introducing nonlocal representations of the system, which are remapped as feature vectors that are defined locally and are equivariant in O(3). We consider, in particular, one form that has the same asymptotic behavior as the electrostatic potential. We demonstrate that this framework can capture nonlocal, long-range physics by building a model for the electrostatic energy of randomly distributed point-charges, for the unrelaxed binding curves of charged organic molecular dimers, and for the electronic dielectric response of liquid water. By combining a representation of the system that is sensitive to long-range correlations with the transferability of an atom-centered additive model, this method outperforms current state-of-the-art machine-learning schemes and provides a conceptual framework to incorporate nonlocal physics into atomistic machine learning.
Data from: Impact of Interval Censoring on Data Accuracy and Machine...
data.niaid.nih.gov
zenodo.org
Updated Sep 25, 2024
Cite
Doffini, Vanni (2024). Impact of Interval Censoring on Data Accuracy and Machine Learning Performance in Biological High-Throughput Screening [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13840800
Dataset updated
Sep 25, 2024
Dataset provided by
Doffini, Vanni
Nash, Michael
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

Data and results used in the publication entitled "Impact of Interval Censoring on Data Accuracy and Machine Learning Performance in Biological High-Throughput Screening".

Data

This folder contains the raw data used during this work.

EvoEF.csv contains information on the library used (sequences, number of mutations, etc.) and the fitness (energy) used as continuous mean values. mut.csv contains the information about the combinatorial scaling (N vs N_norm), the number of mutations (m) and the probability of each variant using different distributions (uniform and binomial) at different $p_{WT}$.

For further details on how the fitness values were calculated and how the combinatorial scale works, please refer to our previous paper.

Results

This folder contains the results (outputs) of all scripts used. Such results are included in the form of .npy and .npz files. To load such files with numpy you should include the option allow_pickle=True.
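Loading such files can be sketched as follows. The file names and array contents here are made up for illustration, since the listing does not name the actual result files. The example writes small files to a temporary directory first so that it runs on its own. Note that `allow_pickle=True` lets NumPy unpickle arbitrary Python objects, so it should only be used on files from a trusted source.

```python
import os
import tempfile

import numpy as np


def load_npy(path):
    """Load a .npy file that may contain pickled Python objects."""
    return np.load(path, allow_pickle=True)


def load_npz(path):
    """Load every array in a .npz archive into a plain dict."""
    with np.load(path, allow_pickle=True) as archive:
        return {name: archive[name] for name in archive.files}


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the dataset's result files.
    npy_path = os.path.join(tmp, "example.npy")
    npz_path = os.path.join(tmp, "example.npz")
    np.save(npy_path, np.array([{"score": 1.0}], dtype=object))
    np.savez(npz_path, losses=np.array([0.3, 0.2, 0.1]))

    records = load_npy(npy_path)
    arrays = load_npz(npz_path)

print(records[0]["score"], arrays["losses"].shape)
```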
Precision (%) of different algorithms for the underweight and...
plos.figshare.com
xls
Updated May 31, 2024
Cite
Precision (%) of different algorithms for the underweight and overweight/obese. [Dataset]. https://plos.figshare.com/articles/dataset/Precision_of_different_algorithms_for_the_underweight_and_overweight_obese_/25945674
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304389.t010
Dataset updated
May 31, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Najma Begum; Mohd. Muzibur Rahman; Mohammad Omar Faruk
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Precision (%) of different algorithms for the underweight and overweight/obese.
Books called Stochastic optimization for large-scale machine learning
workwithdata.com
Updated May 30, 2024
Cite
Work With Data (2024). Books called Stochastic optimization for large-scale machine learning [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Stochastic+optimization+for+large-scale+machine+learning
Dataset updated
May 30, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books and is filtered where the book is Stochastic optimization for large-scale machine learning, featuring 7 columns including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
Data from: A Bayesian Nonlinear Mixed-Effects Location Scale Model for...
osf.io
Updated Mar 25, 2019
Cite
Donald R. Williams; Philippe Rast (2019). A Bayesian Nonlinear Mixed-Effects Location Scale Model for Learning [Dataset]. https://osf.io/k3rsz
Dataset updated
Mar 25, 2019
Dataset provided by
Center For Open Science
Authors
Donald R. Williams; Philippe Rast
Description
We present a Bayesian nonlinear mixed-effects location scale model (NL-MELSM). The NL-MELSM allows for fitting nonlinear functions to the location, or individual means, and the scale, or within-person variance. Specifically, in the context of learning, this model allows the within-person variance to follow a nonlinear trajectory, where it can be determined whether variability reduces while in the process learning. It incorporates a sub-model that can predict nonlinear parameters for the location and/or scale. This specification estimates random effects for all nonlinear location and scale parameters that are drawn from a common multivariate distribution. This allows estimation of covariances among the random effects, within and across the location and the scale. These covariances offer new insights into the interplay between individual mean structures and intra-individual variability in nonlinear parameters. We take a fully Bayesian approach, not only for ease of estimation, but also because it provides the necessary and consistent information for use in psychological applications, such as model selection and hypothesis testing. To illustrate the model, we use data from 333 individuals, consisting of three age groups, who participated in five learning trials that assessed verbal memory. In an exploratory context we demonstrate that fitting a nonlinear function to the within-person variance, and allowing for individual variation therein, improves predictive accuracy compared to customary modeling techniques (e.g., assuming constant variance). We conclude by discussing the usefulness, limitations, and future directions of the NL-MELSM.
Data from: BuildingsBench: A Large-Scale Dataset of 900K Buildings and...
gimi9.com
Updated Dec 4, 2024
Cite
(2024). BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting [Dataset]. https://www.gimi9.com/dataset/data-gov_buildingsbench-a-large-scale-dataset-of-900k-buildings-and-benchmark-for-short-term-load-f/
Dataset updated
Dec 4, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The BuildingsBench datasets consist of:
(1) Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
(2) 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.
Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see link to this database in the links further below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation."
Similar to the EULP, Buildings-900K is a collection of Parquet files and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).
BuildingsBench also provides an evaluation benchmark that is a collection of various open source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB, and they are listed out below:
ElectricityLoadDiagrams20112014
Building Data Genome Project-2
Individual household electric power consumption (Sceaux)
Q-Herilearn Scale data
portaldelaciencia.uva.es
scidb.cn
Updated 2023
Cite
Q-Herilearn Scale data [Dataset]. https://portaldelaciencia.uva.es/documentos/668fc414b9e7c03b01bd3eb7
Dataset updated
2023
Authors
Olaia Fontal Merillas; Arias, Victor B.; Arias, Benito
Description
The Q-Herilearn scale is a probabilistic scale of summative estimates that measures different aspects of the learning process in Heritage Education. It consists of seven factors (Knowing, Understanding, Respecting, Valuing, Caring, Enjoying and Transmitting). Each dimension is measured by means of seven indicators scored on a 4-point frequency response scale (1 = Never or almost never; 2 = Sometimes; 3 = Quite often; 4 = Always or almost always). Sufficient evidence of content validity has been obtained through a concordance analysis, which employed multi-facet logistic models (Many Facet Rasch Model, MFRM), of the scores of 40 judges, who estimated the relevance, adequacy, and clarity of each item. The metric properties of the scores were determined using Exploratory Structural Equation Modeling (ESEM), Exploratory Graph Analysis (EGA), and Network Analysis. The scale was calibrated using Item Response Theory models: the Nominal Response Model and the Graded Response Model.
Landscape-scale conservation in the Congo Basin: lessons learned
nauru-data.sprep.org
tonga-data.sprep.org
pdf
Updated Feb 15, 2022
Cite
International Union for Conservation of Nature (IUCN) (2022). Landscape-scale conservation in the Congo Basin: lessons learned [Dataset]. https://nauru-data.sprep.org/dataset/landscape-scale-conservation-congo-basin-lessons-learned
Available download formats: pdf (8945926)
Dataset updated
Feb 15, 2022
Dataset provided by
International Union for Conservation of Nature (http://iucn.org/)
License
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
Congo Basin, Pacific Region
Description
To introduce this collection of studies, a logical first question to ask is why produce a “lessons learned” publication? The initial impetus for this initiative was an observation by an external evaluation of CARPE that there was relatively little sharing of information within the programme between numerous actors and sites concerning the conservation strategies undertaken and the results achieved (Weidemann Consortium, 2006). The evaluation concluded that this lack of information exchange was a threat to the success of CARPE as a large-scale regional programme, a view that was confirmed by programme partners during the CARPE Phase IIB Inception workshop that was held in Yaoundé in February 2007.
1 copy | also available online
Call Number: 333.7 YAN [EL]
ISBN/ISSN: 978-2-8317-1288-8
Physical Description: xvi, 262 p. ; 29 cm
Training data from SPCAM for machine learning in moist physics
datadryad.org
zenodo.org
zip
Updated Aug 7, 2020
Cite
Guang Zhang; Yilun Han; Xiaomeng Huang; Yong Wang (2020). Training data from SPCAM for machine learning in moist physics [Dataset]. http://doi.org/10.6075/J0CZ35PP
Available download formats: zip
Unique identifier
https://doi.org/10.6075/J0CZ35PP
Dataset updated
Aug 7, 2020
Dataset provided by
Dryad
Authors
Guang Zhang; Yilun Han; Xiaomeng Huang; Yong Wang
Time period covered
2020
Description
The training samples of the entire year (from yr-2 of simulation) are compressed in SPCAM_ML_Han_et_al_0.tar.gz, and testing samples of the entire year (from yr-3 of simulation) are compressed in SPCAM_ML_Han_et_al_1.tar.gz. In each dataset, there is a data documentation file and 365 netCDF data files (one file for each day), each marked by its date. The variable fields contain temperature and moisture tendencies and cloud water and cloud ice from the CRM, and vertical profiles of temperature and moisture and large-scale temperature and moisture tendencies from the dynamic core of SPCAM’s host model CAM5 and PBL diffusion. In addition, we include surface sensible and latent heat fluxes. For more details, please read the data documentation inside the tar.gz files.
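As a rough illustration of how the archive layout described above could be checked before use, the sketch below lists the netCDF members of a gzip-compressed tar archive using only the Python standard library. The function names, the `.nc` file suffix, and the member names in the small demo archive are assumptions made for illustration; the actual member names are given in the data documentation inside the tar.gz files.

```python
import io
import tarfile


def list_netcdf_members(archive, suffix=".nc"):
    """Return the sorted names of regular files in a .tar.gz whose names end with `suffix`.

    `archive` is a binary file object. The ".nc" suffix is an assumption,
    not something stated in the dataset description.
    """
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(suffix)
        )


def build_demo_archive():
    """Build a tiny in-memory .tar.gz with hypothetical member names for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in ("README.txt", "day_001.nc", "day_002.nc"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


demo_members = list_netcdf_members(build_demo_archive())
print(len(demo_members), "netCDF members found in the demo archive")
```

For a downloaded archive, the same function could be called on a file opened with `open("SPCAM_ML_Han_et_al_0.tar.gz", "rb")`, and the returned list compared against the 365 daily files mentioned in the description.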
Large-Scale Model Training Machine Report
datainsightsmarket.com
doc, pdf, ppt
Updated Mar 16, 2025
Cite
AMA Research & Media LLP (2025). Large-Scale Model Training Machine Report [Dataset]. https://www.datainsightsmarket.com/reports/large-scale-model-training-machine-41601
Available download formats: doc, ppt, pdf
Dataset updated
Mar 16, 2025
Dataset provided by
AMA Research & Media LLP
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Large-Scale Model Training Machine market is experiencing explosive growth, fueled by the increasing demand for advanced artificial intelligence (AI) applications across diverse sectors. The market, estimated at $15 billion in 2025, is projected to witness a robust Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an estimated $75 billion by 2033. This surge is driven by several factors, including the proliferation of big data, advancements in deep learning algorithms, and the growing need for efficient model training in applications such as natural language processing (NLP), computer vision, and recommendation systems. Key market segments include the Internet, telecommunications, and government sectors, which are heavily investing in AI infrastructure to enhance their services and operational efficiency. The CPU+GPU segment dominates the market due to its superior performance in handling complex computations required for large-scale model training. Leading companies like Google, Amazon, Microsoft, and NVIDIA are at the forefront of innovation, constantly developing more powerful hardware and software solutions to address the evolving needs of this rapidly expanding market.
The market's growth trajectory is shaped by several trends. The increasing adoption of cloud-based solutions for model training is significantly lowering the barrier to entry for smaller companies. Simultaneously, the development of specialized hardware like Tensor Processing Units (TPUs) and Field-Programmable Gate Arrays (FPGAs) is further optimizing performance and reducing costs. Despite this positive outlook, challenges remain. High infrastructure costs, the complexity of managing large datasets, and the shortage of skilled AI professionals are significant restraints on the market's expansion. However, ongoing technological advancements and increased investment in AI research are expected to mitigate these challenges, paving the way for sustained growth in the Large-Scale Model Training Machine market. Regional analysis indicates North America and Asia Pacific (particularly China) as the leading markets, with strong growth anticipated in other regions as AI adoption accelerates globally.
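The growth figures quoted in this description can be sanity-checked with the standard compound-growth relation, end = start × (1 + rate)^years. The sketch below is only an arithmetic illustration and is not part of the report; the function names are hypothetical, and it assumes the 2025 to 2033 window counts as 8 compounding years. Under that assumption, a $15 billion base growing at 25% per year compounds to roughly $89 billion rather than $75 billion, and reaching $75 billion would correspond to a CAGR of about 22%, so the three quoted figures appear to be rounded or to rest on a different base year.

```python
def project_value(start, rate, years):
    """Value after compounding `start` at annual `rate` for `years` years."""
    return start * (1.0 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted in the description (in billions of USD).
START_2025 = 15.0
END_2033 = 75.0
STATED_CAGR = 0.25
YEARS = 2033 - 2025  # assumption: 8 compounding periods

projected = project_value(START_2025, STATED_CAGR, YEARS)
implied = implied_cagr(START_2025, END_2033, YEARS)

print(f"Projected 2033 value at {STATED_CAGR:.0%} CAGR: ${projected:.1f}B")
print(f"CAGR implied by ${START_2025:.0f}B -> ${END_2033:.0f}B: {implied:.1%}")
```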

Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave

Data Description

The dataset comprises a total of 52,984 Tweet IDs (that correspond to the same number of Tweets) about online learning that were posted on Twitter from 9th November 2021 to 13th July 2022. The earliest date was selected as 9th November 2021, as the Omicron variant was detected for the first time in a sample that was collected on this date. 13th July 2022 was the most recent date as per the time of data collection and publication of this dataset.

The dataset consists of 9 .txt files. An overview of these dataset files along with the number of Tweet IDs and the date range of the associated tweets is as follows. Table 1 shows the list of all the synonyms or terms that were used for the dataset development.

Filename: TweetIDs_November_2021.txt (No. of Tweet IDs: 1283, Date Range of the associated Tweet IDs: November 1, 2021 to November 30, 2021)

Filename: TweetIDs_December_2021.txt (No. of Tweet IDs: 10545, Date Range of the associated Tweet IDs: December 1, 2021 to December 31, 2021)

Filename: TweetIDs_January_2022.txt (No. of Tweet IDs: 23078, Date Range of the associated Tweet IDs: January 1, 2022 to January 31, 2022)

Filename: TweetIDs_February_2022.txt (No. of Tweet IDs: 4751, Date Range of the associated Tweet IDs: February 1, 2022 to February 28, 2022)

Filename: TweetIDs_March_2022.txt (No. of Tweet IDs: 3434, Date Range of the associated Tweet IDs: March 1, 2022 to March 31, 2022)

Filename: TweetIDs_April_2022.txt (No. of Tweet IDs: 3355, Date Range of the associated Tweet IDs: April 1, 2022 to April 30, 2022)

Filename: TweetIDs_May_2022.txt (No. of Tweet IDs: 3120, Date Range of the associated Tweet IDs: May 1, 2022 to May 31, 2022)

Filename: TweetIDs_June_2022.txt (No. of Tweet IDs: 2361, Date Range of the associated Tweet IDs: June 1, 2022 to June 30, 2022)

Filename: TweetIDs_July_2022.txt (No. of Tweet IDs: 1057, Date Range of the associated Tweet IDs: July 1, 2022 to July 13, 2022)
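The per-file counts listed above can be cross-checked against the stated total of 52,984 Tweet IDs. The short sketch below simply transcribes the nine filenames and counts from the list into a dictionary (the variable names are illustrative) and sums them; the counts do add up to 52,984.

```python
# Counts transcribed from the file overview above.
FILE_COUNTS = {
    "TweetIDs_November_2021.txt": 1283,
    "TweetIDs_December_2021.txt": 10545,
    "TweetIDs_January_2022.txt": 23078,
    "TweetIDs_February_2022.txt": 4751,
    "TweetIDs_March_2022.txt": 3434,
    "TweetIDs_April_2022.txt": 3355,
    "TweetIDs_May_2022.txt": 3120,
    "TweetIDs_June_2022.txt": 2361,
    "TweetIDs_July_2022.txt": 1057,
}

STATED_TOTAL = 52984

total = sum(FILE_COUNTS.values())
largest_file = max(FILE_COUNTS, key=FILE_COUNTS.get)

print(f"Files: {len(FILE_COUNTS)}, total Tweet IDs: {total}")
print(f"Matches stated total: {total == STATED_TOTAL}")
print(f"Largest file: {largest_file} ({FILE_COUNTS[largest_file] / total:.1%} of all IDs)")
```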

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. To hydrate this dataset, the Hydrator application may be used (a download link and a step-by-step tutorial on how to use Hydrator are provided with the dataset).
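Before hydration with any tool, the ID files usually need to be read, cleaned, and split into request-sized batches. The sketch below shows one way this preparation might look; it is not the Hydrator application's own logic. It assumes each .txt file holds one numeric Tweet ID per line, and the batch size of 100 is an assumption based on the per-request limit that Twitter's tweet lookup endpoints have used, which should be verified against the current API terms. The function names are hypothetical.

```python
def parse_tweet_ids(text):
    """Parse Tweet IDs from file contents, assuming one numeric ID per line.

    Blank lines are skipped and duplicates are dropped while preserving order.
    IDs are kept as strings so that large 64-bit values are never altered.
    """
    seen = set()
    ids = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        token = line.strip()
        if not token:
            continue
        if not token.isdigit():
            raise ValueError(f"line {line_number}: not a numeric Tweet ID: {token!r}")
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


def batched(ids, size=100):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Small in-memory example standing in for the contents of one ID file.
sample_text = "1458000000000000001\n1458000000000000002\n\n1458000000000000002\n"
sample_ids = parse_tweet_ids(sample_text)
print(sample_ids)
```

For a downloaded file, the contents could be read with `open("TweetIDs_November_2021.txt", encoding="utf-8").read()` and passed to `parse_tweet_ids`; each batch from `batched` would then be handed to whichever hydration tool is in use.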

Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development

Terminology | List of synonyms and terms

COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus

online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures
