100+ datasets found

A Large-Scale Dataset of 4G, NB-IoT, and 5G Non-Standalone Network Measurements
ieee-dataport.org
Updated Nov 7, 2022
Cite
Konstantinos Kousias (2022). A Large-Scale Dataset of 4G, NB-IoT, and 5G Non-Standalone Network Measurements [Dataset]. https://ieee-dataport.org/documents/large-scale-dataset-4g-nb-iot-and-5g-non-standalone-network-measurements
Dataset updated
Nov 7, 2022
Authors
Konstantinos Kousias
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
it is crucial to examine them from an empirical perspective.
Data from: MineRL: A Large-Scale Dataset of Minecraft Demonstrations
academictorrents.com
bittorrent
Updated Feb 8, 2020
Cite
William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov (2020). MineRL: A Large-Scale Dataset of Minecraft Demonstrations [Dataset]. https://academictorrents.com/details/b37b88b9cfaf0ed0c371da7d53c22c284c35c089
Explore at:
Available download formats: bittorrent (31820513429)
Dataset updated
Feb 8, 2020
Dataset authored and provided by
William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov
License
https://academictorrents.com/nolicensespecified
Description
The sample inefficiency of standard deep reinforcement learning methods precludes their application to many real-world problems. Methods which leverage human demonstrations require fewer samples but have been researched less. As demonstrated in the computer vision and natural language processing communities, large-scale datasets have the capacity to facilitate research by serving as an experimental and benchmarking platform for new methods. However, existing datasets compatible with reinforcement learning simulators do not have sufficient scale, structure, and quality to enable the further development and evaluation of methods focused on using human examples. Therefore, we introduce a comprehensive, large-scale, simulator-paired dataset of human demonstrations: MineRL. The dataset consists of over 60 million automatically annotated state-action pairs across a variety of related tasks in Minecraft, a dynamic, 3D, open-world environment. We present a novel data collection scheme which al
SMAL: A Large-Scale Dataset of 3D Animals
service.tib.eu
Updated Dec 3, 2024
Cite
(2024). SMAL: A Large-Scale Dataset of 3D Animals [Dataset]. https://service.tib.eu/ldmservice/dataset/smal--a-large-scale-dataset-of-3d-animals
Dataset updated
Dec 3, 2024
Description
A dataset of 3D animal models used for training and testing 3D shape reconstruction models.
Data from: LRS3-TED: A Large-Scale Dataset for Visual Speech Recognition
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). LRS3-TED: A Large-Scale Dataset for Visual Speech Recognition [Dataset]. https://service.tib.eu/ldmservice/dataset/lrs3-ted--a-large-scale-dataset-for-visual-speech-recognition
Dataset updated
Dec 16, 2024
Description
LRS3-TED: a large-scale dataset for visual speech recognition.
Flowline - Large Scale
s.cnmilf.com
data.oregon.gov
+1 more
Updated Jan 31, 2025
+ more versions
Cite
U.S. Geological Survey (2025). Flowline - Large Scale [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/flowline-large-scale
Dataset updated
Jan 31, 2025
Dataset provided by
U.S. Geological Survey
Description
The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that make up the nation's surface water drainage system. NHD data was originally developed at 1:100,000 scale and exists at that scale for the whole country. High resolution NHD adds detail to the original 1:100,000-scale NHD. (Data for Alaska, Puerto Rico and the Virgin Islands was developed at high-resolution, not 1:100,000 scale.) Like the 1:100,000-scale NHD, high resolution NHD contains reach codes for networked features and isolated lakes, flow direction, names, stream level, and centerline representations for areal water bodies. Reaches are also defined to represent waterbodies and the approximate shorelines of the Great Lakes, the Atlantic and Pacific Oceans and the Gulf of Mexico. The NHD also incorporates the National Spatial Data Infrastructure framework criteria set out by the Federal Geographic Data Committee.
Data from: INCLUDE: A Large Scale Dataset for Indian Sign Language...
data.niaid.nih.gov
live.european-language-grid.eu
Updated Dec 19, 2021
Cite
Sridhar, Advaith (2021). INCLUDE: A Large Scale Dataset for Indian Sign Language Recognition [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4010759
Dataset updated
Dec 19, 2021
Dataset provided by
IIT Madras, AI4Bharat
Authors
Sridhar, Advaith; Ganesan, Rohith Gandhi; Kumar, Pratyush; Khapra, Mitesh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
Dataset Details: The INCLUDE dataset has 4292 videos (the paper mentions 4287 videos, but 5 videos were added later). The videos used for training are listed in train.csv (3475 files), while those used for testing are listed in test.csv (817 files). Each video is a recording of 1 ISL sign, signed by deaf students from St. Louis School for the Deaf, Adyar, Chennai.

INCLUDE50 has 766 train videos and 192 test videos.

Train-Test Split: Please download the train-test split for INCLUDE and INCLUDE50 from here: Train-Test Split

Publication Link: https://dl.acm.org/doi/10.1145/3394171.3413528

AI4Bharat website: https://sign-language.ai4bharat.org/

Download Instructions

For ease of access, we have prepared a Shell Script to download all the parts of the dataset and extract them to form the complete INCLUDE dataset.

You can find the script here: http://bit.ly/include_dl

Paper Abstract: Indian Sign Language (ISL) is a complete language with its own grammar, syntax, vocabulary and several unique linguistic attributes. It is used by over 5 million deaf people in India. Currently, there is no publicly available dataset on ISL to evaluate Sign Language Recognition (SLR) approaches. In this work, we present the Indian Lexicon Sign Language Dataset - INCLUDE - an ISL dataset that contains 0.27 million frames across 4,287 videos over 263 word signs from 15 different word categories. INCLUDE is recorded with the help of experienced signers to provide close resemblance to natural conditions. A subset of 50 word signs is chosen across word categories to define INCLUDE-50 for rapid evaluation of SLR methods with hyperparameter tuning. The best performing model achieves an accuracy of 94.5% on the INCLUDE-50 dataset and 85.6% on the INCLUDE dataset
Data from: RgbD1K: A large-scale dataset and benchmark for rgb-d object...
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). RgbD1K: A large-scale dataset and benchmark for rgb-d object tracking [Dataset]. https://service.tib.eu/ldmservice/dataset/rgbd1k--a-large-scale-dataset-and-benchmark-for-rgb-d-object-tracking
Dataset updated
Dec 16, 2024
Description
RgbD1K: A large-scale dataset and benchmark for rgb-d object tracking
Data from: UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess...
zenodo.org
application/gzip
Updated Mar 22, 2024
Cite
Anonymous; Anonymous (2024). UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing [Dataset]. http://doi.org/10.5281/zenodo.10850974
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10850974
Dataset updated
Mar 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anonymous; Anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 15, 2024
Description
This tar.gz file includes the dataset for UniTSyn.
New Visions for Large Scale Networks: Research and Applications
catalog.data.gov
datasets.ai
+3 more
Updated May 14, 2025
+ more versions
Cite
NCO NITRD (2025). New Visions for Large Scale Networks: Research and Applications [Dataset]. https://catalog.data.gov/dataset/new-visions-for-large-scale-networks-research-and-applications
Dataset updated
May 14, 2025
Dataset provided by
NCO NITRD
Description
This paper documents the findings of the March 12-14, 2001 Workshop on New Visions for Large-Scale Networks: Research and Applications. The workshop's objectives were to develop a vision for the future of networking 10 to 20 years out and to identify needed Federal networking research to enable that vision...
000 Tweets
ieee-dataport.org
Updated Jul 25, 2022
+ more versions
Cite
Nirmalya Thakur (2022). 000 Tweets [Dataset]. https://ieee-dataport.org/documents/twitter-conversations-about-covid-19-omicron-variant-large-scale-dataset-more-500000
Dataset updated
Jul 25, 2022
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
2022
large-scale-hate-speech-v2
huggingface.co
Updated Nov 30, 2023
+ more versions
Cite
Cagri Toraman (2023). large-scale-hate-speech-v2 [Dataset]. https://huggingface.co/datasets/ctoraman/large-scale-hate-speech-v2
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 30, 2023
Authors
Cagri Toraman
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The dataset published in the LREC 2022 paper "Large-Scale Hate Speech Detection with Cross-Domain Transfer".

This is Dataset v2:

The modified dataset that includes 68,597 tweets in English. The annotations with more than 80% agreement are included.

TweetID: Tweet ID from Twitter API
LangID: 1 (English)
TopicID: Domain of the topic (0-Religion, 1-Gender, 2-Race, 3-Politics, 4-Sports)
HateLabel: Final hate label decision (0-Normal, 1-Offensive, 2-Hate)
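The integer codes for TopicID and HateLabel can be turned back into readable names with two small lookup tables. The following is a minimal sketch: the code tables are copied from the description above, while the `decode_row` helper, the dict-shaped row, the added `TopicName` and `HateLabelName` keys, and the placeholder Tweet ID are hypothetical and not part of the dataset.

```python
# Code tables copied from the dataset description.
TOPIC_NAMES = {0: "Religion", 1: "Gender", 2: "Race", 3: "Politics", 4: "Sports"}
HATE_LABELS = {0: "Normal", 1: "Offensive", 2: "Hate"}


def decode_row(row):
    """Return a copy of `row` with readable topic and label names added.

    `row` is assumed to be a dict keyed by the column names listed in the
    description (TweetID, LangID, TopicID, HateLabel). The two added keys
    are illustrative names, not columns of the published dataset.
    """
    decoded = dict(row)
    decoded["TopicName"] = TOPIC_NAMES[int(row["TopicID"])]
    decoded["HateLabelName"] = HATE_LABELS[int(row["HateLabel"])]
    return decoded


# Hypothetical row; "0" is a placeholder, not a real Tweet ID.
example = decode_row({"TweetID": "0", "LangID": 1, "TopicID": 3, "HateLabel": 2})
print(example["TopicName"], example["HateLabelName"])  # prints: Politics Hate
```

Casting with `int()` lets the same helper accept codes read as strings from a CSV file.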

GitHub Repo:

NOTE:… See the full description on the dataset page: https://huggingface.co/datasets/ctoraman/large-scale-hate-speech-v2.
Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)
redivis.com
application/jsonl +7
Updated Jun 28, 2024
Cite
Stanford Doerr School of Sustainability Data Repository (2024). Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [Dataset]. http://doi.org/10.57761/gk3g-wc33
Explore at:
Available download formats: stata, csv, application/jsonl, arrow, parquet, sas, spss, avro
Unique identifier
https://doi.org/10.57761/gk3g-wc33
Dataset updated
Jun 28, 2024
Dataset provided by
Redivis Inc.
Authors
Stanford Doerr School of Sustainability Data Repository
Time period covered
Jun 27, 2024
Description
Abstract

S3DIS comprises 6 colored 3D point clouds from 6 large-scale indoor areas, along with semantic instance annotations for 12 object categories (wall, floor, ceiling, beam, column, window, door, sofa, desk, chair, bookcase, and board).

Methodology

The Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset is composed of the colored 3D point clouds of six large-scale indoor areas from three different buildings, covering approximately 935, 965, 450, 1700, 870, and 1100 square meters respectively (a total of 6020 square meters). These areas show diverse properties in architectural style and appearance and include mainly office areas and educational and exhibition spaces. Conference rooms, personal offices, restrooms, open spaces, lobbies, stairways, and hallways are commonly found therein. The entire point clouds are automatically generated without any manual intervention using the Matterport scanner. The dataset also includes semantic instance annotations on the point clouds for 12 semantic elements, which are structural elements (ceiling, floor, wall, beam, column, window, and door) and commonly found items and furniture (table, chair, sofa, bookcase, and board).

[Image: S3DIS.png, https://redivis.com/fileUploads/5bdaf09c-7d3b-4a91-b192-d98a0f0b0018]

Important Information

This dataset was presented in the paper "3D Semantic Parsing of Large-Scale Indoor Spaces", CVPR 2016.

Project website: http://buildingparser.stanford.edu/

COIN: A large-scale dataset for comprehensive instructional video analysis -...
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). COIN: A large-scale dataset for comprehensive instructional video analysis - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/coin--a-large-scale-dataset-for-comprehensive-instructional-video-analysis
Dataset updated
Dec 2, 2024
Description
COIN dataset for comprehensive instructional video analysis
Data from: AVA: A Large-Scale Database for Aesthetic Visual Analysis
academictorrents.com
bittorrent
Updated Jul 16, 2017
Cite
Naila Murray and Luca Marchesotti and Florent Perronnin (2017). AVA: A Large-Scale Database for Aesthetic Visual Analysis [Dataset]. https://academictorrents.com/details/71631f83b11d3d79d8f84efe0a7e12f0ac001460
Explore at:
Available download formats: bittorrent (33142609854)
Dataset updated
Jul 16, 2017
Dataset authored and provided by
Naila Murray and Luca Marchesotti and Florent Perronnin
License
https://academictorrents.com/nolicensespecified
Description
Aesthetic Visual Analysis (AVA) contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style for high-level image quality categorization.
Data from: A Large-Scale Dataset of Twitter Chatter About Online Learning...
dataverse.harvard.edu
Updated Aug 9, 2022
+ more versions
Cite
Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter About Online Learning During The Current COVID-19 Omicron Wave [Dataset]. http://doi.org/10.7910/DVN/GBHOD9
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/GBHOD9
Dataset updated
Aug 9, 2022
Dataset provided by
Harvard Dataverse
Authors
Nirmalya Thakur
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109

Abstract

The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset files contain the raw version that comprises 52,868 Tweet IDs (that correspond to the same number of Tweets) and the cleaned and preprocessed version that contains 46,208 unique Tweet IDs. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.

Data Description

The dataset comprises 7 .txt files. The raw version of this dataset comprises 6 .txt files (TweetIDs_Corona Virus.txt, TweetIDs_Corona.txt, TweetIDs_Coronavirus.txt, TweetIDs_Covid.txt, TweetIDs_Omicron.txt, and TweetIDs_SARS CoV2.txt) that contain Tweet IDs grouped together based on certain synonyms or terms that were used to refer to online learning and the Omicron variant of COVID-19 in the respective tweets. The cleaned and preprocessed version of this dataset is provided in the .txt file - TweetIDs_Duplicates_Removed.txt. The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweetsr) may be used.

The list of all the synonyms or terms that were used for the dataset development is as follows:

COVID-19: Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus

online learning: online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures

A description of the dataset files is provided below:

TweetIDs_Corona Virus.txt – Contains 321 Tweet IDs that correspond to tweets that comprise the keywords – "corona virus" and one or more keywords/terms that refer to online learning
TweetIDs_Corona.txt – Contains 1819 Tweet IDs that correspond to tweets that comprise the keyword – "corona" or "coronaoutbreak" and one or more keywords/terms that refer to online learning
TweetIDs_Coronavirus.txt – Contains 1429 Tweet IDs that correspond to tweets that comprise the keywords – "coronavirus" or "coronaviruspandemic" and one or more keywords/terms that refer to online learning
TweetIDs_Covid.txt – Contains 41088 Tweet IDs that correspond to tweets that comprise the keywords – "COVID" or "COVID19" or "COVID-19" and one or more keywords/terms that refer to online learning
TweetIDs_Omicron.txt – Contains 8198 Tweet IDs that correspond to tweets that comprise the keywords – "omicron" or "omicron variant" and one or more keywords/terms that refer to online learning
TweetIDs_SARS CoV2.txt – Contains 13 Tweet IDs that correspond to tweets that comprise the keyword – "SARS-CoV-2" and one or more keywords/terms that refer to online learning
TweetIDs_Duplicates_Removed.txt - A collection of 46208 unique Tweet IDs from all the 6 .txt files mentioned above after...
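The relationship between the six raw files and the deduplicated file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each .txt file holds one Tweet ID per line, and the helper names `read_tweet_ids` and `merge_unique_ids` are hypothetical. The description does not say how the authors actually produced TweetIDs_Duplicates_Removed.txt, so this should not be read as their procedure.

```python
from pathlib import Path

# Per-file Tweet ID counts as stated in the dataset description.
RAW_FILE_COUNTS = {
    "TweetIDs_Corona Virus.txt": 321,
    "TweetIDs_Corona.txt": 1819,
    "TweetIDs_Coronavirus.txt": 1429,
    "TweetIDs_Covid.txt": 41088,
    "TweetIDs_Omicron.txt": 8198,
    "TweetIDs_SARS CoV2.txt": 13,
}

# The per-file counts add up to the stated raw total of 52,868.
RAW_TOTAL = sum(RAW_FILE_COUNTS.values())


def read_tweet_ids(path):
    """Return the non-empty lines of one file (assumed: one Tweet ID per line)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def merge_unique_ids(paths):
    """Concatenate IDs from all files, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for path in paths:
        for tweet_id in read_tweet_ids(path):
            if tweet_id not in seen:
                seen.add(tweet_id)
                unique.append(tweet_id)
    return unique
```

Calling `merge_unique_ids` with the six raw file paths would yield a duplicate-free list; whether its length matches the stated 46,208 depends on the assumptions above holding for the published files.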
VLA XMM Large Scale Structure Field 325-MHz Source Catalog - Dataset - NASA...
data.nasa.gov
Updated Apr 1, 2025
+ more versions
Cite
nasa.gov (2025). VLA XMM Large Scale Structure Field 325-MHz Source Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/vla-xmm-large-scale-structure-field-325-mhz-source-catalog
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The XMM Large Scale Structure survey (XMM-LSS) is an X-ray survey aimed at studying the large scale structure of the Universe. The XMM-LSS field (centered at RA (J2000) = 02^h 24^m 00.27^s, Dec (J2000) = -04^o 09' 47.6") is currently being followed up using observations across a wide range of wavelengths, and in their paper the authors present the observational results of a low frequency radio survey of the XMM-LSS field using the Very Large Array at 74 and 325 MHz. This survey will map out the locations of the extragalactic radio sources relative to the large scale structure as traced by the X-ray emission. This is of particular interest because radio galaxies and radio-loud AGN show strong and complex interactions with their small and larger scale environment, and different classes of radio galaxies are suggested to lie at different places with respect to the large scale structure. For the phase calibration of the radio data, the authors used standard self-calibration at 325 MHz and field-base calibration at 74 MHz. Polyhedron-based imaging as well as mosaicking methods were used at both frequencies. At 74 MHz, the resolution was 30 arcseconds, the median 5-sigma sensitivity was ~ 162 mJy/beam and 666 sources were detected over an area of 132 square degrees. At 325 MHz, the resolution was 6.7 arcseconds, the median 5-sigma sensitivity was 4 mJy/beam, and 847 sources were detected over an area of 15.3 square degrees. At 325 MHz, a region of diffuse radio emission which is a cluster halo or relic candidate was detected. The observations were conducted using the VLA in July 2003 in the A-configuration (most extended) and in June 2002 in the B-configuration. This table contains the VLA 325-MHz source list, comprising 605 single sources and 615 components of 237 multiple sources, for a total of 1220 entries. (Notice that, in Section 4.1 of the reference paper, somewhat different numbers are given, i.e., the authors quote 621 single sources and 226 multiple sources). 
For the multiple sources, each component (A, B, etc.) is listed separately, in order of decreasing brightness. This table was created by the HEASARC in March 2012 based on CDS Catalog J/A+A/456/791 file tablea1.dat. This is a service provided by NASA HEASARC.
Developing Large-Scale Bayesian Networks by Composition
catalog.data.gov
s.cnmilf.com
+2 more
Updated Sep 4, 2025
+ more versions
Cite
Dashlink (2025). Developing Large-Scale Bayesian Networks by Composition [Dataset]. https://catalog.data.gov/dataset/developing-large-scale-bayesian-networks-by-composition
Dataset updated
Sep 4, 2025
Dataset provided by
Dashlink
Description
In this paper, we investigate the use of Bayesian networks to construct large-scale diagnostic systems. In particular, we consider the development of large-scale Bayesian networks by composition. This compositional approach reflects how (often redundant) subsystems are architected to form systems such as electrical power systems. We develop high-level specifications, Bayesian networks, clique trees, and arithmetic circuits representing 24 different electrical power systems. The largest among these 24 Bayesian networks contains over 1,000 random variables. Another BN represents the real-world electrical power system ADAPT, which is representative of electrical power systems deployed in aerospace vehicles. In addition to demonstrating the scalability of the compositional approach, we briefly report on experimental results from the diagnostic competition DXC, where the ProADAPT team, using techniques discussed here, obtained the highest scores in both Tier 1 (among 9 international competitors) and Tier 2 (among 6 international competitors) of the industrial track. While we consider diagnosis of power systems specifically, we believe this work is relevant to other system health management problems, in particular in dependable systems such as aircraft and spacecraft.

Reference: O. J. Mengshoel, S. Poll, and T. Kurtoglu. "Developing Large-Scale Bayesian Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft." Proc. of the IJCAI-09 Workshop on Self-* and Autonomous Systems (SAS): Reasoning and Integration Challenges, 2009

BibTeX Reference:
@inproceedings{mengshoel09developing,
  title = {Developing Large-Scale {Bayesian} Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft},
  author = {Mengshoel, O. J. and Poll, S. and Kurtoglu, T.},
  booktitle = {Proc. of the IJCAI-09 Workshop on Self-$\star$ and Autonomous Systems (SAS): Reasoning and Integration Challenges},
  year = {2009}
}
Making Predictions using Large Scale Gaussian Processes - Dataset - NASA...
data.nasa.gov
Updated Mar 31, 2025
Cite
nasa.gov (2025). Making Predictions using Large Scale Gaussian Processes - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/making-predictions-using-large-scale-gaussian-processes
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
One of the key problems that arises in many areas is to estimate a potentially nonlinear function [tex]G(x, \theta)[/tex] given input and output samples so that [tex]y \approx G(x, \theta)[/tex]. There are many approaches to addressing this regression problem. Neural networks, regression trees, and many other methods have been developed to estimate [tex]G[/tex] given the input-output pairs. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject. For more technical information on the method and its applications see: http://www.gaussianprocess.org/ A key problem that arises in developing these models on very large data sets is that it ends up requiring an [tex]O(N^3)[/tex] computation, where N is the number of data points in the training sample. Obviously this becomes very problematic when N is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I’d suggest you take a look at his recent paper (which was accepted in the Journal of Machine Learning Research) posted on dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our new paper on the subject which was published in IEEE Transactions on Systems, Man, and Cybernetics.
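To make the O(N^3) cost concrete, here is a minimal pure-Python sketch of exact Gaussian process regression using a plain (unpivoted) Cholesky factorization. It is the textbook formulation, not the pivoted method developed by Leslie Foster and his students, which is not reproduced here. The squared-exponential kernel, the length scale, the noise level, and all function names are assumptions chosen for illustration.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix.

    The three nested loops (i, j, and the sum over k) are the source of
    the cubic cost in the number of training points.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def solve_cholesky(lower, rhs):
    """Solve (L * L^T) x = rhs by forward then backward substitution."""
    n = len(rhs)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - tail) / lower[i][i]
    return x


def gp_posterior_mean(train_x, train_y, test_x, noise=1e-6):
    """Posterior mean of a zero-mean GP at each point in test_x."""
    n = len(train_x)
    gram = [
        [rbf_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    weights = solve_cholesky(cholesky(gram), train_y)
    return [
        sum(rbf_kernel(x, train_x[i]) * weights[i] for i in range(n))
        for x in test_x
    ]


# Tiny illustrative data set (made up for this sketch).
print(gp_posterior_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.5], [0.5, 1.5]))
```

Doubling the number of training points roughly multiplies the factorization work by eight, which is why approximate methods become attractive when N is large.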
Data from: CheXmask Database: a large-scale dataset of anatomical...
physionet.org
Updated Jan 22, 2025
+ more versions
Cite
Nicolas Gaggion; Candelaria Mosquera; Martina Aineseder; Lucas Mansilla; Diego Milone; Enzo Ferrante (2025). CheXmask Database: a large-scale dataset of anatomical segmentation masks for chest x-ray images [Dataset]. http://doi.org/10.13026/3705-zg36
Unique identifier
https://doi.org/10.13026/3705-zg36
Dataset updated
Jan 22, 2025
Authors
Nicolas Gaggion; Candelaria Mosquera; Martina Aineseder; Lucas Mansilla; Diego Milone; Enzo Ferrante
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The CheXmask Database presents a comprehensive, uniformly annotated collection of chest radiographs, constructed from five public databases: ChestX-ray8, Chexpert, MIMIC-CXR-JPG, Padchest and VinDr-CXR. The database aggregates 657,566 anatomical segmentation masks derived from images which have been processed using the HybridGNet model to ensure consistent, high-quality segmentation. To confirm the quality of the segmentations, we include in this database individual Reverse Classification Accuracy (RCA) scores for each of the segmentation masks. This dataset is intended to catalyze further innovation and refinement in the field of semantic chest X-ray analysis, offering a significant resource for researchers in the medical imaging domain.
program-cota-llava
huggingface.co
Updated Jul 28, 2025
+ more versions
Cite
Salesforce (2025). program-cota-llava [Dataset]. https://huggingface.co/datasets/Salesforce/program-cota-llava
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 28, 2025
Dataset provided by
Salesforce Inc (http://salesforce.com/)
Authors
Salesforce
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
🌮 TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

🌐 Website | 📑 Arxiv | 💻 Code| 🤗 Datasets

If you like our project or are interested in its updates, please star us :) Thank you! ⭐

Summary

TLDR: CoTA is a large-scale dataset of synthetic Chains-of-Thought-and-Action (CoTA) generated by programs.

Load data

from datasets import load_dataset
dataset = load_dataset("Salesforce/program-cota-llava"…

See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/program-cota-llava.

A Large-Scale Dataset of 4G, NB-IoT, and 5G Non-Standalone Network Measurements


20 scholarly articles cite this dataset (View in Google Scholar)
