100+ datasets found
  1. A Large-Scale Dataset of 4G

    • ieee-dataport.org
    Updated Nov 7, 2022
    Cite
    Konstantinos Kousias (2022). A Large-Scale Dataset of 4G [Dataset]. https://ieee-dataport.org/documents/large-scale-dataset-4g-nb-iot-and-5g-non-standalone-network-measurements
    Explore at:
    Dataset updated
    Nov 7, 2022
    Authors
    Konstantinos Kousias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    it is crucial to examine them from an empirical perspective.

  2. Data from: MineRL: A Large-Scale Dataset of Minecraft Demonstrations

    • academictorrents.com
    bittorrent
    Updated Feb 8, 2020
    Cite
    William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov (2020). MineRL: A Large-Scale Dataset of Minecraft Demonstrations [Dataset]. https://academictorrents.com/details/b37b88b9cfaf0ed0c371da7d53c22c284c35c089
    Explore at:
    Available download formats: bittorrent (31820513429)
    Dataset updated
    Feb 8, 2020
    Dataset authored and provided by
    William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov
    License

    https://academictorrents.com/nolicensespecified

    Description

    The sample inefficiency of standard deep reinforcement learning methods precludes their application to many real-world problems. Methods which leverage human demonstrations require fewer samples but have been researched less. As demonstrated in the computer vision and natural language processing communities, large-scale datasets have the capacity to facilitate research by serving as an experimental and benchmarking platform for new methods. However, existing datasets compatible with reinforcement learning simulators do not have sufficient scale, structure, and quality to enable the further development and evaluation of methods focused on using human examples. Therefore, we introduce a comprehensive, large-scale, simulator-paired dataset of human demonstrations: MineRL. The dataset consists of over 60 million automatically annotated state-action pairs across a variety of related tasks in Minecraft, a dynamic, 3D, open-world environment. We present a novel data collection scheme which al

  3. SMAL: A Large-Scale Dataset of 3D Animals

    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    (2024). SMAL: A Large-Scale Dataset of 3D Animals [Dataset]. https://service.tib.eu/ldmservice/dataset/smal--a-large-scale-dataset-of-3d-animals
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    A dataset of 3D animal models used for training and testing 3D shape reconstruction models.

  4. Data from: LRS3-TED: A Large-Scale Dataset for Visual Speech Recognition

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). LRS3-TED: A Large-Scale Dataset for Visual Speech Recognition [Dataset]. https://service.tib.eu/ldmservice/dataset/lrs3-ted--a-large-scale-dataset-for-visual-speech-recognition
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    LRS3-TED: a large-scale dataset for visual speech recognition.

  5. Flowline - Large Scale

    • s.cnmilf.com
    • data.oregon.gov
    • +1 more
    Updated Jan 31, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Flowline - Large Scale [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/flowline-large-scale
    Explore at:
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that make up the nation's surface water drainage system. NHD data was originally developed at 1:100,000 scale and exists at that scale for the whole country. High resolution NHD adds detail to the original 1:100,000-scale NHD. (Data for Alaska, Puerto Rico and the Virgin Islands was developed at high-resolution, not 1:100,000 scale.) Like the 1:100,000-scale NHD, high resolution NHD contains reach codes for networked features and isolated lakes, flow direction, names, stream level, and centerline representations for areal water bodies. Reaches are also defined to represent waterbodies and the approximate shorelines of the Great Lakes, the Atlantic and Pacific Oceans and the Gulf of Mexico. The NHD also incorporates the National Spatial Data Infrastructure framework criteria set out by the Federal Geographic Data Committee.

  6. Data from: INCLUDE: A Large Scale Dataset for Indian Sign Language...

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    Updated Dec 19, 2021
    Cite
    Sridhar, Advaith (2021). INCLUDE: A Large Scale Dataset for Indian Sign Language Recognition [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4010759
    Explore at:
    Dataset updated
    Dec 19, 2021
    Dataset provided by
    IIT Madras, AI4Bharat
    Authors
    Sridhar, Advaith; Ganesan, Rohith Gandhi; Kumar, Pratyush; Khapra, Mitesh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Dataset Details: The INCLUDE dataset has 4292 videos (the paper mentions 4287 videos, but 5 videos were added later). The videos used for training are listed in train.csv (3475 files), while those used for testing are listed in test.csv (817 files). Each video is a recording of one ISL sign, signed by deaf students from St. Louis School for the Deaf, Adyar, Chennai.

    INCLUDE50 has 766 train videos and 192 test videos.

    Train-Test Split: Please download the train-test split for INCLUDE and INCLUDE50 from here: Train-Test Split

    Publication Link: https://dl.acm.org/doi/10.1145/3394171.3413528

    AI4Bharat website: https://sign-language.ai4bharat.org/

    Download Instructions

    For ease of access, we have prepared a Shell Script to download all the parts of the dataset and extract them to form the complete INCLUDE dataset.

    You can find the script here: http://bit.ly/include_dl

    Paper Abstract: Indian Sign Language (ISL) is a complete language with its own grammar, syntax, vocabulary and several unique linguistic attributes. It is used by over 5 million deaf people in India. Currently, there is no publicly available dataset on ISL to evaluate Sign Language Recognition (SLR) approaches. In this work, we present the Indian Lexicon Sign Language Dataset - INCLUDE - an ISL dataset that contains 0.27 million frames across 4,287 videos over 263 word signs from 15 different word categories. INCLUDE is recorded with the help of experienced signers to provide close resemblance to natural conditions. A subset of 50 word signs is chosen across word categories to define INCLUDE-50 for rapid evaluation of SLR methods with hyperparameter tuning. The best performing model achieves an accuracy of 94.5% on the INCLUDE-50 dataset and 85.6% on the INCLUDE dataset

  7. Data from: RgbD1K: A large-scale dataset and benchmark for rgb-d object...

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). RgbD1K: A large-scale dataset and benchmark for rgb-d object tracking [Dataset]. https://service.tib.eu/ldmservice/dataset/rgbd1k--a-large-scale-dataset-and-benchmark-for-rgb-d-object-tracking
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    RgbD1K: A large-scale dataset and benchmark for rgb-d object tracking

  8. Data from: UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess...

    • zenodo.org
    application/gzip
    Updated Mar 22, 2024
    Cite
    Anonymous; Anonymous (2024). UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing [Dataset]. http://doi.org/10.5281/zenodo.10850974
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 15, 2024
    Description

    This tar.gz file contains the dataset for UniTSyn.

  9. New Visions for Large Scale Networks: Research and Applications

    • catalog.data.gov
    • datasets.ai
    • +3 more
    Updated May 14, 2025
    + more versions
    Cite
    NCO NITRD (2025). New Visions for Large Scale Networks: Research and Applications [Dataset]. https://catalog.data.gov/dataset/new-visions-for-large-scale-networks-research-and-applications
    Explore at:
    Dataset updated
    May 14, 2025
    Dataset provided by
    NCO NITRD
    Description

    This paper documents the findings of the March 12-14, 2001 Workshop on New Visions for Large-Scale Networks: Research and Applications. The workshop's objectives were to develop a vision for the future of networking 10 to 20 years out and to identify needed Federal networking research to enable that vision...

  10. 000 Tweets

    • ieee-dataport.org
    Updated Jul 25, 2022
    + more versions
    Cite
    Nirmalya Thakur (2022). 000 Tweets [Dataset]. https://ieee-dataport.org/documents/twitter-conversations-about-covid-19-omicron-variant-large-scale-dataset-more-500000
    Explore at:
    Dataset updated
    Jul 25, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    2022

  11. large-scale-hate-speech-v2

    • huggingface.co
    Updated Nov 30, 2023
    + more versions
    Cite
    Cagri Toraman (2023). large-scale-hate-speech-v2 [Dataset]. https://huggingface.co/datasets/ctoraman/large-scale-hate-speech-v2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 30, 2023
    Authors
    Cagri Toraman
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The dataset published in the LREC 2022 paper "Large-Scale Hate Speech Detection with Cross-Domain Transfer".

    This is Dataset v2:

    The modified dataset that includes 68,597 tweets in English. The annotations with more than 80% agreement are included.

    TweetID: Tweet ID from Twitter API
    LangID: 1 (English)
    TopicID: Domain of the topic (0-Religion, 1-Gender, 2-Race, 3-Politics, 4-Sports)
    HateLabel: Final hate label decision (0-Normal, 1-Offensive, 2-Hate)

    GitHub Repo:

    NOTE:… See the full description on the dataset page: https://huggingface.co/datasets/ctoraman/large-scale-hate-speech-v2.
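The integer codes listed in the large-scale-hate-speech-v2 description can be turned into readable labels with a small lookup. The sketch below is illustrative only: the label tables are copied from the description, but the assumption that each record is a dictionary keyed by `TweetID`, `LangID`, `TopicID` and `HateLabel` (with integer codes) is mine, and `decode_row` is a hypothetical helper name.

```python
# Label tables copied from the dataset description.
LANG_NAMES = {1: "English"}
TOPIC_NAMES = {0: "Religion", 1: "Gender", 2: "Race", 3: "Politics", 4: "Sports"}
HATE_LABEL_NAMES = {0: "Normal", 1: "Offensive", 2: "Hate"}


def decode_row(row):
    """Return a new dict with the integer codes replaced by readable names.

    Assumes `row` has the keys TweetID, LangID, TopicID and HateLabel;
    the real files may use different column names or types.
    """
    return {
        "TweetID": row["TweetID"],
        "Language": LANG_NAMES[row["LangID"]],
        "Topic": TOPIC_NAMES[row["TopicID"]],
        "HateLabel": HATE_LABEL_NAMES[row["HateLabel"]],
    }


# Hypothetical record; the tweet ID is made up for illustration.
example_row = {"TweetID": "1234567890", "LangID": 1, "TopicID": 3, "HateLabel": 0}
decoded = decode_row(example_row)
```

An unknown code raises `KeyError`, which is usually preferable to silently mislabeling a tweet.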

  12. Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)

    • redivis.com
    application/jsonl +7
    Updated Jun 28, 2024
    Cite
    Stanford Doerr School of Sustainability Data Repository (2024). Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [Dataset]. http://doi.org/10.57761/gk3g-wc33
    Explore at:
    Available download formats: stata, csv, application/jsonl, arrow, parquet, sas, spss, avro
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Doerr School of Sustainability Data Repository
    Time period covered
    Jun 27, 2024
    Description

    Abstract

    S3DIS comprises 6 colored 3D point clouds from 6 large-scale indoor areas, along with semantic instance annotations for 12 object categories (wall, floor, ceiling, beam, column, window, door, sofa, desk, chair, bookcase, and board).

    Methodology

    The Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset is composed of the colored 3D point clouds of six large-scale indoor areas from three different buildings, covering approximately 935, 965, 450, 1700, 870, and 1100 square meters respectively (6020 square meters in total). These areas show diverse properties in architectural style and appearance and mainly include office areas and educational and exhibition spaces; conference rooms, personal offices, restrooms, open spaces, lobbies, stairways, and hallways are commonly found therein. The point clouds are generated automatically, without any manual intervention, using the Matterport scanner. The dataset also includes semantic instance annotations on the point clouds for 12 semantic elements, which are structural elements (ceiling, floor, wall, beam, column, window, and door) and commonly found items and furniture (table, chair, sofa, bookcase, and board).

    [Image: S3DIS.png] https://redivis.com/fileUploads/5bdaf09c-7d3b-4a91-b192-d98a0f0b0018

    Important Information

  13. COIN: A large-scale dataset for comprehensive instructional video analysis -...

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). COIN: A large-scale dataset for comprehensive instructional video analysis - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/coin--a-large-scale-dataset-for-comprehensive-instructional-video-analysis
    Explore at:
    Dataset updated
    Dec 2, 2024
    Description

    COIN dataset for comprehensive instructional video analysis

  14. Data from: AVA: A Large-Scale Database for Aesthetic Visual Analysis

    • academictorrents.com
    bittorrent
    Updated Jul 16, 2017
    Cite
    Naila Murray and Luca Marchesotti and Florent Perronnin (2017). AVA: A Large-Scale Database for Aesthetic Visual Analysis [Dataset]. https://academictorrents.com/details/71631f83b11d3d79d8f84efe0a7e12f0ac001460
    Explore at:
    Available download formats: bittorrent (33142609854)
    Dataset updated
    Jul 16, 2017
    Dataset authored and provided by
    Naila Murray and Luca Marchesotti and Florent Perronnin
    License

    https://academictorrents.com/nolicensespecified

    Description

    Aesthetic Visual Analysis (AVA) contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style for high-level image quality categorization.

  15. Data from: A Large-Scale Dataset of Twitter Chatter About Online Learning...

    • dataverse.harvard.edu
    Updated Aug 9, 2022
    + more versions
    Cite
    Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter About Online Learning During The Current COVID-19 Omicron Wave [Dataset]. http://doi.org/10.7910/DVN/GBHOD9
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 9, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Nirmalya Thakur
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset: N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109

    Abstract

    The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset files contain the raw version that comprises 52,868 Tweet IDs (that correspond to the same number of Tweets) and the cleaned and preprocessed version that contains 46,208 unique Tweet IDs. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description

    The dataset comprises 7 .txt files. The raw version of this dataset comprises 6 .txt files (TweetIDs_Corona Virus.txt, TweetIDs_Corona.txt, TweetIDs_Coronavirus.txt, TweetIDs_Covid.txt, TweetIDs_Omicron.txt, and TweetIDs_SARS CoV2.txt) that contain Tweet IDs grouped together based on certain synonyms or terms that were used to refer to online learning and the Omicron variant of COVID-19 in the respective tweets. The cleaned and preprocessed version of this dataset is provided in the .txt file - TweetIDs_Duplicates_Removed.txt. The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweetsr) may be used.

    The list of all the synonyms or terms that were used for the dataset development is as follows:

    COVID-19: Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus

    online learning: online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures

    A description of the dataset files is provided below:

    TweetIDs_Corona Virus.txt – Contains 321 Tweet IDs that correspond to tweets that comprise the keywords – "corona virus" and one or more keywords/terms that refer to online learning
    TweetIDs_Corona.txt – Contains 1819 Tweet IDs that correspond to tweets that comprise the keyword – "corona" or "coronaoutbreak" and one or more keywords/terms that refer to online learning
    TweetIDs_Coronavirus.txt – Contains 1429 Tweet IDs that correspond to tweets that comprise the keywords – "coronavirus" or "coronaviruspandemic" and one or more keywords/terms that refer to online learning
    TweetIDs_Covid.txt – Contains 41088 Tweet IDs that correspond to tweets that comprise the keywords – "COVID" or "COVID19" or "COVID-19" and one or more keywords/terms that refer to online learning
    TweetIDs_Omicron.txt – Contains 8198 Tweet IDs that correspond to tweets that comprise the keywords – "omicron" or "omicron variant" and one or more keywords/terms that refer to online learning
    TweetIDs_SARS CoV2.txt – Contains 13 Tweet IDs that correspond to tweets that comprise the keyword – "SARS-CoV-2" and one or more keywords/terms that refer to online learning
    TweetIDs_Duplicates_Removed.txt - A collection of 46208 unique Tweet IDs from all the 6 .txt files mentioned above after...
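The relationship between the six raw files and TweetIDs_Duplicates_Removed.txt can be sketched in a few lines of Python. This is an illustration only, not the author's preprocessing code: it assumes each raw file holds one Tweet ID per line (the description does not state the layout), and `read_ids` and `merge_unique_ids` are hypothetical helper names.

```python
from pathlib import Path

# Per-file Tweet ID counts quoted in the dataset description.
RAW_COUNTS = {
    "TweetIDs_Corona Virus.txt": 321,
    "TweetIDs_Corona.txt": 1819,
    "TweetIDs_Coronavirus.txt": 1429,
    "TweetIDs_Covid.txt": 41088,
    "TweetIDs_Omicron.txt": 8198,
    "TweetIDs_SARS CoV2.txt": 13,
}
RAW_TOTAL = sum(RAW_COUNTS.values())  # should match the quoted raw total of 52,868


def read_ids(path):
    """Read Tweet IDs from a text file, assuming one ID per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def merge_unique_ids(id_lists):
    """Merge several ID lists, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged = []
    for ids in id_lists:
        for tweet_id in ids:
            if tweet_id not in seen:
                seen.add(tweet_id)
                merged.append(tweet_id)
    return merged
```

With the files downloaded into the working directory, `merge_unique_ids(read_ids(name) for name in RAW_COUNTS)` would give the deduplicated list, which can then be passed to a hydration tool.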

  16. VLA XMM Large Scale Structure Field 325-MHz Source Catalog - Dataset - NASA...

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). VLA XMM Large Scale Structure Field 325-MHz Source Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/vla-xmm-large-scale-structure-field-325-mhz-source-catalog
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The XMM Large Scale Structure survey (XMM-LSS) is an X-ray survey aimed at studying the large scale structure of the Universe. The XMM-LSS field (centered at RA (J2000) = 02h 24m 00.27s, Dec (J2000) = -04° 09' 47.6") is currently being followed up using observations across a wide range of wavelengths, and in their paper the authors present the observational results of a low frequency radio survey of the XMM-LSS field using the Very Large Array at 74 and 325 MHz. This survey will map out the locations of the extragalactic radio sources relative to the large scale structure as traced by the X-ray emission. This is of particular interest because radio galaxies and radio-loud AGN show strong and complex interactions with their small and larger scale environment, and different classes of radio galaxies are suggested to lie at different places with respect to the large scale structure. For the phase calibration of the radio data, the authors used standard self-calibration at 325 MHz and field-base calibration at 74 MHz. Polyhedron-based imaging as well as mosaicking methods were used at both frequencies. At 74 MHz, the resolution was 30 arcseconds, the median 5-sigma sensitivity was ~ 162 mJy/beam and 666 sources were detected over an area of 132 square degrees. At 325 MHz, the resolution was 6.7 arcseconds, the median 5-sigma sensitivity was 4 mJy/beam, and 847 sources were detected over an area of 15.3 square degrees. At 325 MHz, a region of diffuse radio emission which is a cluster halo or relic candidate was detected. The observations were conducted using the VLA in July 2003 in the A-configuration (most extended) and in June 2002 in the B-configuration. This table contains the VLA 325-MHz source list, comprising 605 single sources and 615 components of 237 multiple sources, for a total of 1220 entries. (Notice that, in Section 4.1 of the reference paper, somewhat different numbers are given, i.e., the authors quote 621 single sources and 226 multiple sources). For the multiple sources, each component (A, B, etc.) is listed separately, in order of decreasing brightness. This table was created by the HEASARC in March 2012 based on CDS Catalog J/A+A/456/791 file tablea1.dat. This is a service provided by NASA HEASARC.

  17. Developing Large-Scale Bayesian Networks by Composition

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Sep 4, 2025
    + more versions
    Cite
    Dashlink (2025). Developing Large-Scale Bayesian Networks by Composition [Dataset]. https://catalog.data.gov/dataset/developing-large-scale-bayesian-networks-by-composition
    Explore at:
    Dataset updated
    Sep 4, 2025
    Dataset provided by
    Dashlink
    Description

    In this paper, we investigate the use of Bayesian networks to construct large-scale diagnostic systems. In particular, we consider the development of large-scale Bayesian networks by composition. This compositional approach reflects how (often redundant) subsystems are architected to form systems such as electrical power systems. We develop high-level specifications, Bayesian networks, clique trees, and arithmetic circuits representing 24 different electrical power systems. The largest among these 24 Bayesian networks contains over 1,000 random variables. Another BN represents the real-world electrical power system ADAPT, which is representative of electrical power systems deployed in aerospace vehicles. In addition to demonstrating the scalability of the compositional approach, we briefly report on experimental results from the diagnostic competition DXC, where the ProADAPT team, using techniques discussed here, obtained the highest scores in both Tier 1 (among 9 international competitors) and Tier 2 (among 6 international competitors) of the industrial track. While we consider diagnosis of power systems specifically, we believe this work is relevant to other system health management problems, in particular in dependable systems such as aircraft and spacecraft.

    Reference: O. J. Mengshoel, S. Poll, and T. Kurtoglu. "Developing Large-Scale Bayesian Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft." Proc. of the IJCAI-09 Workshop on Self-* and Autonomous Systems (SAS): Reasoning and Integration Challenges, 2009

    BibTex Reference:

    @inproceedings{mengshoel09developing,
      title = {Developing Large-Scale {Bayesian} Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft},
      author = {Mengshoel, O. J. and Poll, S. and Kurtoglu, T.},
      booktitle = {Proc. of the IJCAI-09 Workshop on Self-$\star$ and Autonomous Systems (SAS): Reasoning and Integration Challenges},
      year = {2009}
    }

  18. Making Predictions using Large Scale Gaussian Processes - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Making Predictions using Large Scale Gaussian Processes - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/making-predictions-using-large-scale-gaussian-processes
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    One of the key problems that arises in many areas is to estimate a potentially nonlinear function $G(x, \theta)$ given input and output samples $(x, y)$ so that $y \approx G(x, \theta)$. There are many approaches to addressing this regression problem. Neural networks, regression trees, and many other methods have been developed to estimate $G$ given the input-output pair $(x, y)$. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject. For more technical information on the method and its applications see: http://www.gaussianprocess.org/ A key problem that arises in developing these models on very large data sets is that it ends up requiring an $O(N^3)$ computation, where $N$ is the number of data points in the training sample. Obviously this becomes very problematic when $N$ is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I'd suggest you take a look at his recent paper (which was accepted in the Journal of Machine Learning Research) posted on dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our new paper on the subject which was published in IEEE Transactions on Systems, Man, and Cybernetics.
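To make the cubic cost mentioned in the Gaussian process description concrete, here is a minimal sketch of textbook Gaussian process regression on one-dimensional inputs, written in plain Python so it has no dependencies. It is not the pivoted-Cholesky method developed by Foster and his students; it is the standard dense formulation whose full Cholesky factorization of the N x N kernel matrix is the O(N^3) step. The RBF kernel, the noise level and all function names are assumptions chosen for illustration.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Dense Cholesky factorization A = L L^T; this is the O(N^3) step."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict_mean(x_train, y_train, x_test, noise=1e-6, length_scale=1.0):
    """Posterior mean of a zero-mean GP with an RBF kernel (illustrative)."""
    n = len(x_train)
    # Kernel matrix with a small noise term on the diagonal for stability.
    K = [
        [rbf_kernel(x_train[i], x_train[j], length_scale) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve_cholesky(cholesky(K), y_train)
    return [
        sum(rbf_kernel(xs, x_train[i], length_scale) * alpha[i] for i in range(n))
        for xs in x_test
    ]


# Toy data: four samples of sin(x); predict back at the training inputs.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [math.sin(x) for x in x_train]
mean_at_train = gp_predict_mean(x_train, y_train, x_train)
```

In practice a linear algebra library would replace the hand-written loops, but the triple loop in `cholesky` shows why training time grows with the cube of the number of data points.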

  19. Data from: CheXmask Database: a large-scale dataset of anatomical...

    • physionet.org
    Updated Jan 22, 2025
    + more versions
    Cite
    Nicolas Gaggion; Candelaria Mosquera; Martina Aineseder; Lucas Mansilla; Diego Milone; Enzo Ferrante (2025). CheXmask Database: a large-scale dataset of anatomical segmentation masks for chest x-ray images [Dataset]. http://doi.org/10.13026/3705-zg36
    Explore at:
    Dataset updated
    Jan 22, 2025
    Authors
    Nicolas Gaggion; Candelaria Mosquera; Martina Aineseder; Lucas Mansilla; Diego Milone; Enzo Ferrante
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CheXmask Database presents a comprehensive, uniformly annotated collection of chest radiographs, constructed from five public databases: ChestX-ray8, Chexpert, MIMIC-CXR-JPG, Padchest and VinDr-CXR. The database aggregates 657,566 anatomical segmentation masks derived from images which have been processed using the HybridGNet model to ensure consistent, high-quality segmentation. To confirm the quality of the segmentations, we include in this database individual Reverse Classification Accuracy (RCA) scores for each of the segmentation masks. This dataset is intended to catalyze further innovation and refinement in the field of semantic chest X-ray analysis, offering a significant resource for researchers in the medical imaging domain.

  20. program-cota-llava

    • huggingface.co
    Updated Jul 28, 2025
    + more versions
    Cite
    Salesforce (2025). program-cota-llava [Dataset]. https://huggingface.co/datasets/Salesforce/program-cota-llava
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    Salesforce Inc (http://salesforce.com/)
    Authors
    Salesforce
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    🌮 TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

    🌐 Website | 📑 Arxiv | 💻 Code | 🤗 Datasets

    If you like our project or are interested in its updates, please star us :) Thank you! ⭐

    Summary

    TLDR: CoTA is a large-scale dataset of synthetic Chains-of-Thought-and-Action (CoTA) generated by programs.

    Load data

    from datasets import load_dataset
    dataset = load_dataset("Salesforce/program-cota-llava"…

    See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/program-cota-llava.

Cite
Konstantinos Kousias (2022). A Large-Scale Dataset of 4G [Dataset]. https://ieee-dataport.org/documents/large-scale-dataset-4g-nb-iot-and-5g-non-standalone-network-measurements

A Large-Scale Dataset of 4G, NB-IoT, and 5G Non-Standalone Network Measurements

Explore at:
20 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Nov 7, 2022
Authors
Konstantinos Kousias
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

it is crucial to examine them from an empirical perspective.
