100+ datasets found
  1. Data from: A Review of International Large-Scale Assessments in Education...

    • catalog.data.gov
    • datasets.ai
    Updated Mar 30, 2021
    Cite
    U.S. Department of State (2021). A Review of International Large-Scale Assessments in Education Assessing Component Skills and Collecting Contextual Data [Dataset]. https://catalog.data.gov/dataset/a-review-of-international-large-scale-assessments-in-education-assessing-component-skills-
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    U.S. Department of State
    Description

    The OECD has initiated PISA for Development (PISA-D) in response to the rising need of developing countries to collect data about their education systems and the capacity of their student bodies. This report aims to compare and contrast approaches regarding the instruments that are used to collect data on (a) component skills and cognitive instruments, (b) contextual frameworks, and (c) the implementation of the different international assessments, as well as approaches to include children who are not at school, and the ways in which data are used. It then seeks to identify assessment practices in these three areas that will be useful for developing countries. This report reviews the major international and regional large-scale educational assessments: large-scale international surveys, school-based surveys and household-based surveys. For each of the issues discussed, there is a description of the prevailing international situation, followed by a consideration of the issue for developing countries and then a description of the relevance of the issue to PISA for Development.

  2. Quantitative raw data for "Large scale regional citizen surveys report"...

    • data.niaid.nih.gov
    Updated Feb 3, 2022
    Cite
    Altsitsiadis, Efthymios (2022). Quantitative raw data for "Large scale regional citizen surveys report" (D1.4) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5958017
    Explore at:
    Dataset updated
    Feb 3, 2022
    Dataset provided by
    Bakratsas, Thomas
    Altsitsiadis, Efthymios
    Panori, Anastasia
    Chapizanis, Dimitrios
    Hauschildt, Christian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset presents the quantitative raw data that was collected under the H2020 RRI2SCALE project for the D1.4 - “Large scale regional citizen surveys report”. The dataset includes the answers that were provided by almost 8,000 participants from 4 pilot European regions (Kriti, Vestland, Galicia, and Overijssel) regarding the general public's views, concerns, and moral issues about the current and future trajectories of their RTD&I ecosystem. The original survey questionnaire was created by White Research SRL and disseminated to the regions through supporting pilot partners. Data collection took place from June 2020 to September 2020 in 4 different waves, one for each region. Following a consortium vote at the kick-off meeting, responses were collected through online panels run by survey companies in each region to fill the quotas, instead of resource-intensive methods that would have made data collection unduly expensive. For the statistical analysis of the data and the conclusions drawn from the analysis, you can access the "Large scale regional citizen surveys report" (D1.4).

  3. A Large Scale Fish Dataset

    • gts.ai
    json
    Updated Mar 20, 2024
    Cite
    GTS (2024). A Large Scale Fish Dataset [Dataset]. https://gts.ai/dataset-download/a-large-scale-fish-dataset/
    Explore at:
    Available download formats: json
    Dataset updated
    Mar 20, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset was collected in order to carry out segmentation, feature extraction, and classification tasks and compare the common segmentation.

  4. Data from: Crowdsourced geometric morphometrics enable rapid large-scale...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Nov 10, 2016
    Cite
    Jonathan Chang; Michael E. Alfaro (2016). Crowdsourced geometric morphometrics enable rapid large-scale collection and analysis of phenotypic data [Dataset]. http://doi.org/10.5061/dryad.gh4k7
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 10, 2016
    Dataset provided by
    Dryad
    Authors
    Jonathan Chang; Michael E. Alfaro
    Time period covered
    2016
    Description
    1. Advances in genomics and informatics have enabled the production of large phylogenetic trees. However, the ability to collect large phenotypic datasets has not kept pace. 2. Here, we present a method to quickly and accurately gather morphometric data using crowdsourced image-based landmarking. 3. We find that crowdsourced workers perform similarly to experienced morphologists on the same digitization tasks. We also demonstrate the speed and accuracy of our method on seven families of ray-finned fishes (Actinopterygii). 4. Crowdsourcing will enable the collection of morphological data across vast radiations of organisms, and can facilitate richer inference on the macroevolutionary processes that shape phenotypic diversity across the tree of life.
  5. Large Scale Topo Building (Polygon) (LGATE-139) - Datasets - data.wa.gov.au

    • catalogue.data.wa.gov.au
    Updated Jul 8, 2019
    + more versions
    Cite
    (2019). Large Scale Topo Building (Polygon) (LGATE-139) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-building-polygon
    Explore at:
    Dataset updated
    Jul 8, 2019
    Area covered
    Western Australia
    Description

    A relatively permanent structure roofed and/or usually walled. Multiple points that describe the feature’s perimeter. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.

  6. Data from: A large-scale study on the effects of sex on gray matter...

    • neurovault.org
    zip
    Updated Mar 27, 2025
    Cite
    (2025). A large-scale study on the effects of sex on gray matter asymmetry [Dataset]. http://identifiers.org/neurovault.collection:2825
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 27, 2025
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A collection of 5 brain maps. Each brain map is a 3D array of values representing properties of the brain at different locations.

    Collection description

    Statistical maps presented in the manuscript "A large-scale study on the effects of sex on gray matter asymmetry", published in Brain Structure and Function.

  7. Loghub-2.0: a collection of large-scale datasets for log parsing

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 3, 2024
    Cite
    LogPAI (2024). Loghub-2.0: a collection of large-scale datasets for log parsing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8275860
    Explore at:
    Dataset updated
    Mar 3, 2024
    Dataset authored and provided by
    LogPAI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    See dataset details: https://github.com/logpai/Loghub-2.0

    The datasets are freely available for research or academic work, subject to the following condition: For any usage or distribution of the LogPub datasets, please refer to the LogPub repository URL (https://github.com/logpai/Loghub-2.0) and cite the LogPub paper (A Large-scale Evaluation for Log Parsing Techniques: How Far are We?) where applicable.

  8. TREC 2022 Deep Learning test collection

    • s.cnmilf.com
    • data.nist.gov
    • +1 more
    Updated May 9, 2023
    Cite
    National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
    Explore at:
    Dataset updated
    May 9, 2023
    Dataset provided by
    National Institute of Standards and Technology: http://www.nist.gov/
    Description

    This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).

    Certain machine learning based methods, such as methods based on deep learning, are known to require very large datasets for training. Lack of such large scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large scale datasets to TREC and creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

    Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?

    The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.

  9. Enabling Complex Analysis of Large Scale Digital Collections - Phase II...

    • figshare.com
    pptx
    Updated Jan 20, 2016
    Cite
    Melissa Terras; James Baker; David Beavan; Martin Austwick; James Hetherington (2016). Enabling Complex Analysis of Large Scale Digital Collections - Phase II Pitch [Dataset]. http://doi.org/10.6084/m9.figshare.1481102.v1
    Explore at:
    Available download formats: pptx
    Dataset updated
    Jan 20, 2016
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Melissa Terras; James Baker; David Beavan; Martin Austwick; James Hetherington
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A pitch from UCLDH and the British Library on 14th July 2015 as part of the Jisc Research Data Spring program, phase II: reporting on our pilot project, and where we see the project going forward: Lots of money has been spent digitising heritage collections. Digitised heritage collections are data. But non-computationally trained scholars don't know what to ask of large quantities of data. Often they do not have access to high performance computing facilities and they don’t know how to use them. We have addressed this fundamental problem by extending research data management processes in order to enable novel research in the arts, humanities, and social and historical sciences and a deeper understanding of emerging research needs. In our first phase, we have successfully implemented large scale, complex search of a digitised collection: now we scale up…

  10. Large Scale Topo Water (Point) (LGATE-168) - Datasets - data.wa.gov.au

    • catalogue.data.wa.gov.au
    Updated Jul 9, 2019
    + more versions
    Cite
    (2019). Large Scale Topo Water (Point) (LGATE-168) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-water-point-lgate-168
    Explore at:
    Dataset updated
    Jul 9, 2019
    Area covered
    Western Australia
    Description

    Water features that relate to the interior of the country. A single point that describes a feature's location. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.

  11. THINGS-data: Behavioral odd-one-out data and code

    • plus.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Martin Hebart; Oliver Contier; Lina Teichmann; Adam Rockter; Charles Zheng; Alexis Kidder; Anna Corriveau; Maryam Vaziri-Pashkam; Chris Baker (2023). THINGS-data: Behavioral odd-one-out data and code [Dataset]. http://doi.org/10.25452/figshare.plus.20552784.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare+
    Authors
    Martin Hebart; Oliver Contier; Lina Teichmann; Adam Rockter; Charles Zheng; Alexis Kidder; Anna Corriveau; Maryam Vaziri-Pashkam; Chris Baker
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    4.7 million object odd-one-out judgements from human participants on Amazon Mechanical Turk.

    Part of THINGS-data: A multimodal collection of large-scale datasets for investigating object representations in brain and behavior.

    See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.6161151

  12. Statistical Area Large Scale 1000K 2014 - Datasets - This service has been...

    • store.smartdatahub.io
    Updated Nov 11, 2024
    + more versions
    Cite
    (2024). Statistical Area Large Scale 1000K 2014 - Datasets - This service has been deprecated - please visit https://www.smartdatahub.io/ to access data. See the About page for details. // [Dataset]. https://store.smartdatahub.io/dataset/fi_tilastokeskus_tilastointialueet_suuralue1000k_2014
    Explore at:
    Dataset updated
    Nov 11, 2024
    Description

    This dataset collection comprises related data tables sourced from the website of the Statistical Centre (Tilastokeskus) based in Finland. The information in this collection is derived from the Statistical Centre's service interface (WFS). Each table in the collection contains a set of related data, organized in a structured format of rows and columns, and can be used for a variety of statistical analyses. This dataset is licensed under CC BY 4.0 (Creative Commons Attribution 4.0, https://creativecommons.org/licenses/by/4.0/deed.fi).

  13. Drone Data Collection Service Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 16, 2024
    Cite
    Dataintelo (2024). Drone Data Collection Service Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/drone-data-collection-service-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Drone Data Collection Service Market Outlook



    The global market size for drone data collection services was valued at approximately USD 5.5 billion in 2023 and is projected to reach USD 21.4 billion by 2032, growing at a robust CAGR of 16.1% during the forecast period. This significant growth can be attributed to the increasing demand for advanced data analytics and the need for efficient data collection methods across various industries.
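As a rough consistency check on the figures above, the implied compound annual growth rate can be recomputed from the two quoted market sizes. This is a minimal sketch: the function name `cagr` is illustrative, and the 9-year span (2023 to 2032) is an assumption, since the report does not say which base year its 16.1% figure uses.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in USD billions, as quoted in the report summary.
start_2023 = 5.5
end_2032 = 21.4

# Assumption: growth compounds over the 9 years from 2023 to 2032.
implied = cagr(start_2023, end_2032, 2032 - 2023)
print(f"Implied CAGR: {implied:.1%}")
```

Under that assumption the result lands within a few tenths of a percentage point of the quoted CAGR; a different base year would shift it.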



    One of the major growth factors driving this market is the rapid advancement in drone technology. Innovations in drone hardware and software have significantly enhanced the capabilities of drones, making them more versatile and efficient in data collection tasks. Drones are now equipped with high-resolution cameras, LIDAR, and other advanced sensors that provide accurate and detailed data, which is invaluable for many industries. Additionally, improvements in battery life and flight stability have extended the operational range and endurance of drones, making them more practical for prolonged and large-scale data collection missions.



    Another critical factor fueling the market's growth is the increasing adoption of drones in various applications such as agriculture, construction, mining, and oil & gas. In agriculture, drones are used for precision farming, crop monitoring, and soil analysis, which help in optimizing yields and reducing costs. Similarly, in construction, drones are utilized for site surveying, progress monitoring, and safety inspections, which enhance project efficiency and safety. The mining industry also benefits from drone data collection for exploration, mapping, and monitoring of mining operations, ensuring better resource management and operational safety.



    The regulatory environment is another significant driver of market growth. Many countries are developing and implementing regulations that facilitate the integration of drones into commercial operations. These regulations are aimed at ensuring the safe and efficient use of drones while addressing privacy and security concerns. For instance, the Federal Aviation Administration (FAA) in the United States has established comprehensive guidelines for commercial drone operations, which have encouraged businesses to adopt drone technology for various data collection purposes.



    Regionally, the North American market is expected to dominate the global drone data collection service market, followed by Europe and Asia Pacific. North America’s dominance can be attributed to the presence of major drone technology companies, a favorable regulatory environment, and high adoption rates across various industries. The Asia Pacific region, with its rapidly growing economies and increasing investments in drone technology, is projected to witness the highest growth rate during the forecast period. Europe is also expected to see significant growth, driven by technological advancements and increasing demand for efficient data collection methods in industries such as agriculture and construction.



    Service Type Analysis



    The drone data collection service market can be segmented by service type into aerial photography, mapping & surveying, inspection & monitoring, and others. Aerial photography is one of the most commonly used services in this market. High-resolution aerial photographs captured by drones are utilized in various industries, including real estate, tourism, and media. These photographs provide detailed and accurate visual data that can be used for marketing, planning, and documentation purposes. The advancements in camera technology and drone stability have further enhanced the quality and reliability of aerial photography.



    Mapping & surveying is another critical segment in the drone data collection service market. Drones equipped with LIDAR, photogrammetry, and other advanced sensors are used to create detailed and accurate maps and surveys of large areas. This service is particularly beneficial in industries such as construction, mining, and agriculture, where precise data is crucial for planning and operational efficiency. The use of drones in mapping & surveying reduces the time and cost associated with traditional ground-based survey methods while providing high-quality and comprehensive data.



    Inspection & monitoring services provided by drones are increasingly being adopted in industries such as utilities, oil & gas, and infrastructure. Drones are used to inspect and monitor assets such as power lines, pipelines, and bridges, ensuring their integrity and safety. The ability of drones to acce

  14. Data from: A large-scale COVID-19 Twitter chatter dataset for open...

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    application/gzip, csv +1
    Updated Apr 17, 2023
    + more versions
    Cite
    Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell (2023). A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration [Dataset]. http://doi.org/10.5281/zenodo.3766929
    Explore at:
    Available download formats: application/gzip, csv, tsv
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell
    Description

    Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started from March 11th yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage.

    The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (230,961,781 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (52,026,197 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files. For more statistics and some visualizations visit: http://www.panacealab.org/covid19/

    More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter and in our pre-print about the dataset (https://arxiv.org/abs/2004.03688).

    As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions, which allow re-distribution of Twitter data ONLY for research purposes. They need to be hydrated to be used.
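Because the TSV files hold hundreds of millions of rows, it can help to stream them rather than load them whole. The sketch below uses only the Python standard library to count tweets per day and to group identifiers into batches ready for a hydration tool. The column names `tweet_id`, `date`, and `time`, the helper names, and the in-memory sample are assumptions for illustration; check the header of the actual file before relying on them.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for full_dataset-clean.tsv.
# Assumption: the real file has a header row with these column names.
SAMPLE_TSV = (
    "tweet_id\tdate\ttime\n"
    "1001\t2020-03-11\t00:00:01\n"
    "1002\t2020-03-11\t00:00:02\n"
    "1003\t2020-03-12\t08:15:00\n"
)


def count_tweets_per_day(handle):
    """Stream a TSV file handle and count rows per value of the date column."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[row["date"]] += 1
    return counts


def id_batches(handle, size):
    """Yield lists of at most `size` tweet identifiers, in file order."""
    batch = []
    for row in csv.DictReader(handle, delimiter="\t"):
        batch.append(row["tweet_id"])
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


counts = count_tweets_per_day(io.StringIO(SAMPLE_TSV))
batches = list(id_batches(io.StringIO(SAMPLE_TSV), size=2))
print(dict(counts))
print(batches)
```

With the real data, replace `io.StringIO(SAMPLE_TSV)` with `open("full_dataset-clean.tsv", newline="", encoding="utf-8")`; each batch of identifiers can then be passed to whichever hydration tool you use.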

  15. Index To The BGS Collection Of Large Scale Mine Plans & Land Survey Plans.

    • data.wu.ac.at
    • cloud.csiss.gmu.edu
    • +4 more
    html
    Updated Aug 18, 2018
    Cite
    British Geological Survey (2018). Index To The BGS Collection Of Large Scale Mine Plans & Land Survey Plans. [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/MTA1MDU0ZGUtMTE0Mi00YzIyLWIxMDMtY2JmY2MzYjUxODEz
    Explore at:
    Available download formats: html
    Dataset updated
    Aug 18, 2018
    Dataset provided by
    British Geological Survey: https://www.bgs.ac.uk/
    Description

    Index to the BGS collection of large scale or large format plans of all types including those relating to mining activity, including abandonment plans and site investigations. The Plans Database Index was set up c.1983 as a digital index to the collections of Land Survey Plans and Plans of Abandoned Mines. There are entries for all registered plans but not all the index fields are complete, as this depends on the nature of the original plan. The index covers the whole of Great Britain.

  16. Z

    HPC-ODA Dataset Collection

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Apr 9, 2021
    + more versions
    Cite
    Netti, Alessio (2021). HPC-ODA Dataset Collection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3701439
    Explore at:
    Dataset updated
    Apr 9, 2021
    Dataset authored and provided by
    Netti, Alessio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HPC-ODA is a collection of datasets acquired on production HPC systems, which are representative of several real-world use cases in the field of Operational Data Analytics (ODA) for the improvement of reliability and energy efficiency. The datasets are composed of monitoring sensor data, acquired from the components of different HPC systems depending on the specific use case. Two tools, whose overhead is proven to be very light, were used to acquire data in HPC-ODA: these are the DCDB and LDMS monitoring frameworks.

    The aim of HPC-ODA is to provide several vertical slices (here named segments) of the monitoring data available in a large-scale HPC installation. The segments all have different granularities, in terms of data sources and time scale, and provide several use cases on which models and approaches to data processing can be evaluated. While having a production dataset from a whole HPC system - from the infrastructure down to the CPU core level - at a fine time granularity would be ideal, this is often not feasible due to the confidentiality of the data, as well as the sheer amount of storage space required. HPC-ODA includes 6 different segments:

    Power Consumption Prediction: a fine-granularity dataset that was collected from a single compute node in an HPC system. It contains both node-level data and per-CPU core metrics, and can be used to perform regression tasks such as power consumption prediction.

    Fault Detection: a medium-granularity dataset that was collected from a single compute node while it was subjected to fault injection. It contains only node-level data, as well as the labels for both the applications and faults being executed on the HPC node in time. This dataset can be used to perform fault classification.

    Application Classification: a medium-granularity dataset that was collected from 16 compute nodes in an HPC system while running different parallel MPI applications. Data is at the compute node level, separated for each of them, and is paired with the labels of the applications being executed. This dataset can be used for tasks such as application classification.

    Infrastructure Management: a coarse-granularity dataset containing cluster-wide data from an HPC system, about its warm water cooling system as well as power consumption. The data is at the rack level, and can be used for regression tasks such as outlet water temperature or removed heat prediction.

    Cross-architecture: a medium-granularity dataset that is a variant of the Application Classification one, and shares the same ODA use case. Here, however, single-node configurations of the applications were executed on three different compute node types with different CPU architectures. This dataset can be used to perform cross-architecture application classification, or performance comparison studies.

    DEEP-EST Dataset: this medium-granularity dataset was collected on the modular DEEP-EST HPC system and consists of three parts. These were collected on 16 compute nodes each, while running several MPI applications under different warm-water cooling configurations. This dataset can be used for CPU and GPU temperature prediction, or for thermal characterization.

    The HPC-ODA dataset collection includes a readme document containing all necessary usage information, as well as a lightweight Python framework to carry out the ODA tasks described for each dataset.

  17. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Oct 20, 2022
    + more versions
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health on the one hand, and behavioral patterns and psychological measurements on the other, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over the course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, first ensure that the MongoDB Database Tools are installed. Then open the terminal/command prompt and run the following command for each collection in the DB.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For the survey data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
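
    As a minimal sketch, the three restore commands above can be assembled in Python so that the optional credentials are added in one place. The function name build_restore_command and the bson_path argument are assumptions made for illustration: the commands shown above do not include the path to the dump files, so the path used here is a placeholder that must be replaced with the actual location of the downloaded data.

```python
import shlex
from typing import List, Optional

# The three collections named in the instructions above.
COLLECTIONS = ("fitbit", "sema", "surveys")


def build_restore_command(
    collection: str,
    bson_path: str,  # hypothetical: location of the dump for this collection
    host: str = "localhost:27017",
    database: str = "rais_anonymized",
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Return the mongorestore argument list for one collection."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    cmd = ["mongorestore", "--host", host, "-d", database, "-c", collection]
    # Credentials are only needed when access control is enabled.
    if username is not None:
        cmd += ["--username", username]
    if password is not None:
        cmd += ["--password", password]
    cmd.append(bson_path)
    return cmd


# Print the commands instead of running them, so they can be reviewed first.
for name in COLLECTIONS:
    # "path/to/<name>.bson" is a placeholder, not a path from the dataset.
    print(shlex.join(build_restore_command(name, f"path/to/{name}.bson")))
```

    Once the paths are correct, each list can be passed to subprocess.run(cmd, check=True) to execute the restore.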

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  18. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    application/gzip
    Updated Mar 16, 2021
    + more versions
    Cite
    João Felipe; Leonardo; Vanessa; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks / Understanding and Improving the Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.3519618
    Available download formats: application/gzip
    Dataset updated
    Mar 16, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; Leonardo; Vanessa; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub. Based on the results, we proposed and evaluated Julynter, a linting tool for Jupyter Notebooks.

    Papers:

    This repository contains three files:

    Reproducing the Notebook Study

    The db2020-09-22.dump.gz file contains a PostgreSQL dump of the database, with all the data we extracted from notebooks. For loading it, run:

    gunzip -c db2020-09-22.dump.gz | psql jupyter

    Note that this file contains only the database with the extracted data. The actual repositories are available in a Google Drive folder, which also contains the Docker images we used in the reproducibility study. The repositories are stored as content/{hash_dir1}/{hash_dir2}.tar.bz2, where hash_dir1 and hash_dir2 are columns of the repositories table in the database.
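
    The storage layout described above can be sketched as a small Python helper that maps the two database columns to an archive path. The function name repository_archive_path and the root parameter are illustrative assumptions, and the hash values in the usage line are made up; only the content/{hash_dir1}/{hash_dir2}.tar.bz2 pattern comes from the description.

```python
from pathlib import PurePosixPath


def repository_archive_path(
    hash_dir1: str, hash_dir2: str, root: str = "content"
) -> PurePosixPath:
    """Build <root>/<hash_dir1>/<hash_dir2>.tar.bz2 from the two column values."""
    for value in (hash_dir1, hash_dir2):
        # Reject empty values and anything that would escape the layout.
        if not value or "/" in value or value in (".", ".."):
            raise ValueError(f"invalid path component: {value!r}")
    return PurePosixPath(root) / hash_dir1 / f"{hash_dir2}.tar.bz2"


# Hypothetical hash values, for illustration only.
print(repository_archive_path("ab", "cdef"))
```

    The values of hash_dir1 and hash_dir2 would come from a query against the restored database; the query itself is not shown here because the schema is not described in this listing.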

    For scripts, notebooks, and detailed instructions on how to analyze or reproduce the data collection, please check the instructions on the Jupyter Archaeology repository (tag 1.0.0)

    The sample.tar.gz file contains the repositories obtained during the manual sampling.

    Reproducing the Julynter Experiment

    The julynter_reproducibility.tar.gz file contains all the data collected in the Julynter experiment and the analysis notebooks. Reproducing the analysis is straightforward:

    • Uncompress the file: $ tar zxvf julynter_reproducibility.tar.gz
    • Install the dependencies: $ pip install -r julynter/requirements.txt
    • Run the notebooks in order: J1.Data.Collection.ipynb; J2.Recommendations.ipynb; J3.Usability.ipynb.

    The collected data is stored in the julynter/data folder.

    Changelog

    2019/01/14 - Version 1 - Initial version
    2019/01/22 - Version 2 - Update N8.Execution.ipynb to calculate the rate of failure for each reason
    2019/03/13 - Version 3 - Update package for camera ready. Add columns to db to detect duplicates, change notebooks to consider them, and add N1.Skip.Notebook.ipynb and N11.Repository.With.Notebook.Restriction.ipynb.
    2021/03/15 - Version 4 - Add Julynter experiment; Update database dump to include new data collected for the second paper; remove scripts and analysis notebooks from this package (moved to GitHub), add a link to Google Drive with collected repository files

  19. Photovoltaic Data Acquisition (PVDAQ) Public Datasets

    • catalog.data.gov
    • data.openei.org
    • +2 more
    Updated Jun 28, 2025
    + more versions
    Cite
    NREL (2025). Photovoltaic Data Acquisition (PVDAQ) Public Datasets [Dataset]. https://catalog.data.gov/dataset/photovoltaic-data-acquisition-pvdaq-public-datasets
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    NREL
    Description

    The NREL PVDAQ is a large-scale time-series database containing system metadata and performance data from a variety of experimental PV sites and commercial public PV sites. The datasets are used to perform ongoing performance and degradation analysis. Some of the sets can exhibit common elements that affect PV performance (e.g. soiling). The dataset consists of a series of files devoted to each of the systems and an associated set of metadata that explains details about the system hardware and the site geo-location. Some system datasets also include environmental sensors that cover irradiance, temperatures, wind speeds, and precipitation at the site.

  20. DOE EV Data Collection - Facility Data

    • osti.gov
    Updated Jul 30, 2025
    Cite
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Transportation Office. Vehicle Technologies Office (EE-3V) (2025). DOE EV Data Collection - Facility Data [Dataset]. http://doi.org/10.15483/1989856
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Office of Energy Efficiency and Renewable Energy (http://energy.gov/eere)
    National Renewable Energy Laboratory
    Pacific Northwest National Laboratory
    Idaho National Laboratory
    Description

    Facility data includes information on electricity consumption by larger-scale infrastructure, including buildings, solar arrays, and energy storage systems. Parameter definitions can be found in the data dictionary. If a connection between specific vehicle information and facility data exists, it will be available in the vehicle attributes table. Vehicle ID can be used as a key between the vehicle data and vehicle attributes tables. Data is being uploaded quarterly through 2023 and is subject to change until the conclusion of the project.
