100+ datasets found

Data from: A Review of International Large-Scale Assessments in Education...
catalog.data.gov
datasets.ai
Updated Mar 30, 2021
Cite
U.S. Department of State (2021). A Review of International Large-Scale Assessments in Education Assessing Component Skills and Collecting Contextual Data [Dataset]. https://catalog.data.gov/dataset/a-review-of-international-large-scale-assessments-in-education-assessing-component-skills-
Explore at:
Dataset updated
Mar 30, 2021
Dataset provided by
U.S. Department of State
Description
The OECD has initiated PISA for Development (PISA-D) in response to the rising need of developing countries to collect data about their education systems and the capacity of their student bodies. This report aims to compare and contrast approaches regarding the instruments that are used to collect data on (a) component skills and cognitive instruments, (b) contextual frameworks, and (c) the implementation of the different international assessments, as well as approaches to include children who are not at school, and the ways in which data are used. It then seeks to identify assessment practices in these three areas that will be useful for developing countries. This report reviews the major international and regional large-scale educational assessments: large-scale international surveys, school-based surveys and household-based surveys. For each of the issues discussed, there is a description of the prevailing international situation, followed by a consideration of the issue for developing countries and then a description of the relevance of the issue to PISA for Development.
Quantitative raw data for "Large scale regional citizen surveys report"...
data.niaid.nih.gov
Updated Feb 3, 2022
Cite
Altsitsiadis, Efthymios (2022). Quantitative raw data for "Large scale regional citizen surveys report" (D1.4) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5958017
Explore at:
Dataset updated
Feb 3, 2022
Dataset provided by
Bakratsas, Thomas
Altsitsiadis, Efthymios
Panori, Anastasia
Chapizanis, Dimitrios
Hauschildt, Christian
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset presents the quantitative raw data collected under the H2020 RRI2SCALE project for D1.4, “Large scale regional citizen surveys report”. It includes the answers provided by almost 8,000 participants from 4 pilot European regions (Kriti, Vestland, Galicia, and Overijssel) regarding the general public's views, concerns, and moral issues about the current and future trajectories of their RTD&I ecosystem. The original survey questionnaire was created by White Research SRL and disseminated to the regions through supporting pilot partners. Data collection took place from June 2020 to September 2020 in 4 waves, one for each region. Following a consortium vote during the kick-off meeting, responses were collected through online panels run by survey companies in each region to fill the quotas, rather than through resource-intensive methods that would have made data collection unduly expensive. For the statistical analysis of the data and the conclusions drawn from it, see the "Large scale regional citizen surveys report" (D1.4).
A Large Scale Fish Dataset
gts.ai
json
Updated Mar 20, 2024
Cite
GTS (2024). A Large Scale Fish Dataset [Dataset]. https://gts.ai/dataset-download/a-large-scale-fish-dataset/
Explore at:
Available download formats: json
Dataset updated
Mar 20, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset was collected in order to carry out segmentation, feature extraction, and classification tasks and compare the common segmentation.
Data from: Crowdsourced geometric morphometrics enable rapid large-scale...
datadryad.org
data.niaid.nih.gov
zip
Updated Nov 10, 2016
Cite
Jonathan Chang; Michael E. Alfaro (2016). Crowdsourced geometric morphometrics enable rapid large-scale collection and analysis of phenotypic data [Dataset]. http://doi.org/10.5061/dryad.gh4k7
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.gh4k7
Dataset updated
Nov 10, 2016
Dataset provided by
Dryad
Authors
Jonathan Chang; Michael E. Alfaro
Time period covered
2016
Description
1. Advances in genomics and informatics have enabled the production of large phylogenetic trees. However, the ability to collect large phenotypic datasets has not kept pace. 2. Here, we present a method to quickly and accurately gather morphometric data using crowdsourced image-based landmarking. 3. We find that crowdsourced workers perform similarly to experienced morphologists on the same digitization tasks. We also demonstrate the speed and accuracy of our method on seven families of ray-finned fishes (Actinopterygii). 4. Crowdsourcing will enable the collection of morphological data across vast radiations of organisms, and can facilitate richer inference on the macroevolutionary processes that shape phenotypic diversity across the tree of life.
Large Scale Topo Building (Polygon) (LGATE-139) - Datasets - data.wa.gov.au
catalogue.data.wa.gov.au
Updated Jul 8, 2019
+ more versions
Cite
(2019). Large Scale Topo Building (Polygon) (LGATE-139) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-building-polygon
Explore at:
Dataset updated
Jul 8, 2019
Area covered
Western Australia
Description
A relatively permanent structure roofed and/or usually walled. Multiple points that describe the feature’s perimeter. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.
Data from: A large-scale study on the effects of sex on gray matter...
neurovault.org
zip
Updated Mar 27, 2025
Cite
(2025). A large-scale study on the effects of sex on gray matter asymmetry [Dataset]. http://identifiers.org/neurovault.collection:2825
Explore at:
Available download formats: zip
Unique identifier
https://identifiers.org/neurovault.collection:2825
Dataset updated
Mar 27, 2025
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
A collection of 5 brain maps. Each brain map is a 3D array of values representing properties of the brain at different locations.

Collection description

Statistical maps presented in the manuscript "A large-scale study on the effects of sex on gray matter asymmetry", published in Brain Structure and Function.
Loghub-2.0: a collection of large-scale datasets for log parsing
data.niaid.nih.gov
zenodo.org
Updated Mar 3, 2024
Cite
LogPAI (2024). Loghub-2.0: a collection of large-scale datasets for log parsing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8275860
Explore at:
Dataset updated
Mar 3, 2024
Dataset authored and provided by
LogPAI
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
See dataset details: https://github.com/logpai/Loghub-2.0

The datasets are freely available for research or academic work, subject to the following condition: For any usage or distribution of the LogPub datasets, please refer to the LogPub repository URL (https://github.com/logpai/Loghub-2.0) and cite the LogPub paper (A Large-scale Evaluation for Log Parsing Techniques: How Far are We?) where applicable.
TREC 2022 Deep Learning test collection
s.cnmilf.com
data.nist.gov
+1 more
Updated May 9, 2023
Cite
National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
Explore at:
Dataset updated
May 9, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).

Certain machine learning based methods, such as methods based on deep learning, are known to require very large datasets for training. Lack of such large scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large scale datasets to TREC, and at creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?

The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
Enabling Complex Analysis of Large Scale Digital Collections - Phase II...
figshare.com
pptx
Updated Jan 20, 2016
Cite
Melissa Terras; James Baker; David Beavan; Martin Austwick; James Hetherington (2016). Enabling Complex Analysis of Large Scale Digital Collections - Phase II Pitch [Dataset]. http://doi.org/10.6084/m9.figshare.1481102.v1
Explore at:
Available download formats: pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.1481102.v1
Dataset updated
Jan 20, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Melissa Terras; James Baker; David Beavan; Martin Austwick; James Hetherington
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A pitch from UCLDH and the British Library on 14th July 2015 as part of the Jisc Research Data Spring program, phase II: reporting on our pilot project, and where we see the project going forward: Lots of money has been spent digitising heritage collections. Digitised heritage collections are data. But non-computationally trained scholars don't know what to ask of large quantities of data. Often they do not have access to high performance computing facilities and they don’t know how to use them. We have addressed this fundamental problem by extending research data management processes in order to enable novel research in the arts, humanities, and social and historical sciences and a deeper understanding of emerging research needs. In our first phase, we have successfully implemented large scale, complex search of a digitised collection: now we scale up…
Large Scale Topo Water (Point) (LGATE-168) - Datasets - data.wa.gov.au
catalogue.data.wa.gov.au
Updated Jul 9, 2019
+ more versions
Cite
(2019). Large Scale Topo Water (Point) (LGATE-168) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-water-point-lgate-168
Explore at:
Dataset updated
Jul 9, 2019
Area covered
Western Australia
Description
Water features that relate to the interior of the country. A single point that describes a feature's location. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.
THINGS-data: Behavioral odd-one-out data and code
plus.figshare.com
zip
Updated May 31, 2023
Cite
Martin Hebart; Oliver Contier; Lina Teichmann; Adam Rockter; Charles Zheng; Alexis Kidder; Anna Corriveau; Maryam Vaziri-Pashkam; Chris Baker (2023). THINGS-data: Behavioral odd-one-out data and code [Dataset]. http://doi.org/10.25452/figshare.plus.20552784.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.25452/figshare.plus.20552784.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare+
Authors
Martin Hebart; Oliver Contier; Lina Teichmann; Adam Rockter; Charles Zheng; Alexis Kidder; Anna Corriveau; Maryam Vaziri-Pashkam; Chris Baker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
4.7 million object odd-one-out judgements from human participants on Amazon Mechanical Turk.

Part of THINGS-data: A multimodal collection of large-scale datasets for investigating object representations in brain and behavior.

See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.6161151
Statistical Area Large Scale 1000K 2014 - Datasets - This service has been...
store.smartdatahub.io
Updated Nov 11, 2024
+ more versions
Cite
(2024). Statistical Area Large Scale 1000K 2014 - Datasets - This service has been deprecated - please visit https://www.smartdatahub.io/ to access data. See the About page for details. // [Dataset]. https://store.smartdatahub.io/dataset/fi_tilastokeskus_tilastointialueet_suuralue1000k_2014
Explore at:
Dataset updated
Nov 11, 2024
Description
This dataset collection comprises related data tables sourced from the website of the Statistical Centre (Tilastokeskus) based in Finland. The information in this collection is derived from the Statistical Centre's service interface (WFS). Each table in the collection contains a set of related data, organized in a structured format of rows and columns, and can be used for a variety of statistical analyses. This dataset is licensed under CC BY 4.0 (Creative Commons Attribution 4.0, https://creativecommons.org/licenses/by/4.0/deed.fi).
Drone Data Collection Service Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Oct 16, 2024
Cite
Dataintelo (2024). Drone Data Collection Service Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/drone-data-collection-service-market
Explore at:
Available download formats: csv, pptx, pdf
Dataset updated
Oct 16, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Drone Data Collection Service Market Outlook

The global market size for drone data collection services was valued at approximately USD 5.5 billion in 2023 and is projected to reach USD 21.4 billion by 2032, growing at a robust CAGR of 16.1% during the forecast period. This significant growth can be attributed to the increasing demand for advanced data analytics and the need for efficient data collection methods across various industries.
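
As a rough check on the figures above, the growth rate implied by the two quoted market sizes can be recomputed. This is a minimal sketch that assumes nine compounding years (2023 to 2032); the function names `cagr` and `project` are illustrative and do not come from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1 + rate) ** years


# Figures quoted in the description, in USD billions.
# Assumption: the forecast compounds over 2032 - 2023 = 9 years.
YEARS = 2032 - 2023
implied_rate = cagr(5.5, 21.4, YEARS)
projected_2032 = project(5.5, 0.161, YEARS)

print(f"CAGR implied by 5.5 -> 21.4 over {YEARS} years: {implied_rate:.1%}")
print(f"Value in 2032 when compounding 5.5 at 16.1%: {projected_2032:.1f}")
```

Under that assumption the implied rate and the quoted 16.1% land close to each other; any small gap would depend on which base year and rounding the report used.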

One of the major growth factors driving this market is the rapid advancement in drone technology. Innovations in drone hardware and software have significantly enhanced the capabilities of drones, making them more versatile and efficient in data collection tasks. Drones are now equipped with high-resolution cameras, LIDAR, and other advanced sensors that provide accurate and detailed data, which is invaluable for many industries. Additionally, improvements in battery life and flight stability have extended the operational range and endurance of drones, making them more practical for prolonged and large-scale data collection missions.

Another critical factor fueling the market's growth is the increasing adoption of drones in various applications such as agriculture, construction, mining, and oil & gas. In agriculture, drones are used for precision farming, crop monitoring, and soil analysis, which help in optimizing yields and reducing costs. Similarly, in construction, drones are utilized for site surveying, progress monitoring, and safety inspections, which enhance project efficiency and safety. The mining industry also benefits from drone data collection for exploration, mapping, and monitoring of mining operations, ensuring better resource management and operational safety.

The regulatory environment is another significant driver of market growth. Many countries are developing and implementing regulations that facilitate the integration of drones into commercial operations. These regulations are aimed at ensuring the safe and efficient use of drones while addressing privacy and security concerns. For instance, the Federal Aviation Administration (FAA) in the United States has established comprehensive guidelines for commercial drone operations, which have encouraged businesses to adopt drone technology for various data collection purposes.

Regionally, the North American market is expected to dominate the global drone data collection service market, followed by Europe and Asia Pacific. North America’s dominance can be attributed to the presence of major drone technology companies, a favorable regulatory environment, and high adoption rates across various industries. The Asia Pacific region, with its rapidly growing economies and increasing investments in drone technology, is projected to witness the highest growth rate during the forecast period. Europe is also expected to see significant growth, driven by technological advancements and increasing demand for efficient data collection methods in industries such as agriculture and construction.

Service Type Analysis

The drone data collection service market can be segmented by service type into aerial photography, mapping & surveying, inspection & monitoring, and others. Aerial photography is one of the most commonly used services in this market. High-resolution aerial photographs captured by drones are utilized in various industries, including real estate, tourism, and media. These photographs provide detailed and accurate visual data that can be used for marketing, planning, and documentation purposes. The advancements in camera technology and drone stability have further enhanced the quality and reliability of aerial photography.

Mapping & surveying is another critical segment in the drone data collection service market. Drones equipped with LIDAR, photogrammetry, and other advanced sensors are used to create detailed and accurate maps and surveys of large areas. This service is particularly beneficial in industries such as construction, mining, and agriculture, where precise data is crucial for planning and operational efficiency. The use of drones in mapping & surveying reduces the time and cost associated with traditional ground-based survey methods while providing high-quality and comprehensive data.

Inspection & monitoring services provided by drones are increasingly being adopted in industries such as utilities, oil & gas, and infrastructure. Drones are used to inspect and monitor assets such as power lines, pipelines, and bridges, ensuring their integrity and safety. The ability of drones to acce
Data from: A large-scale COVID-19 Twitter chatter dataset for open...
zenodo.org
explore.openaire.eu
+1 more
application/gzip, csv +1
Updated Apr 17, 2023
+ more versions
Cite
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell (2023). A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration [Dataset]. http://doi.org/10.5281/zenodo.3766929
Explore at:
Available download formats: application/gzip, csv, tsv
Unique identifier
https://doi.org/10.5281/zenodo.3766929
Dataset updated
Apr 17, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell
Description
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started from March 11th yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (230,961,781 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (52,026,197 unique tweets). There are several practical reasons for us to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files. For more statistics and some visualizations visit: http://www.panacealab.org/covid19/

More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter and in our pre-print about the dataset (https://arxiv.org/abs/2004.03688).

As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions, which permit re-distributing Twitter data ONLY for research purposes. They need to be hydrated to be used.
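
Before hydration, the identifiers usually have to be read from the TSV file and grouped into batches. The sketch below shows one way to do that with the standard library. The column names (`tweet_id`, `date`, `time`), the sample rows, and the batch size are assumptions made for illustration; check the header of the released file and the limits of whichever hydration tool you use. The hydration call itself is not shown.

```python
import csv
import io
from typing import Iterator, List


def read_tweet_ids(tsv_text: str, id_column: str = "tweet_id") -> List[str]:
    """Return the identifier column of a tab-separated file as strings."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[id_column] for row in reader]


def batched(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical three-row excerpt; the real file is far larger.
sample = (
    "tweet_id\tdate\ttime\n"
    "1237500000000000001\t2020-03-11\t00:00:01\n"
    "1237500000000000002\t2020-03-11\t00:00:02\n"
    "1237500000000000003\t2020-03-11\t00:00:05\n"
)

tweet_ids = read_tweet_ids(sample)
batches = list(batched(tweet_ids, size=2))  # batch size chosen arbitrarily
print(len(tweet_ids), "ids in", len(batches), "batches")
```

Identifiers are kept as strings so that they are never rounded or reformatted on the way to the hydration step.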
Index To The BGS Collection Of Large Scale Mine Plans & Land Survey Plans.
data.wu.ac.at
cloud.csiss.gmu.edu
+4 more
html
Updated Aug 18, 2018
Cite
British Geological Survey (2018). Index To The BGS Collection Of Large Scale Mine Plans & Land Survey Plans. [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/MTA1MDU0ZGUtMTE0Mi00YzIyLWIxMDMtY2JmY2MzYjUxODEz
Explore at:
Available download formats: html
Dataset updated
Aug 18, 2018
Dataset provided by
British Geological Survey (https://www.bgs.ac.uk/)
Description
Index to the BGS collection of large scale or large format plans of all types including those relating to mining activity, including abandonment plans and site investigations. The Plans Database Index was set up c.1983 as a digital index to the collections of Land Survey Plans and Plans of Abandoned Mines. There are entries for all registered plans but not all the index fields are complete, as this depends on the nature of the original plan. The index covers the whole of Great Britain.
HPC-ODA Dataset Collection
data.niaid.nih.gov
explore.openaire.eu
Updated Apr 9, 2021
+ more versions
Cite
Netti, Alessio (2021). HPC-ODA Dataset Collection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3701439
Explore at:
Dataset updated
Apr 9, 2021
Dataset authored and provided by
Netti, Alessio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
HPC-ODA is a collection of datasets acquired on production HPC systems, which are representative of several real-world use cases in the field of Operational Data Analytics (ODA) for the improvement of reliability and energy efficiency. The datasets are composed of monitoring sensor data, acquired from the components of different HPC systems depending on the specific use case. Two tools, whose overhead is proven to be very light, were used to acquire data in HPC-ODA: these are the DCDB and LDMS monitoring frameworks.

The aim of HPC-ODA is to provide several vertical slices (here named segments) of the monitoring data available in a large-scale HPC installation. The segments all have different granularities, in terms of data sources and time scale, and provide several use cases on which models and approaches to data processing can be evaluated. While having a production dataset from a whole HPC system - from the infrastructure down to the CPU core level - at a fine time granularity would be ideal, this is often not feasible due to the confidentiality of the data, as well as the sheer amount of storage space required. HPC-ODA includes 6 different segments:

Power Consumption Prediction: a fine-granularity dataset that was collected from a single compute node in a HPC system. It contains both node-level data as well as per-CPU core metrics, and can be used to perform regression tasks such as power consumption prediction.

Fault Detection: a medium-granularity dataset that was collected from a single compute node while it was subjected to fault injection. It contains only node-level data, as well as the labels for both the applications and faults being executed on the HPC node in time. This dataset can be used to perform fault classification.

Application Classification: a medium-granularity dataset that was collected from 16 compute nodes in a HPC system while running different parallel MPI applications. Data is at the compute node level, separated for each of them, and is paired with the labels of the applications being executed. This dataset can be used for tasks such as application classification.

Infrastructure Management: a coarse-granularity dataset containing cluster-wide data from a HPC system, about its warm water cooling system as well as power consumption. The data is at the rack level, and can be used for regression tasks such as outlet water temperature or removed heat prediction.

Cross-architecture: a medium-granularity dataset that is a variant of the Application Classification one, and shares the same ODA use case. Here, however, single-node configurations of the applications were executed on three different compute node types with different CPU architectures. This dataset can be used to perform cross-architecture application classification, or performance comparison studies.

DEEP-EST Dataset: this medium-granularity dataset was collected on the modular DEEP-EST HPC system and consists of three parts. These were collected on 16 compute nodes each, while running several MPI applications under different warm-water cooling configurations. This dataset can be used for CPU and GPU temperature prediction, or for thermal characterization.

The HPC-ODA dataset collection includes a readme document containing all necessary usage information, as well as a lightweight Python framework to carry out the ODA tasks described for each dataset.
Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...
zenodo.org
data.europa.eu
zip
Updated Oct 20, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6832242
Dataset updated
Oct 20, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
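
As a minimal sketch of the pandas route mentioned above, the example below loads a small in-memory CSV and aggregates it per participant. The column names (`id`, `date`, `steps`) and the sample rows are hypothetical stand-ins, not the actual LifeSnaps schema; with the real files you would pass the file path to `pandas.read_csv()` instead of a `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical stand-in for a daily-granularity CSV file.
# Column names and values are assumptions for illustration only.
sample_csv = io.StringIO(
    "id,date,steps\n"
    "p01,2021-05-24,8123\n"
    "p01,2021-05-25,10450\n"
    "p02,2021-05-24,4312\n"
)

# parse_dates converts the named column to datetime64 on load.
daily = pd.read_csv(sample_csv, parse_dates=["date"])

# Average of the assumed `steps` column per participant.
mean_steps = daily.groupby("id")["steps"].mean()
print(mean_steps)
```

Parsing the date column at load time keeps later resampling or joins between the daily and hourly files straightforward.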

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

To do so, make sure the MongoDB Database Tools are installed, then open a terminal or command prompt and run the corresponding command for each collection in the database.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
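The three restore commands differ only in the collection name, so they can be generated programmatically. The sketch below builds the argument list for each collection and adds --username and --password only when credentials are given. The function name and the `bson_path` argument are assumptions: the commands as shown here do not include a path to the dump file, so substitute the location of the BSON file you downloaded.

```python
import shlex

COLLECTIONS = ("fitbit", "sema", "surveys")


def build_restore_command(collection, bson_path, host="localhost:27017",
                          database="rais_anonymized",
                          username=None, password=None):
    """Return the mongorestore argument list for one collection."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    command = ["mongorestore", "--host", host, "-d", database, "-c", collection]
    # Credentials are only needed when access control is enabled.
    if username is not None:
        command += ["--username", username]
    if password is not None:
        command += ["--password", password]
    # Hypothetical location of the dump file for this collection.
    command.append(bson_path)
    return command


if __name__ == "__main__":
    for name in COLLECTIONS:
        # Print the commands instead of running them, so they can be reviewed first.
        print(shlex.join(build_restore_command(name, f"/path/to/{name}.bson")))
```

Each returned list can be passed to subprocess.run() once the paths have been checked.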

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:

{ _id:
Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...
zenodo.org
application/gzip
Updated Mar 16, 2021
+ more versions
Cite
João Felipe; Leonardo; Vanessa; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks / Understanding and Improving the Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.3519618
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3519618
Dataset updated
Mar 16, 2021
Dataset provided by
Zenodo: http://zenodo.org/
Authors
João Felipe; Leonardo; Vanessa; Juliana
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub. Based on the results, we proposed and evaluated Julynter, a linting tool for Jupyter Notebooks.

Papers:

PIMENTEL, J. F.; MURTA, L.; BRAGANHOLO, V.; FREIRE, J.; A large-scale study about quality and reproducibility of jupyter notebooks. In: International Conference on Mining Software Repositories (MSR), 2019, Montreal, Canada.

PIMENTEL, J. F.; MURTA, L.; BRAGANHOLO, V.; FREIRE, J.; Understanding and Improving the Quality and Reproducibility of Jupyter Notebooks. Empirical Software Engineering, 2021 (in press)

This repository contains three files:

db2020-09-22.dump.gz

sample.tar.gz

julynter_reproducibility.tar.gz

Reproducing the Notebook Study

The db2020-09-22.dump.gz file contains a PostgreSQL dump of the database, with all the data we extracted from notebooks. For loading it, run:

gunzip -c db2020-09-22.dump.gz | psql jupyter

Note that this file contains only the database with the extracted data. The actual repositories are available in a Google Drive folder, which also contains the Docker images we used in the reproducibility study. The repositories are stored as content/{hash_dir1}/{hash_dir2}.tar.bz2, where hash_dir1 and hash_dir2 are columns of repositories in the database.
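To make the storage layout concrete, the sketch below builds the archive path for one repository from its hash_dir1 and hash_dir2 values. The function name and the example values are hypothetical; only the content/{hash_dir1}/{hash_dir2}.tar.bz2 pattern comes from the description above.

```python
from pathlib import PurePosixPath


def repository_archive_path(hash_dir1, hash_dir2, root="content"):
    """Return the relative archive path for one row of the repositories data."""
    return PurePosixPath(root) / hash_dir1 / f"{hash_dir2}.tar.bz2"


# Hypothetical values; real ones come from the hash_dir1 and hash_dir2 columns.
print(repository_archive_path("ab", "cdef0123"))
```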

For scripts, notebooks, and detailed instructions on how to analyze or reproduce the data collection, please check the instructions on the Jupyter Archaeology repository (tag 1.0.0)

The sample.tar.gz file contains the repositories obtained during the manual sampling.

Reproducing the Julynter Experiment

The julynter_reproducibility.tar.gz file contains all the data collected in the Julynter experiment and the analysis notebooks. Reproducing the analysis is straightforward:

Uncompress the file: $ tar zxvf julynter_reproducibility.tar.gz

Install the dependencies: $ pip install -r julynter/requirements.txt

Run the notebooks in order: J1.Data.Collection.ipynb; J2.Recommendations.ipynb; J3.Usability.ipynb.

The collected data is stored in the julynter/data folder.
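If you prefer to run the notebooks non-interactively, one option is to execute them in order with jupyter nbconvert. The sketch below only builds the commands. It assumes the notebooks sit in the julynter folder next to requirements.txt, which the description does not state explicitly, and the function name is hypothetical.

```python
NOTEBOOKS = (
    "J1.Data.Collection.ipynb",
    "J2.Recommendations.ipynb",
    "J3.Usability.ipynb",
)


def notebook_commands(folder="julynter"):
    """Return one nbconvert command per notebook, in the required order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         f"{folder}/{name}"]
        for name in NOTEBOOKS
    ]


if __name__ == "__main__":
    for command in notebook_commands():
        # Printed for review; pass each list to subprocess.run(..., check=True) to execute.
        print(" ".join(command))
```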

Changelog

2019/01/14 - Version 1 - Initial version
2019/01/22 - Version 2 - Update N8.Execution.ipynb to calculate the rate of failure for each reason
2019/03/13 - Version 3 - Update package for camera ready. Add columns to db to detect duplicates, change notebooks to consider them, and add N1.Skip.Notebook.ipynb and N11.Repository.With.Notebook.Restriction.ipynb.
2021/03/15 - Version 4 - Add Julynter experiment; Update database dump to include new data collected for the second paper; remove scripts and analysis notebooks from this package (moved to GitHub), add a link to Google Drive with collected repository files
Photovoltaic Data Acquisition (PVDAQ) Public Datasets
catalog.data.gov
data.openei.org
+2 more
Updated Jun 28, 2025
+ more versions
Cite
NREL (2025). Photovoltaic Data Acquisition (PVDAQ) Public Datasets [Dataset]. https://catalog.data.gov/dataset/photovoltaic-data-acquisition-pvdaq-public-datasets
Explore at:
Dataset updated
Jun 28, 2025
Dataset provided by
NREL
Description
The NREL PVDAQ is a large-scale time-series database containing system metadata and performance data from a variety of experimental PV sites and commercial public PV sites. The datasets are used to perform ongoing performance and degradation analysis. Some of the sets can exhibit common elements that affect PV performance (e.g. soiling). The dataset consists of a series of files devoted to each of the systems and an associated set of metadata information that explains details about the system hardware and the site geo-location. Some system datasets also include environmental sensors that cover irradiance, temperatures, wind speeds, and precipitation at the site.
DOE EV Data Collection - Facility Data
osti.gov
Updated Jul 30, 2025
Cite
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Transportation Office. Vehicle Technologies Office (EE-3V) (2025). DOE EV Data Collection - Facility Data [Dataset]. http://doi.org/10.15483/1989856
Explore at:
Unique identifier
https://doi.org/10.15483/1989856
Dataset updated
Jul 30, 2025
Dataset provided by
United States Department of Energy: http://energy.gov/
Office of Energy Efficiency and Renewable Energy: http://energy.gov/eere
National Renewable Energy Laboratory
Pacific Northwest National Laboratory
Idaho National Laboratory
Description
Facility data includes information on electricity consumption by larger-scale infrastructure, including buildings, solar arrays, and energy storage systems. Parameter definitions can be found in the data dictionary. If a connection between specific vehicle information and facility data exists, it will be available in the vehicle attributes table. Vehicle ID can be used as a key between vehicle data and vehicle attribute tables. Data is being uploaded quarterly through 2023 and is subject to change until the conclusion of the project.
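Using Vehicle ID as a key amounts to a many-to-one join from vehicle data to vehicle attributes. The sketch below shows one way to do this with pandas. The function name and all column names (`vehicle_id`, `energy_kwh`, `facility_id`) are assumptions for illustration; consult the data dictionary for the real parameter names.

```python
import pandas as pd


def attach_vehicle_attributes(vehicle_data, vehicle_attributes, key="vehicle_id"):
    """Left-join attribute columns onto every vehicle data row.

    validate="many_to_one" raises an error if the attributes table
    lists the same key more than once.
    """
    return vehicle_data.merge(vehicle_attributes, on=key, how="left",
                              validate="many_to_one")


# Hypothetical toy tables standing in for the real downloads.
data = pd.DataFrame({"vehicle_id": [1, 1, 2], "energy_kwh": [1.0, 2.0, 3.0]})
attributes = pd.DataFrame({"vehicle_id": [1, 2], "facility_id": ["F1", "F2"]})
print(attach_vehicle_attributes(data, attributes))
```

A left join keeps every vehicle data row even when no attribute record exists, which makes missing links visible as empty attribute values.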