This dataset contains disinfection efficacy data for scrubs, face coverings, and denim contaminated with Phi6 and MS2 and cleaned using hot-water laundering.
This dataset is associated with the following publication: Mikelonis, A., J. Archer, B. Wyrzykowska, E. Morris, J. Sawyer, T. Chamberlain, A. Abdel-Hady, M. Monge, and A. Touati. Determining Viral Disinfection Efficacy of Hot Water Laundering. Journal of Visualized Experiments. JoVE, Somerville, MA, USA, 184: e64164, (2022).
Object detection is a vital part of any autonomous vision system, and obtaining a high-performing object detector requires data. The object detection task aims to detect and classify objects from camera input, producing bounding boxes containing the objects as output. This is usually done using deep neural networks.
Training an object detector typically requires a large amount of data, but collecting large amounts of data is not always practical. This has led to multiple techniques that decrease the amount of data needed, such as transfer learning and domain adaptation. Working with construction equipment is a time-consuming process, so we wanted to examine whether it is possible to train a network on scale-model data and then use that network to detect real objects with no additional training.
This small dataset contains training and validation data of a scale-model dump truck in different environments, while the test set contains images of a full-size dump truck of a similar model. The aim of the dataset is to train a network to classify wheels, cabs, and tipping bodies of a scale-model dump truck and use that network to classify the same classes on a full-scale dump truck.
The label structure of the dataset is the YOLO v3 format, where each class corresponds to an integer value: Wheel: 0, Cab: 1, Tipping body: 2
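For orientation, here is a minimal Python sketch of reading one label file in this format; the file name is hypothetical, and the coordinate layout follows the standard YOLO convention of normalized center coordinates and box size.

# Parse a YOLO-format label file for this dataset (the file name is hypothetical).
CLASS_NAMES = {0: "Wheel", 1: "Cab", 2: "Tipping body"}

def read_yolo_labels(path):
    # Each line: "<class_id> <x_center> <y_center> <width> <height>",
    # with coordinates normalized to [0, 1] relative to the image size.
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            boxes.append((CLASS_NAMES[int(cls)], float(xc), float(yc), float(w), float(h)))
    return boxes

for box in read_yolo_labels("image_0001.txt"):
    print(box)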
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.
For more details see the included README file and companion paper:
Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In Proceedings of the 2022 Mining Software Repositories Conference (MSR 2022), 23-24 May 2022, Pittsburgh, Pennsylvania, United States. ACM, 2022.
If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
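As an illustration of how the metadata can be joined to the license blobs, here is a minimal Python sketch that computes a file's checksum and looks it up in one of the metadata CSVs; the CSV file name and the "checksum" column name are assumptions, so consult the included README for the actual layout.

# Minimal sketch (file and column names are placeholders): find the metadata
# rows for one license blob by its cryptographic checksum.
import csv
import hashlib

def sha1_of(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def rows_for_blob(metadata_csv, checksum, column="checksum"):
    with open(metadata_csv, newline="") as f:
        return [row for row in csv.DictReader(f) if row.get(column) == checksum]

digest = sha1_of("LICENSE")                              # any local license file
print(rows_for_blob("license_metadata.csv", digest))     # placeholder CSV name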
The detrimental effects of excess nutrients and sediment entering the Chesapeake Bay estuary from its watersheds have necessitated regulatory actions. Federally mandated reductions are apportioned to bay jurisdictions based on the U.S. Environmental Protection Agency's Chesapeake Bay Time-Variable Watershed Model (CBPM). The Chesapeake Assessment Scenario Tool (CAST version CAST-19; cast.chesapeakebay.net; Chesapeake Bay Program, 2020) is a simplified, online version of the Phase 6 CBPM that simulates watershed nutrient delivery to the estuary using the original model's annual land-surface nutrient source and removal inputs and time-averaged climatological forecasting. Because it runs much faster than the CBPM, CAST facilitates rapid generation and comparison of alternate input-reduction scenarios.

The purpose of this data release is to make the baseline annual nitrogen, phosphorus, and sediment input data used by CAST available to the scientific community in a standardized, public-domain format, such that CBPM baseline predictions can be corroborated, or the model can be refined through independent scientific investigations. Because it constitutes the best available estimate, as of 2019, of past and projected future land-surface nitrogen, phosphorus, and sediment inputs over the entire extent of the Chesapeake watershed, this data set also supports broader USGS Chesapeake Bay Studies through fiscal year 2025.

Source-specific annual nutrient source and removal inputs for the years 1985 through 2025 were downscaled from the CBPM land-river segment scale (2,049 segments; mean area 118 square kilometers) to the National Hydrography Dataset Plus version 2.0 (NHDPlus) 1:100,000 catchment scale (83,331 segments; mean area 2.1 square kilometers). Eleven source or removal categories are represented for all counties that intersect the Chesapeake Bay watershed. These categories are listed below and further defined in the Purpose section.

1. Atmospheric deposition (atm. dep.)
2. Biosolids
3. Combined sewer overflow (CSO)
4. Direct deposit (manure directly excreted on pasture and in streams)
5. Fertilizer
6. Manure applied as fertilizer
7. Nitrogen fixation by agricultural crops (Nfix)
8. Rapid infiltration basins (RIB)
9. Septic systems
10. Nutrient uptake by agricultural crops that is removed from the field
11. Wastewater

For most of these categories, nutrient source and removal inputs are tabulated for five species: ammonia, nitrate, organic nitrogen, phosphate, and organic phosphorus; sediment inputs are provided as total suspended sediment. Consistent with CBPM, plant uptake is specified only as total nitrogen and total phosphorus, and wastewater inputs are specified as biological oxygen demand and dissolved oxygen (Chesapeake Bay Program, 2020).

In addition to these sources, annual proportional land-use layers used in the downscaling process are provided, also at NHDPlus 1:100,000 scale. Layers for each year represent proportional coverage of 14 Chesapeake Bay 2013 1-meter Land Use Data classes, interpolated (1985-2013) based on the evolution of land cover derived from NLCD 1992, 2001, 2006, and 2011 layers, and projected (2014-2025) using land use estimated for 2025 with the USGS Chesapeake Bay Land Change Model (USGS, 2020).

Best management practices (BMPs) are not included in this data release. BMPs have varying effects on nutrient inputs and runoff; these effects are best represented in CAST.
Moreover, the BMP history is regularly revised by the states, and the most current history is available as a downloadable file from CAST.
Chesapeake Bay Program, 2020. Chesapeake Assessment and Scenario Tool (CAST), Version 2019. Chesapeake Bay Program Office. Last accessed November 2021.
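To make the downscaling idea described above concrete, here is an illustrative Python sketch (not the CBPM/CAST procedure itself) that splits an annual segment-scale input among NHDPlus catchments in proportion to each catchment's share of a relevant land-use area; the column names and the weighting choice are assumptions for illustration only.

# Illustrative proportional downscaling sketch; column names are assumed.
import pandas as pd

def downscale_to_catchments(segment_inputs, catchments):
    # segment_inputs: one row per land-river segment, columns ["segment_id", "input_kg"]
    # catchments: one row per NHDPlus catchment, columns
    #   ["catchment_id", "segment_id", "landuse_area_km2"]
    catchments = catchments.copy()
    # each catchment's share of its parent segment's relevant land-use area
    catchments["weight"] = catchments.groupby("segment_id")["landuse_area_km2"].transform(
        lambda a: a / a.sum()
    )
    merged = catchments.merge(segment_inputs, on="segment_id", how="left")
    merged["catchment_input_kg"] = merged["weight"] * merged["input_kg"]
    return merged[["catchment_id", "segment_id", "catchment_input_kg"]]

segments = pd.DataFrame({"segment_id": [1], "input_kg": [1000.0]})
catch = pd.DataFrame({"catchment_id": [10, 11], "segment_id": [1, 1], "landuse_area_km2": [3.0, 1.0]})
print(downscale_to_catchments(segments, catch))   # splits 1000 kg as 750 / 250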
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract
Humans have elevated global extinction rates and thus lowered global-scale species richness. However, there is no a priori reason to expect that losses of global species richness should always, or even often, trickle down to losses of species richness at regional and local scales, even though this relationship is often assumed. Here, we show that scale can modulate our estimates of species richness change through time in the face of anthropogenic pressures, but not in a unidirectional way. Instead, the magnitude of species richness change through time can increase, decrease, reverse, or be unimodal across spatial scales. Using several case studies, we show different forms of scale-dependent richness change through time in the face of anthropogenic pressures. For example, Central American corals show a homogenization pattern, where small-scale richness is largely unchanged through time, while larger-scale richness change is highly negative. Alternatively, birds in North America showed a differentiation effect, where species richness was again largely unchanged through time at small scales, but was more positive at larger scales. Finally, we collated data from a heterogeneous set of studies of different taxa measured through time from sites ranging from small plots to entire continents, and found highly variable patterns that nevertheless imply complex scale-dependence in several taxa. In summary, understanding how biodiversity is changing in the Anthropocene requires an explicit recognition of the influence of spatial scale, and we conclude with some recommendations for how to better incorporate scale into our estimates of change.
Usage notes
data_for_dryad: This file contains all data associated with the manuscript. A metadata file is included in the zip folder.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.11
Python 3.7.2
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-03-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
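Optionally, you can verify the connection string from Python before opening the notebooks; a minimal SQLAlchemy sketch (not part of the original instructions) is:

import os
from sqlalchemy import create_engine, inspect

# Connect with the same string used by the analysis notebooks and list the
# tables restored from db2019-03-13.dump.
engine = create_engine(os.environ["JUP_DB_CONNECTION"])
print(inspect(engine).get_table_names())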
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.7:
conda create -n analyses python=3.7
conda activate analyses
Go to the analyses folder and install all the dependencies listed in requirements.txt:
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
To reproduce the analyses, run Jupyter in this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine; leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine; leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the repository directories; the second one should unmount it. You can leave the scripts blank, but this is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Install 5 conda environments and 5 anaconda environments, one for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires manual jupyter and pathlib2 installations due to incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda activate py35
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
conda create -n raw37 python=3.7 -y
conda activate raw37
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the complete catalog of datasets and publications reviewed in: Di Mauro A., Cominola A., Castelletti A., Di Nardo A. Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water, 2021. The complete catalog contains:
92 state-of-the-art water demand datasets identified at the district, household, and end use scales;
120 related peer-reviewed publications;
57 additional datasets with electricity demand data at the end use and household scales.
The following metadata are reported, for each dataset:
Authors
Year
Location
Dataset Size
Time Series Length
Time Sampling Resolution
Access Policy.
The following metadata are reported, for each publication:
Authors
Year
Journal
Title
Spatial Scale
Type of Study: Survey (S) / Dataset (D)
Domain: Water (W)/Electricity (E)
Time Sampling Resolution
Access Policy
Dataset Size
Time Series Length
Location
Authors: Anna Di Mauro - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | anna.dimauro@unicampania.it; Andrea Cominola - Chair of Smart Water Networks | Technische Universität Berlin - Einstein Center Digital Future (Germany) | andrea.cominola@tu-berlin.de; Andrea Castelletti - Department of Electronics, Information and Bioengineering | Politecnico di Milano (Italy) | andrea.castelletti@polimi.it; Armando Di Nardo - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | armando.dinardo@unicampania.it
Citation and reference:
If you use this database, please consider citing our paper:
Di Mauro, A., Cominola, A., Castelletti, A., & Di Nardo, A. (2021). Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water, 13(1), 36, https://doi.org/10.3390/w13010036
Updates and Contributions:
The catalogue stored in this public repository can be collaboratively updated as more datasets become available. The authors will periodically update it to a new version.
New requests can be submitted to the authors, so that the dataset collection can be improved by different contributors. Contributors will be credited in the updated versions of the dataset catalogue.
Updates history:
March 1st, 2021 - Pacheco, C.J.B., Horsburgh, J.S., Tracy, J.R. (Utah State University, Logan, UT - USA) --- The dataset associated with the paper Bastidas Pacheco, C.J., Horsburgh, J.S., Tracy, R.J. A Low-Cost, Open Source Monitoring System for Collecting High Temporal Resolution Water Use Data on Magnetically Driven Residential Water Meters. Sensors 2020, 20, 3655, is published in the HydroShare repository, where it is available as an OPEN dataset. Data can be found here: https://doi.org/10.4211/hs.4de42db6485f47b290bd9e17b017bb51
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The YJMob100K human mobility datasets (YJMob100K_dataset1.csv.gz and YJMob100K_dataset2.csv.gz) contain the movement of a total of 100,000 individuals across a 75-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains the movement of 80,000 individuals across a 75-day business-as-usual period, while the second dataset contains the movement of 20,000 individuals across a 75-day period (including the last 15 days during an emergency) with unusual behavior.
While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (cell_POIcat.csv.gz). The list of 85 POI categories can be found in POI_datacategories.csv.
For details of the dataset, see Data Descriptor:
Yabe, T., Tsubouchi, K., Shimizu, T., Sekimoto, Y., Sezaki, K., Moro, E., & Pentland, A. (2024). YJMob100K: City-scale and longitudinal dataset of anonymized human mobility trajectories. Scientific Data, 11(1), 397. https://www.nature.com/articles/s41597-024-03237-9
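A minimal Python sketch for a first look at the data is shown below; the column names (user id, day index, 30-minute time slot, and x/y grid indices) are assumptions based on the description above, so check the data descriptor for the authoritative schema.

import pandas as pd

df = pd.read_csv("YJMob100K_dataset1.csv.gz")   # pandas decompresses .gz automatically
print(df.columns.tolist())                      # confirm the actual column names
print(df.head())

# Example: number of distinct 500 m grid cells visited per individual,
# assuming columns named "uid", "x" and "y".
cells_per_user = df.drop_duplicates(subset=["uid", "x", "y"]).groupby("uid").size()
print(cells_per_user.describe())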
--- Details about the Human Mobility Prediction Challenge 2023 (ended November 13, 2023) ---
The challenge takes place in a mid-sized and highly populated metropolitan area, somewhere in Japan. The area is divided into 500 meters x 500 meters grid cells, resulting in a 200 x 200 grid cell space.
The human mobility datasets (task1_dataset.csv.gz and task2_dataset.csv.gz) contain the movement of a total of 100,000 individuals across a 90-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains the movement during a 75-day business-as-usual period, while the second dataset contains the movement during a 75-day period during an emergency with unusual behavior.
There are 2 tasks in the Human Mobility Prediction Challenge.
In task 1, participants are provided with the full time series data (75 days) for 80,000 individuals, and partial (only 60 days) time series movement data for the remaining 20,000 individuals (task1_dataset.csv.gz). Given the provided data, task 1 of the challenge is to predict the movement patterns of those 20,000 individuals during days 60-74. Task 2 is a similar task but uses a smaller dataset of 25,000 individuals in total, 2,500 of which have their locations during days 60-74 masked and need to be predicted (task2_dataset.csv.gz).
While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (which is optional for use in the challenge) (cell_POIcat.csv.gz).
For more details, see https://connection.mit.edu/humob-challenge-2023
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Breast cancer is one of the most prevalent types of cancer and the leading type of cancer death. Mammography is the recommended imaging modality for periodic breast cancer screening. A few datasets have been published to develop computer-aided tools for mammography analysis. However, these datasets either have a limited sample size or consist of screen-film mammography (SFM), which has been replaced by full-field digital mammography (FFDM) in clinical practice. This project introduces a large-scale full-field digital mammography dataset of 5,000 four-view exams, which are double read by experienced mammographers to provide cancer assessment and breast density following the Breast Imaging Reporting and Data System (BI-RADS). Breast abnormalities that require further examination are also marked by bounding rectangles.
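If the exams are distributed as DICOM files (the usual container for FFDM; verify against the project's documentation), a minimal Python sketch for loading one view with pydicom could look like this; the file path is a placeholder.

# Minimal sketch, assuming DICOM-format FFDM images; requires the pydicom package.
import pydicom

ds = pydicom.dcmread("exam_0001/L_CC.dcm")          # placeholder path to one view
print(ds.Modality, getattr(ds, "ViewPosition", "?"))
pixels = ds.pixel_array                              # 2D array of the mammogram
print(pixels.shape, pixels.dtype)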
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the Reference Model 5 (RM5) full scale geometry files of the Oscillating Surge Flap, developed by the Reference Model Project (RMP). These full scale geometry files are saved as SolidWorks assembly, IGS, and STEP files, and require a CAD program to view. This data was generated upon completion of the project on September 30, 2014.
The Reference Model Project (RMP), sponsored by the U.S. Department of Energy (DOE), was a partnered effort to develop open-source MHK point designs as reference models (RMs) to benchmark MHK technology performance and costs, and an open-source methodology for design and analysis of MHK technologies, including models for estimating their capital costs, operational costs, and levelized costs of energy. The point designs also served as open-source test articles for university researchers and commercial technology developers. The RMP project team, led by Sandia National Laboratories (SNL), included a partnership between DOE, three national laboratories, including the National Renewable Energy Laboratory (NREL), Pacific Northwest National Laboratory (PNNL), and Oak Ridge National Laboratory (ORNL), the Applied Research Laboratory of Penn State University, and Re Vision Consulting.
Reference Model 5 (RM5) is a type of floating, oscillating surge wave energy converter (OSWEC) that utilizes the surge motion of waves to generate electrical power. The reference wave energy resource for RM5 was measurement data from a National Data Buoy Center (NDBC) buoy near Eureka, in Humboldt County, California. The flap was designed to rotate against the supporting frame to convert wave energy into electrical power from the relative rotational motion induced by incoming waves. The RM5 design is rated at 360 kilowatts (kW), uses a flap of 25 m in width and 19 m in height (16 m in draft), and the distance from the top of the water surface piercing flap to the mean water surface (freeboard) is 1.5 m. The flap is connected to a shaft with a 3-m diameter that rotates against the supporting frame. The supporting frame is assumed to have an outer diameter of 2 m, and the total length of the device structure is 45 m. The RM5 OSWEC was designed for deep-water deployment, at depths between 50 m and 100 m, and was tension-moored to the seabed.
Note: Many of these files also appear as supplementary files on the journal website. This repository gathers all files associated with the paper in one place, alongside expanded descriptions of each file so that they are easier to navigate.
SI Text
Supplementary methods, results, and discussion.
SI Figures S1-S15
All 15 SI figures with captions.
Fig. S1: Size distributions (log10 scale) for taxa in each habitat use across four datasets: (a) ‘FB11k dataset’; (b) ‘CoF11k dataset’; (c) ‘FB31k dataset’; (d) ‘CoF31k dataset’.
Fig. S2: Corresponding plot to main text Fig. 1 using FishBase 31k tree dataset.
Fig. S3: The percentage of groups where the phylogenetic mean size of taxa for one habitat use is larger than the other, obtained for every pairwise habitat-use comparison within all four datasets (FB11k, CoF11k, FB31k and CoF31k tree datasets).
Fig. S4: The per...
This dataset contains the discrete carbon data collected during the 2016 West Coast Ocean Acidification (WCOA) cruise. WCOA2016 took place May 5 to June 7, 2016 aboard NOAA Ship Ronald H. Brown. It is the most integrated WCOA cruise so far, with 132 stations occupied from Baja California in Mexico to Vancouver Island in Canada along seventeen transect lines. At all stations, CTD casts were conducted, and discrete water samples were collected in Niskin bottles. The cruise was designed to obtain a synoptic snapshot of key carbon, physical, and biogeochemical parameters as they relate to ocean acidification (OA) in the coastal realm. Physical, biogeochemical, and chlorophyll concentration data collected during CTD casts are included with this data set. During the cruise, some of the same transect lines were occupied as during the 2007, 2011, 2012, and 2013 West Coast Ocean Acidification cruises, as well as CalCOFI cruises. This effort was conducted in support of the coastal monitoring and research objectives of the NOAA Ocean Acidification Program (OAP).

Data Use Policy: Data from NOAA West Coast Ocean Acidification (WCOA) cruises are made freely available to the public and the scientific community in the belief that their wide dissemination will lead to greater understanding and new scientific and policy insights. The investigators sharing these data rely on the ethics and integrity of the user to ensure that the institutions and investigators involved in producing the WCOA cruise datasets receive fair credit for their work. If the data are obtained for potential use in a publication or presentation, we urge the end user to inform the investigators listed herein at the outset of the nature of this work. If these data are essential to the work, or if an important result or conclusion depends on these data, co-authorship may be appropriate. This should be discussed at an early stage in the work. We request that any manuscripts using these data be sent to all investigators listed in the metadata before they are submitted for publication so that we can ensure that the quality and limitations of the data are accurately represented. Please direct all queries about this dataset to Simone Alin and Richard Feely.
Flux1.1 Likert Scale Text-to-Image Alignment Evaluation
This dataset contains images generated using Flux1.1 [pro] based on the prompts from our text-to-image generation benchmark. Where the benchmark generally focuses on pairwise comparisons to rank different image generation models against each other, this Likert-scale dataset focuses on one particular model and aims to reveal its particular nuances and highlight strong and weak points. See the full description on the dataset page: https://huggingface.co/datasets/Rapidata/flux1.1-likert-scale-preference.
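A minimal Python sketch for loading the dataset with the Hugging Face datasets library is shown below; the split and column names are not documented here, so inspect the features before relying on them.

from datasets import load_dataset

ds = load_dataset("Rapidata/flux1.1-likert-scale-preference")
print(ds)                      # available splits and row counts
split = next(iter(ds.values()))
print(split.features)          # column names/types, e.g. image, prompt, Likert ratings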
The National Hydrography Dataset Plus (NHDPlus) maps the lakes, ponds, streams, rivers and other surface waters of the United States. Created by the US EPA Office of Water and the US Geological Survey, the NHDPlus provides mean annual and monthly flow estimates for rivers and streams. Additional attributes provide connections between features, facilitating complicated analyses. For more information on the NHDPlus dataset see the NHDPlus v2 User Guide.

Dataset Summary
Phenomenon Mapped: Surface waters and related features of the United States and associated territories, not including Alaska.
Geographic Extent: The United States, not including Alaska, Puerto Rico, Guam, US Virgin Islands, Marshall Islands, Northern Marianas Islands, Palau, Federated States of Micronesia, and American Samoa.
Projection: Web Mercator Auxiliary Sphere
Visible Scale: Visible at all scales, but the layer draws best at scales larger than 1:1,000,000
Source: EPA and USGS
Update Frequency: There is no new data since this 2019 version, so no updates are planned.
Publication Date: March 13, 2019

Prior to publication, the NHDPlus network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the NHDPlus Area and Waterbody feature classes were merged under a single schema. Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, On or Off Network (flowlines only), Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original NHDPlus dataset. No-data values -9999 and -9998 were converted to Null values for many of the flowline fields.

What can you do with this layer?
Feature layers work throughout the ArcGIS system. Generally your workflow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.

ArcGIS Online
Add this layer to a map in the map viewer. The layer is limited to scales of approximately 1:1,000,000 or larger, but a vector tile layer created from the same data can be used at smaller scales to produce a web map that displays across the full range of scales. The layer, or a map containing it, can be used in an application.
Change the layer's transparency and set its visibility range.
Open the layer's attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table, and show selected records allows you to view the selected records in the table.
Apply filters. For example, you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute (a query sketch follows at the end of this description).
Change the layer's style and symbology.
Add labels and set their properties.
Customize the pop-up.
Use as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams, and the extract data tool can be used to create copies of portions of the data.

ArcGIS Pro
Add this layer to a 2D or 3D map.
Use as an input to geoprocessing. For example, copy features allows you to select then export portions of the data to a new feature class.
Change the symbology and the attribute field used to symbolize the data.
Open the table and make interactive selections with the map.
Modify the pop-ups.
Apply definition queries to create subsets of the layer.

This layer is part of the ArcGIS Living Atlas of the World, which provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.

Questions?
Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
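As a sketch of the filtering workflow outside the map viewer, the feature service's REST endpoint can also be queried directly from Python; the URL below is a placeholder for the layer's query endpoint, and the field names ("StreamOrde", "GNIS_NAME") are the usual NHDPlus attributes, so verify both against the layer's item page before use.

import requests

QUERY_URL = "https://<feature-service-url>/FeatureServer/0/query"   # placeholder endpoint
params = {
    "where": "StreamOrde >= 5",          # larger rivers only (assumed field name)
    "outFields": "GNIS_NAME,StreamOrde",
    "returnGeometry": "false",
    "f": "json",
}
resp = requests.get(QUERY_URL, params=params, timeout=60)
for feat in resp.json().get("features", [])[:10]:
    print(feat["attributes"])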
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Scales Mound population by age cohorts (Children: Under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort group along with its percentage relative to the total population of Scales Mound. The dataset can be utilized to understand the population distribution across children, working population, and senior population for analyses of the dependency ratio, housing requirements, ageing, migration patterns, etc.
Key observations
The largest age group was 18 to 64 years, with a population of 212 (48.07% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age cohorts:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Scales Mound Population by Age. You can refer to it here.
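As one example of the dependency-ratio use case mentioned above, a small Python sketch follows; only the 18-64 count (212 people, 48.07% of the total) is quoted in the key observations, so the combined child/senior count is inferred from that share rather than read from the dataset itself.

# Total dependency ratio = 100 * (children + seniors) / working-age population.
working = 212
total = round(working / 0.4807)     # ~441 residents implied by the quoted share
dependents = total - working        # children + seniors combined (inferred, not from the dataset)
dependency_ratio = 100 * dependents / working
print(f"Estimated total population: {total}")
print(f"Total dependency ratio: {dependency_ratio:.1f} per 100 working-age residents")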
This dataset contains dissolved inorganic carbon, total alkalinity, pH on the total scale, nutrients, and other variables measured from discrete profile sampling along the Northeast coast of the US. Increasing amounts of atmospheric carbon dioxide from human industrial activities are causing changes in global ocean carbon chemistry, resulting in a reduction in pH, a process termed ocean acidification. Studies have demonstrated adverse effects on calcifying organisms, particularly some invertebrates, corals, sea urchins, pteropods, and coccolithophores. This effort is in support of the coastal monitoring and research objectives of the NOAA Ocean Acidification Program (OAP).
The active fault data displayed here are from a variety of sources. They include the New Zealand Active Faults Database (NZAFD), which comes in two versions - 1:250,000 scale (NZAFD-AF250) and a high-resolution scale (NZAFD-HighRes) - and is prepared by the Institute of Geological and Nuclear Sciences Limited (GNS Science). The active fault datasets also include Fault Avoidance Zones (FAZs) and Fault Awareness Areas (FAAs). The NZAFD-AF250 database covers the New Zealand mainland, while the NZAFD-HighRes database, FAZs and FAAs are only available for restricted areas of New Zealand (updated periodically and without prior notification). If the FAZs are used to assist future land use planning, this should be done in accordance with the Ministry for the Environment guidelines "Planning for Development on or Close to Active Faults" (Kerr et al. 2003). The FAAs show where there may be a surface fault rupture hazard, but further work is needed to define a FAZ, and it is recommended that this dataset is used in conjunction with the guidelines developed by Barrell et al. (2015).

The NZAFD is produced by GNS Science and represents the most current mapping of active faults for New Zealand in a single database. The NZAFD can be accessed on the GNS webmap via the link below. The NZAFD contains two distinct datasets based on scale:

The high-resolution (NZAFD-HighRes) dataset (1:10,000 scale or better), designed for portrayal and use at cadastral (property) scale. This is currently only available to be viewed on the GNS webmap for some regions.
The generalised (NZAFD-AF250) dataset, designed for portrayal and use at regional scale (1:250,000). This can be viewed and downloaded on the GNS webmap for the entire country.

Both datasets comprise polylines that represent the location of an active fault trace at or near the surface, at different scales. Each fault trace has attributes that describe its name, sense of movement, displacement, recurrence interval and other parameters.

The high-resolution dataset group on the GNS webmap also includes two polygon layers derived from the NZAFD:

Fault Avoidance Zones, which delineate areas of surface rupture hazard, as defined by the Ministry for the Environment Active Fault Guidelines (Kerr et al. 2003), or modifications thereof.
Fault Awareness Areas, which highlight areas where a surface rupture hazard may exist (Barrell et al. 2015) and where more work is needed.
Attached are the .cas and .dat files for the Reynolds-Averaged Navier-Stokes (RANS) simulation of a single full-scale DOE RM1 turbine implemented in the ANSYS FLUENT CFD package. In this case study, taking advantage of the symmetry of the DOE RM1 geometry, only half of the geometry is modeled using a (single) rotating reference frame (RRF) model. In this model, the RANS equations, coupled with the k-ω turbulence closure model, are solved in the rotating reference frame. The actual geometry of the turbine blade is included, and the turbulent boundary layer along the blade span is simulated using a wall-function approach. The rotation of the blade is modeled by applying periodic boundary conditions to sets of planes of symmetry. This case study simulates the performance and flow field in both the near and far wake of the device at the desired operating conditions. The results of these simulations showed good agreement with the only publicly available numerical simulation of the device, done at NREL. Please see the attached paper.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: SPI: Pillar 1 Data Use Score: Scale 0-100 data was reported at 100.000 NA in 2019. This stayed constant from the previous number of 100.000 NA for 2018. United States US: SPI: Pillar 1 Data Use Score: Scale 0-100 data is updated yearly, averaging 60.000 NA from Dec 2004 (Median) to 2019, with 16 observations. The data reached an all-time high of 100.000 NA in 2019 and a record low of 40.000 NA in 2009. United States US: SPI: Pillar 1 Data Use Score: Scale 0-100 data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s United States – Table US.World Bank.WDI: Governance: Policy and Institutions.

The data use overall score is a composite score measuring the demand side of the statistical system. The data use pillar is segmented by five types of users: (i) the legislature, (ii) the executive branch, (iii) civil society (including sub-national actors), (iv) academia and (v) international bodies. Each dimension would have associated indicators to measure performance. A mature system would score well across all dimensions, whereas a less mature one would have weaker scores along certain dimensions. The gaps would give insights into prioritization among user groups and help answer questions as to why the existing services are not resulting in higher use of national statistics in a particular segment. Currently, the SPI only features indicators for one of the five dimensions of data use, which is data use by international organizations. Indicators on whether statistical systems are providing useful data to their national governments (legislature and executive branches), to civil society, and to academia are absent. Thus the dashboard does not yet assess if national statistical systems are meeting the data needs of a large swathe of users.

Source: Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators). Aggregation method: weighted average.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The electric grid is a key enabling infrastructure for the ambitious transition towards carbon neutrality as we grapple with climate change. With deepening penetration of renewable energy resources and electrified transportation, the reliable and secure operation of the electric grid becomes increasingly challenging. In this paper, we present PSML, a first-of-its-kind open-access multi-scale time-series dataset, to aid in the development of data-driven machine learning (ML) based approaches towards reliable operation of future electric grids. The dataset is generated through a novel transmission + distribution (T+D) co-simulation designed to capture the increasingly important interactions and uncertainties of the grid dynamics, containing electric load, renewable generation, weather, voltage and current measurements at multiple spatio-temporal scales. Using PSML, we provide state-of-the-art ML baselines on three challenging use cases of critical importance to achieve: (i) early detection, accurate classification and localization of dynamic disturbance events; (ii) robust hierarchical forecasting of load and renewable energy with the presence of uncertainties and extreme events; and (iii) realistic synthetic generation of physical-law-constrained measurement time series. We envision that this dataset will enable advances for ML in dynamic systems, while simultaneously allowing ML researchers to contribute towards carbon-neutral electricity and mobility.
Data Navigation
Please download and unzip the archive, and keep the extracted folder somewhere accessible; it is needed later for reproducing the benchmark results, and for data loading and performance evaluation of proposed methods (a loading sketch follows the data categories listed below).
wget https://zenodo.org/record/5130612/files/PSML.zip?download=1
7z x 'PSML.zip?download=1' -o./
Minute-level Load and Renewable
Minute-level PMU Measurements
Millisecond-level PMU Measurements
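Once PSML.zip is extracted, a minimal Python sketch for discovering and loading one of the CSV files is shown below; the directory layout and file names are assumptions, so list the extracted folder first and adjust the paths accordingly.

import os
import pandas as pd

root = "./PSML"                                   # folder created by the 7z command above (verify)
for dirpath, _, files in os.walk(root):
    for name in files:
        if name.endswith(".csv"):
            print(os.path.join(dirpath, name))    # discover the available CSVs

df = pd.read_csv("PSML/minute_level_load_renewable.csv")   # placeholder file name
print(df.head())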