This dataset contains disinfection efficacy data for scrubs, face coverings, and denim contaminated with Phi6 and MS2 and cleaned using hot-water laundering.
This dataset is associated with the following publication: Mikelonis, A., J. Archer, B. Wyrzykowska, E. Morris, J. Sawyer, T. Chamberlain, A. Abdel-Hady, M. Monge, and A. Touati. Determining Viral Disinfection Efficacy of Hot Water Laundering. Journal of Visualized Experiments. JoVE, Somerville, MA, USA, 184: e64164, (2022).
Object detection is a vital part of any autonomous vision system, and obtaining a high-performing object detector requires data. The object detection task aims to detect and classify objects from camera input, producing bounding boxes containing the objects as output. This is usually done using deep neural networks.
Training an object detector typically requires a large amount of data, but collecting large amounts of data is not always practical. This has led to multiple techniques that decrease the amount of data needed, such as transfer learning and domain adaptation. Working with construction equipment is a time-consuming process, so we wanted to examine whether it is possible to train a network on scale-model data and then use that network to detect real objects with no additional training.
This small dataset contains training and validation data of a scale-model dump truck in different environments, while the test set contains images of a full-size dump truck of a similar model. The aim of the dataset is to train a network to classify wheels, cabs, and tipping bodies of a scale-model dump truck and use that network to classify the same classes on a full-scale dump truck.
The label structure of the dataset is the YOLO v3 format, where each class corresponds to an integer value: Wheel: 0, Cab: 1, Tipping body: 2
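For orientation, here is a minimal Python sketch of reading one label file in this format; the file name is hypothetical, and the coordinate layout follows the standard YOLO convention of normalized center coordinates and box size.

# Parse a YOLO-format label file for this dataset (the file name is hypothetical).
CLASS_NAMES = {0: "Wheel", 1: "Cab", 2: "Tipping body"}

def read_yolo_labels(path):
    # Each line: "<class_id> <x_center> <y_center> <width> <height>",
    # with coordinates normalized to [0, 1] relative to the image size.
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            boxes.append((CLASS_NAMES[int(cls)], float(xc), float(yc), float(w), float(h)))
    return boxes

for box in read_yolo_labels("image_0001.txt"):
    print(box)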
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.
For more details see the included README file and companion paper:
Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In Proceedings of the 2022 Mining Software Repositories Conference (MSR 2022), 23-24 May 2022, Pittsburgh, Pennsylvania, United States. ACM, 2022.
If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
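As an illustration of how the metadata can be joined to the license blobs, here is a minimal Python sketch that computes a file's checksum and looks it up in one of the metadata CSVs; the CSV file name and the "checksum" column name are assumptions, so consult the included README for the actual layout.

# Minimal sketch (file and column names are placeholders): find the metadata
# rows for one license blob by its cryptographic checksum.
import csv
import hashlib

def sha1_of(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def rows_for_blob(metadata_csv, checksum, column="checksum"):
    with open(metadata_csv, newline="") as f:
        return [row for row in csv.DictReader(f) if row.get(column) == checksum]

digest = sha1_of("LICENSE")                              # any local license file
print(rows_for_blob("license_metadata.csv", digest))     # placeholder CSV name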
The detrimental effects of excess nutrients and sediment entering the Chesapeake Bay estuary from its watersheds have necessitated regulatory actions. Federally mandated reductions are apportioned to bay jurisdictions based on the U.S. Environmental Protection Agency's Chesapeake Bay Time-Variable Watershed Model (CBPM). The Chesapeake Assessment Scenario Tool (CAST version CAST-19; cast.chesapeakebay.net; Chesapeake Bay Program, 2020) is a simplified, online version of the Phase 6 CBPM that simulates watershed nutrient delivery to the estuary using the original model's annual land-surface nutrient source and removal inputs and time-averaged climatological forecasting. Because it runs much faster than the CBPM, CAST facilitates rapid generation and comparison of alternate input-reduction scenarios.

The purpose of this data release is to make the baseline annual nitrogen, phosphorus, and sediment input data used by CAST available to the scientific community in a standardized, public-domain format, such that CBPM baseline predictions can be corroborated, or the model can be refined through independent scientific investigations. Because it constitutes the best available estimate, as of 2019, of past and projected future land-surface nitrogen, phosphorus, and sediment inputs over the entire extent of the Chesapeake watershed, this data set also supports broader USGS Chesapeake Bay Studies through fiscal year 2025.

Source-specific annual nutrient source and removal inputs for the years 1985 through 2025 were downscaled from the CBPM land-river segment scale (2,049 segments; mean area 118 square kilometers) to the National Hydrography Dataset Plus version 2.0 (NHDPlus) 1:100,000 catchment scale (83,331 segments; mean area 2.1 square kilometers). Eleven source or removal categories are represented for all counties that intersect the Chesapeake Bay watershed. These categories are listed below and further defined in the Purpose section.

1. Atmospheric deposition (atm. dep.)
2. Biosolids
3. Combined sewer overflow (CSO)
4. Direct deposit (manure directly excreted on pasture and in streams)
5. Fertilizer
6. Manure applied as fertilizer
7. Nitrogen fixation by agricultural crops (Nfix)
8. Rapid infiltration basins (RIB)
9. Septic systems
10. Nutrient uptake by agricultural crops that is removed from the field
11. Wastewater

For most of these categories, nutrient source and removal inputs are tabulated for five species: ammonia, nitrate, organic nitrogen, phosphate, and organic phosphorus; sediment inputs are provided as total suspended sediment. Consistent with CBPM, plant uptake is specified only as total nitrogen and total phosphorus, and wastewater inputs are specified as biological oxygen demand and dissolved oxygen (Chesapeake Bay Program, 2020).

In addition to these sources, annual proportional land-use layers used in the downscaling process are provided, also at NHDPlus 1:100,000 scale. Layers for each year represent proportional coverage of 14 Chesapeake Bay 2013 1-meter Land Use Data classes, interpolated (1985-2013) based on the evolution of land cover derived from NLCD 1992, 2001, 2006, and 2011 layers, and projected (2014-2025) using land use estimated for 2025 with the USGS Chesapeake Bay Land Change Model (USGS, 2020).

Best management practices (BMPs) are not included in this data release. BMPs have varying effects on nutrient inputs and runoff; these effects are best represented in CAST.
Moreover, the BMP history is regularly revised by the states, and the most current history is available as a downloadable file from CAST.
Chesapeake Bay Program, 2020. Chesapeake Assessment and Scenario Tool (CAST), Version 2019. Chesapeake Bay Program Office. Last accessed November 2021.
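To make the downscaling idea described above concrete, here is an illustrative Python sketch (not the CBPM/CAST procedure itself) that splits an annual segment-scale input among NHDPlus catchments in proportion to each catchment's share of a relevant land-use area; the column names and the weighting choice are assumptions for illustration only.

# Illustrative proportional downscaling sketch; column names are assumed.
import pandas as pd

def downscale_to_catchments(segment_inputs, catchments):
    # segment_inputs: one row per land-river segment, columns ["segment_id", "input_kg"]
    # catchments: one row per NHDPlus catchment, columns
    #   ["catchment_id", "segment_id", "landuse_area_km2"]
    catchments = catchments.copy()
    # each catchment's share of its parent segment's relevant land-use area
    catchments["weight"] = catchments.groupby("segment_id")["landuse_area_km2"].transform(
        lambda a: a / a.sum()
    )
    merged = catchments.merge(segment_inputs, on="segment_id", how="left")
    merged["catchment_input_kg"] = merged["weight"] * merged["input_kg"]
    return merged[["catchment_id", "segment_id", "catchment_input_kg"]]

segments = pd.DataFrame({"segment_id": [1], "input_kg": [1000.0]})
catch = pd.DataFrame({"catchment_id": [10, 11], "segment_id": [1, 1], "landuse_area_km2": [3.0, 1.0]})
print(downscale_to_catchments(segments, catch))   # splits 1000 kg as 750 / 250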
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract
Humans have elevated global extinction rates and thus lowered global-scale species richness. However, there is no a priori reason to expect that losses of global species richness should always, or even often, trickle down to losses of species richness at regional and local scales, even though this relationship is often assumed. Here, we show that scale can modulate our estimates of species richness change through time in the face of anthropogenic pressures, but not in a unidirectional way. Instead, the magnitude of species richness change through time can increase, decrease, reverse, or be unimodal across spatial scales. Using several case studies, we show different forms of scale-dependent richness change through time in the face of anthropogenic pressures. For example, Central American corals show a homogenization pattern, where small-scale richness is largely unchanged through time, while larger-scale richness change is highly negative. Alternatively, birds in North America showed a differentiation effect, where species richness was again largely unchanged through time at small scales, but was more positive at larger scales. Finally, we collated data from a heterogeneous set of studies of different taxa measured through time from sites ranging from small plots to entire continents, and found highly variable patterns that nevertheless imply complex scale-dependence in several taxa. In summary, understanding how biodiversity is changing in the Anthropocene requires an explicit recognition of the influence of spatial scale, and we conclude with some recommendations for how to better incorporate scale into our estimates of change.
Usage notes
data_for_dryad: This file contains all data associated with the manuscript. A metadata file is included in the zip folder.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.11
Python 3.7.2
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-03-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
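Optionally, you can verify the connection string from Python before opening the notebooks; a minimal SQLAlchemy sketch (not part of the original instructions) is:

import os
from sqlalchemy import create_engine, inspect

# Connect with the same string used by the analysis notebooks and list the
# tables restored from db2019-03-13.dump.
engine = create_engine(os.environ["JUP_DB_CONNECTION"])
print(inspect(engine).get_table_names())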
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.7:
conda create -n analyses python=3.7
conda activate analyses
Go to the analyses folder and install all the dependencies listed in requirements.txt:
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
To reproduce the analyses, run Jupyter in this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine; leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine; leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the repository directories; the second one should unmount it. You can leave the scripts blank, but this is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Install 5 conda environments and 5 anaconda environments, one for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires manual jupyter and pathlib2 installations due to incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda activate py35
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
conda create -n raw37 python=3.7 -y
conda activate raw37
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the complete catalog of datasets and publications reviewed in: Di Mauro A., Cominola A., Castelletti A., Di Nardo A. Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water, 2021. The complete catalog contains:
92 state-of-the-art water demand datasets identified at the district, household, and end use scales;
120 related peer-reviewed publications;
57 additional datasets with electricity demand data at the end use and household scales.
The following metadata are reported, for each dataset:
Authors
Year
Location
Dataset Size
Time Series Length
Time Sampling Resolution
Access Policy.
The following metadata are reported, for each publication:
Authors
Year
Journal
Title
Spatial Scale
Type of Study: Survey (S) / Dataset (D)
Domain: Water (W)/Electricity (E)
Time Sampling Resolution
Access Policy
Dataset Size
Time Series Length
Location
Authors: Anna Di Mauro - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | anna.dimauro@unicampania.it; Andrea Cominola - Chair of Smart Water Networks | Technische Universität Berlin - Einstein Center Digital Future (Germany) | andrea.cominola@tu-berlin.de; Andrea Castelletti - Department of Electronics, Information and Bioengineering | Politecnico di Milano (Italy) | andrea.castelletti@polimi.it; Armando Di Nardo - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | armando.dinardo@unicampania.it
Citation and reference:
If you use this database, please consider citing our paper:
Di Mauro, A., Cominola, A., Castelletti, A., & Di Nardo, A. (2021). Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water, 13(1), 36, https://doi.org/10.3390/w13010036
Updates and Contributions:
The catalogue stored in this public repository can be collaboratively updated as more datasets become available. The authors will periodically update it to a new version.
New requests can be submitted to the authors, so that the dataset collection can be improved by different contributors. Contributors will be credited in the updated versions of the dataset catalogue.
Updates history:
March 1st, 2021 - Pacheco, C.J.B., Horsburgh, J.S., Tracy, J.R. (Utah State University, Logan, UT - USA) --- The dataset associated with the paper Bastidas Pacheco, C.J., Horsburgh, J.S., Tracy, R.J. A Low-Cost, Open Source Monitoring System for Collecting High Temporal Resolution Water Use Data on Magnetically Driven Residential Water Meters. Sensors 2020, 20, 3655, is published in the HydroShare repository, where it is available as an OPEN dataset. Data can be found here: https://doi.org/10.4211/hs.4de42db6485f47b290bd9e17b017bb51
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The YJMob100K human mobility datasets (YJMob100K_dataset1.csv.gz and YJMob100K_dataset2.csv.gz) contain the movement of a total of 100,000 individuals across a 75-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains the movement of 80,000 individuals across a 75-day business-as-usual period, while the second dataset contains the movement of 20,000 individuals across a 75-day period (including the last 15 days during an emergency) with unusual behavior.
While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (cell_POIcat.csv.gz). The list of 85 POI categories can be found in POI_datacategories.csv.
For details of the dataset, see Data Descriptor:
Yabe, T., Tsubouchi, K., Shimizu, T., Sekimoto, Y., Sezaki, K., Moro, E., & Pentland, A. (2024). YJMob100K: City-scale and longitudinal dataset of anonymized human mobility trajectories. Scientific Data, 11(1), 397. https://www.nature.com/articles/s41597-024-03237-9
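A minimal Python sketch for a first look at the data is shown below; the column names (user id, day index, 30-minute time slot, and x/y grid indices) are assumptions based on the description above, so check the data descriptor for the authoritative schema.

import pandas as pd

df = pd.read_csv("YJMob100K_dataset1.csv.gz")   # pandas decompresses .gz automatically
print(df.columns.tolist())                      # confirm the actual column names
print(df.head())

# Example: number of distinct 500 m grid cells visited per individual,
# assuming columns named "uid", "x" and "y".
cells_per_user = df.drop_duplicates(subset=["uid", "x", "y"]).groupby("uid").size()
print(cells_per_user.describe())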
--- Details about the Human Mobility Prediction Challenge 2023 (ended November 13, 2023) ---
The challenge takes place in a mid-sized and highly populated metropolitan area, somewhere in Japan. The area is divided into 500 meters x 500 meters grid cells, resulting in a 200 x 200 grid cell space.
The human mobility datasets (task1_dataset.csv.gz and task2_dataset.csv.gz) contain the movement of a total of 100,000 individuals across a 90-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains the movement during a 75-day business-as-usual period, while the second dataset contains the movement during a 75-day period during an emergency with unusual behavior.
There are 2 tasks in the Human Mobility Prediction Challenge.
In task 1, participants are provided with the full time series data (75 days) for 80,000 individuals, and partial (only 60 days) time series movement data for the remaining 20,000 individuals (task1_dataset.csv.gz). Given the provided data, task 1 of the challenge is to predict the movement patterns of those 20,000 individuals during days 60-74. Task 2 is a similar task but uses a smaller dataset of 25,000 individuals in total, 2,500 of which have their locations during days 60-74 masked and need to be predicted (task2_dataset.csv.gz).
While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (which is optional for use in the challenge) (cell_POIcat.csv.gz).
For more details, see https://connection.mit.edu/humob-challenge-2023
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Breast cancer is one of the most prevalent types of cancer and the leading type of cancer death. Mammography is the recommended imaging modality for periodic breast cancer screening. A few datasets have been published to develop computer-aided tools for mammography analysis. However, these datasets either have a limited sample size or consist of screen-film mammography (SFM), which has been replaced by full-field digital mammography (FFDM) in clinical practice. This project introduces a large-scale full-field digital mammography dataset of 5,000 four-view exams, which are double read by experienced mammographers to provide cancer assessment and breast density following the Breast Imaging Reporting and Data System (BI-RADS). Breast abnormalities that require further examination are also marked by bounding rectangles.
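If the exams are distributed as DICOM files (the usual container for FFDM; verify against the project's documentation), a minimal Python sketch for loading one view with pydicom could look like this; the file path is a placeholder.

# Minimal sketch, assuming DICOM-format FFDM images; requires the pydicom package.
import pydicom

ds = pydicom.dcmread("exam_0001/L_CC.dcm")          # placeholder path to one view
print(ds.Modality, getattr(ds, "ViewPosition", "?"))
pixels = ds.pixel_array                              # 2D array of the mammogram
print(pixels.shape, pixels.dtype)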
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the Reference Model 5 (RM5) full scale geometry files of the Oscillating Surge Flap, developed by the Reference Model Project (RMP). These full scale geometry files are saved as SolidWorks assembly, IGS, and STEP files, and require a CAD program to view. This data was generated upon completion of the project on September 30, 2014.
The Reference Model Project (RMP), sponsored by the U.S. Department of Energy (DOE), was a partnered effort to develop open-source MHK point designs as reference models (RMs) to benchmark MHK technology performance and costs, and an open-source methodology for design and analysis of MHK technologies, including models for estimating their capital costs, operational costs, and levelized costs of energy. The point designs also served as open-source test articles for university researchers and commercial technology developers. The RMP project team, led by Sandia National Laboratories (SNL), included a partnership between DOE, three national laboratories, including the National Renewable Energy Laboratory (NREL), Pacific Northwest National Laboratory (PNNL), and Oak Ridge National Laboratory (ORNL), the Applied Research Laboratory of Penn State University, and Re Vision Consulting.
Reference Model 5 (RM5) is a type of floating, oscillating surge wave energy converter (OSWEC) that utilizes the surge motion of waves to generate electrical power. The reference wave energy resource for RM5 was measurement data from a National Data Buoy Center (NDBC) buoy near Eureka, in Humboldt County, California. The flap was designed to rotate against the supporting frame to convert wave energy into electrical power from the relative rotational motion induced by incoming waves. The RM5 design is rated at 360 kilowatts (kW), uses a flap of 25 m in width and 19 m in height (16 m in draft), and the distance from the top of the water surface piercing flap to the mean water surface (freeboard) is 1.5 m. The flap is connected to a shaft with a 3-m diameter that rotates against the supporting frame. The supporting frame is assumed to have an outer diameter of 2 m, and the total length of the device structure is 45 m. The RM5 OSWEC was designed for deep-water deployment, at depths between 50 m and 100 m, and was tension-moored to the seabed.
Note: Many of these files also appear as supplementary files on the journal website. This repository gathers all files associated with the paper in one place, alongside expanded descriptions of each file so that they are easier to navigate.
SI Text
Supplementary methods, results, and discussion.
SI Figures S1-S15
All 15 SI figures with captions.
Fig. S1: Size distributions (log10 scale) for taxa in each habitat use across four datasets: (a) ‘FB11k dataset’; (b) ‘CoF11k dataset’; (c) ‘FB31k dataset’; (d) ‘CoF31k dataset’.
Fig. S2: Corresponding plot to main text Fig. 1 using FishBase 31k tree dataset.
Fig. S3: The percentage of groups where the phylogenetic mean size of taxa for one habitat use is larger than the other, obtained for every pairwise habitat-use comparison within all four datasets (FB11k, CoF11k, FB31k and CoF31k tree datasets).
Fig. S4: The per...
This dataset contains the discrete carbon data collected during the 2016 West Coast Ocean Acidification (WCOA) cruise. WCOA2016 took place May 5 to June 7, 2016 aboard NOAA Ship Ronald H. Brown. It is the most integrated WCOA cruise so far, with 132 stations occupied from Baja California in Mexico to Vancouver Island in Canada along seventeen transect lines. At all stations, CTD casts were conducted, and discrete water samples were collected in Niskin bottles. The cruise was designed to obtain a synoptic snapshot of key carbon, physical, and biogeochemical parameters as they relate to ocean acidification (OA) in the coastal realm. Physical, biogeochemical, and chlorophyll concentration data collected during CTD casts are included with this data set. During the cruise, some of the same transect lines were occupied as during the 2007, 2011, 2012, and 2013 West Coast Ocean Acidification cruises, as well as CalCOFI cruises. This effort was conducted in support of the coastal monitoring and research objectives of the NOAA Ocean Acidification Program (OAP).

Data Use Policy: Data from NOAA West Coast Ocean Acidification (WCOA) cruises are made freely available to the public and the scientific community in the belief that their wide dissemination will lead to greater understanding and new scientific and policy insights. The investigators sharing these data rely on the ethics and integrity of the user to ensure that the institutions and investigators involved in producing the WCOA cruise datasets receive fair credit for their work. If the data are obtained for potential use in a publication or presentation, we urge the end user to inform the investigators listed herein at the outset of the nature of this work. If these data are essential to the work, or if an important result or conclusion depends on these data, co-authorship may be appropriate. This should be discussed at an early stage in the work. We request that any manuscripts using these data be sent to all investigators listed in the metadata before they are submitted for publication so that we can ensure that the quality and limitations of the data are accurately represented. Please direct all queries about this dataset to Simone Alin and Richard Feely.
Flux1.1 Likert Scale Text-to-Image Alignment Evaluation
This dataset contains images generated using Flux1.1 [pro] based on the prompts from our text-to-image generation benchmark. Where the benchmark generally focuses on pairwise comparisons to rank different image generation models against each other, this Likert-scale dataset focuses on one particular model and aims to reveal its particular nuances and highlight strong and weak points. See the full description on the dataset page: https://huggingface.co/datasets/Rapidata/flux1.1-likert-scale-preference.
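A minimal Python sketch for loading the dataset with the Hugging Face datasets library is shown below; the split and column names are not documented here, so inspect the features before relying on them.

from datasets import load_dataset

ds = load_dataset("Rapidata/flux1.1-likert-scale-preference")
print(ds)                      # available splits and row counts
split = next(iter(ds.values()))
print(split.features)          # column names/types, e.g. image, prompt, Likert ratings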
The National Hydrography Dataset Plus (NHDPlus) maps the lakes, ponds, streams, rivers and other surface waters of the United States. Created by the US EPA Office of Water and the US Geological Survey, the NHDPlus provides mean annual and monthly flow estimates for rivers and streams. Additional attributes provide connections between features, facilitating complicated analyses. For more information on the NHDPlus dataset see the NHDPlus v2 User Guide.

Dataset Summary
Phenomenon Mapped: Surface waters and related features of the United States and associated territories, not including Alaska.
Geographic Extent: The United States, not including Alaska, Puerto Rico, Guam, US Virgin Islands, Marshall Islands, Northern Marianas Islands, Palau, Federated States of Micronesia, and American Samoa.
Projection: Web Mercator Auxiliary Sphere
Visible Scale: Visible at all scales, but the layer draws best at scales larger than 1:1,000,000
Source: EPA and USGS
Update Frequency: There is no new data since this 2019 version, so no updates are planned.
Publication Date: March 13, 2019

Prior to publication, the NHDPlus network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the NHDPlus Area and Waterbody feature classes were merged under a single schema. Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, On or Off Network (flowlines only), Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original NHDPlus dataset. No-data values -9999 and -9998 were converted to Null values for many of the flowline fields.

What can you do with this layer?
Feature layers work throughout the ArcGIS system. Generally your workflow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.

ArcGIS Online
Add this layer to a map in the map viewer. The layer is limited to scales of approximately 1:1,000,000 or larger, but a vector tile layer created from the same data can be used at smaller scales to produce a web map that displays across the full range of scales. The layer, or a map containing it, can be used in an application.
Change the layer's transparency and set its visibility range.
Open the layer's attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table, and show selected records allows you to view the selected records in the table.
Apply filters. For example, you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute (a query sketch follows at the end of this description).
Change the layer's style and symbology.
Add labels and set their properties.
Customize the pop-up.
Use as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams, and the extract data tool can be used to create copies of portions of the data.

ArcGIS Pro
Add this layer to a 2D or 3D map.
Use as an input to geoprocessing. For example, copy features allows you to select then export portions of the data to a new feature class.
Change the symbology and the attribute field used to symbolize the data.
Open the table and make interactive selections with the map.
Modify the pop-ups.
Apply definition queries to create subsets of the layer.

This layer is part of the ArcGIS Living Atlas of the World, which provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.

Questions?
Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
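As a sketch of the filtering workflow outside the map viewer, the feature service's REST endpoint can also be queried directly from Python; the URL below is a placeholder for the layer's query endpoint, and the field names ("StreamOrde", "GNIS_NAME") are the usual NHDPlus attributes, so verify both against the layer's item page before use.

import requests

QUERY_URL = "https://<feature-service-url>/FeatureServer/0/query"   # placeholder endpoint
params = {
    "where": "StreamOrde >= 5",          # larger rivers only (assumed field name)
    "outFields": "GNIS_NAME,StreamOrde",
    "returnGeometry": "false",
    "f": "json",
}
resp = requests.get(QUERY_URL, params=params, timeout=60)
for feat in resp.json().get("features", [])[:10]:
    print(feat["attributes"])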
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Scales Mound population by age cohorts (Children: Under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort group along with its percentage relative to the total population of Scales Mound. The dataset can be utilized to understand the population distribution across children, working population, and senior population for analyses of the dependency ratio, housing requirements, ageing, migration patterns, etc.
Key observations
The largest age group was 18 to 64 years, with a population of 212 (48.07% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age cohorts:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Scales Mound Population by Age. You can refer to it here.
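As one example of the dependency-ratio use case mentioned above, a small Python sketch follows; only the 18-64 count (212 people, 48.07% of the total) is quoted in the key observations, so the combined child/senior count is inferred from that share rather than read from the dataset itself.

# Total dependency ratio = 100 * (children + seniors) / working-age population.
working = 212
total = round(working / 0.4807)     # ~441 residents implied by the quoted share
dependents = total - working        # children + seniors combined (inferred, not from the dataset)
dependency_ratio = 100 * dependents / working
print(f"Estimated total population: {total}")
print(f"Total dependency ratio: {dependency_ratio:.1f} per 100 working-age residents")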
This dataset contains dissolved inorganic carbon, total alkalinity, pH on the total scale, nutrients, and other variables measured from discrete profile sampling along the Northeast coast of the US. Increasing amounts of atmospheric carbon dioxide from human industrial activities are causing changes in global ocean carbon chemistry, resulting in a reduction in pH, a process termed ocean acidification. Studies have demonstrated adverse effects on calcifying organisms, particularly some invertebrates, corals, sea urchins, pteropods, and coccolithophores. This effort is in support of the coastal monitoring and research objectives of the NOAA Ocean Acidification Program (OAP).
The active fault data displayed here are from a variety of sources. They include the New Zealand Active Faults Database (NZAFD), which comes in two versions - 1:250,000 scale (NZAFD-AF250) and a high-resolution scale (NZAFD-HighRes) - and is prepared by the Institute of Geological and Nuclear Sciences Limited (GNS Science). The active fault datasets also include Fault Avoidance Zones (FAZs) and Fault Awareness Areas (FAAs). The NZAFD-AF250 database covers the New Zealand mainland, while the NZAFD-HighRes database, FAZs and FAAs are only available for restricted areas of New Zealand (updated periodically and without prior notification). If the FAZs are used to assist future land use planning, this should be done in accordance with the Ministry for the Environment guidelines "Planning for Development on or Close to Active Faults" (Kerr et al. 2003). The FAAs show where there may be a surface fault rupture hazard, but further work is needed to define a FAZ, and it is recommended that this dataset is used in conjunction with the guidelines developed by Barrell et al. (2015).

The NZAFD is produced by GNS Science and represents the most current mapping of active faults for New Zealand in a single database. The NZAFD can be accessed on the GNS webmap via the link below. The NZAFD contains two distinct datasets based on scale:

The high-resolution (NZAFD-HighRes) dataset (1:10,000 scale or better), designed for portrayal and use at cadastral (property) scale. This is currently only available to be viewed on the GNS webmap for some regions.
The generalised (NZAFD-AF250) dataset, designed for portrayal and use at regional scale (1:250,000). This can be viewed and downloaded on the GNS webmap for the entire country.

Both datasets comprise polylines that represent the location of an active fault trace at or near the surface, at different scales. Each fault trace has attributes that describe its name, sense of movement, displacement, recurrence interval and other parameters.

The high-resolution dataset group on the GNS webmap also includes two polygon layers derived from the NZAFD:

Fault Avoidance Zones, which delineate areas of surface rupture hazard, as defined by the Ministry for the Environment Active Fault Guidelines (Kerr et al. 2003), or modifications thereof.
Fault Awareness Areas, which highlight areas where a surface rupture hazard may exist (Barrell et al. 2015) and where more work is needed.
Attached are the .cas and .dat files for the Reynolds-Averaged Navier-Stokes (RANS) simulation of a single full-scale DOE RM1 turbine implemented in the ANSYS FLUENT CFD package. In this case study, taking advantage of the symmetry of the DOE RM1 geometry, only half of the geometry is modeled using a (single) rotating reference frame (RRF) model. In this model, the RANS equations, coupled with the k-ω turbulence closure model, are solved in the rotating reference frame. The actual geometry of the turbine blade is included, and the turbulent boundary layer along the blade span is simulated using a wall-function approach. The rotation of the blade is modeled by applying periodic boundary conditions to sets of planes of symmetry. This case study simulates the performance and flow field in both the near and far wake of the device at the desired operating conditions. The results of these simulations showed good agreement with the only publicly available numerical simulation of the device, done at NREL. Please see the attached paper.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: SPI: Pillar 1 Data Use Score: Scale 0-100 data was reported at 100.000 NA in 2019. This stayed constant from the previous number of 100.000 NA for 2018. United States US: SPI: Pillar 1 Data Use Score: Scale 0-100 data is updated yearly, averaging 60.000 NA from Dec 2004 (Median) to 2019, with 16 observations. The data reached an all-time high of 100.000 NA in 2019 and a record low of 40.000 NA in 2009. United States US: SPI: Pillar 1 Data Use Score: Scale 0-100 data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s United States – Table US.World Bank.WDI: Governance: Policy and Institutions.

The data use overall score is a composite score measuring the demand side of the statistical system. The data use pillar is segmented by five types of users: (i) the legislature, (ii) the executive branch, (iii) civil society (including sub-national actors), (iv) academia and (v) international bodies. Each dimension would have associated indicators to measure performance. A mature system would score well across all dimensions, whereas a less mature one would have weaker scores along certain dimensions. The gaps would give insights into prioritization among user groups and help answer questions as to why the existing services are not resulting in higher use of national statistics in a particular segment. Currently, the SPI only features indicators for one of the five dimensions of data use, which is data use by international organizations. Indicators on whether statistical systems are providing useful data to their national governments (legislature and executive branches), to civil society, and to academia are absent. Thus the dashboard does not yet assess if national statistical systems are meeting the data needs of a large swathe of users.

Source: Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators). Aggregation method: weighted average.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The electric grid is a key enabling infrastructure for the ambitious transition towards carbon neutrality as we grapple with climate change. With deepening penetration of renewable energy resources and electrified transportation, the reliable and secure operation of the electric grid becomes increasingly challenging. In this paper, we present PSML, a first-of-its-kind open-access multi-scale time-series dataset, to aid in the development of data-driven machine learning (ML) based approaches towards reliable operation of future electric grids. The dataset is generated through a novel transmission + distribution (T+D) co-simulation designed to capture the increasingly important interactions and uncertainties of the grid dynamics, containing electric load, renewable generation, weather, voltage and current measurements at multiple spatio-temporal scales. Using PSML, we provide state-of-the-art ML baselines on three challenging use cases of critical importance to achieve: (i) early detection, accurate classification and localization of dynamic disturbance events; (ii) robust hierarchical forecasting of load and renewable energy with the presence of uncertainties and extreme events; and (iii) realistic synthetic generation of physical-law-constrained measurement time series. We envision that this dataset will enable advances for ML in dynamic systems, while simultaneously allowing ML researchers to contribute towards carbon-neutral electricity and mobility.
Data Navigation
Please download and unzip the archive, and keep the extracted folder somewhere accessible; it is needed later for reproducing the benchmark results, and for data loading and performance evaluation of proposed methods (a loading sketch follows the data categories listed below).
wget https://zenodo.org/record/5130612/files/PSML.zip?download=1
7z x 'PSML.zip?download=1' -o./
Minute-level Load and Renewable
Minute-level PMU Measurements
Millisecond-level PMU Measurements
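Once PSML.zip is extracted, a minimal Python sketch for discovering and loading one of the CSV files is shown below; the directory layout and file names are assumptions, so list the extracted folder first and adjust the paths accordingly.

import os
import pandas as pd

root = "./PSML"                                   # folder created by the 7z command above (verify)
for dirpath, _, files in os.walk(root):
    for name in files:
        if name.endswith(".csv"):
            print(os.path.join(dirpath, name))    # discover the available CSVs

df = pd.read_csv("PSML/minute_level_load_renewable.csv")   # placeholder file name
print(df.head())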