The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing; it is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey on which an internal report is based. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to forward it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. We therefore requested more detail from "Big Data users," the 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB of total current data (Q5); all other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used the actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data were considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file of raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet of raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
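The per-person storage estimate described in the methods (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched as a quick calculation. The numbers below are hypothetical examples, not survey values:

```python
def per_person_tb(range_high_tb, group_size=1):
    """Per-person storage estimate in TB: high end of reported range / group size.

    group_size is 1 for an individual response, or G for a group response.
    """
    return range_high_tb / group_size

# Hypothetical example: a group response covering 5 scientists who reported
# the "more than 10 to 100 TB" range (high end = 100 TB).
print(per_person_tb(100, 5))   # 20.0 TB per person
print(per_person_tb(100))      # 100.0 TB for an individual response
```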
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Canada Trademarks Dataset
18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303
Dataset Selection and Arrangement (c) 2021 Jeremy Sheff
Python and Stata Scripts (c) 2021 Jeremy Sheff
Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.
This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.
Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.
Terms of Use:
As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.
The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:
The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.
Details of Repository Contents:
This repository includes a number of .zip archives that expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself; these folders are described below.
If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.
The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
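As a sketch of the streaming XML-to-CSV approach that the "iterparse" scripts take: the element and field names below are hypothetical stand-ins, not CIPO's actual schema, and the real scripts should be consulted for the true field mappings.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Toy stand-in for one of the CIPO XML archives (hypothetical tags).
xml_data = io.BytesIO(b"""<root>
  <application><number>1</number><mark>ACME</mark></application>
  <application><number>2</number><mark>WIDGETCO</mark></application>
</root>""")

with open("applications.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["number", "mark"])
    # iterparse streams the document, so memory use stays flat even for
    # multi-gigabyte archives; each element is cleared once written out.
    for event, elem in ET.iterparse(xml_data, events=("end",)):
        if elem.tag == "application":
            writer.writerow([elem.findtext("number"), elem.findtext("mark")])
            elem.clear()
```

The key design point is that `ET.iterparse` never builds the full tree, which is what makes parsing the large IP Horizons archives feasible on ordinary hardware.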
With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata's labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.
The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.
This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
=====================================================================
Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)
Affiliations: (1)National Institute of Informatics, Japan (2)University of Information Technology-VNUHCM, Vietnam
National Institute of Informatics Copyright (c) 2021
Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn
arXiv: https://arxiv.org/abs/2111.12888
NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725
=============================== INTRODUCTION ===============================
The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from Rambalac's YouTube channel.
The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.
=============================== REFERENCES ===============================
If you publish using any of the data in this dataset, please cite the following papers:
@article{Nguyen202112888,
  title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
  author={Nguyen, Khanh-Duy and Nguyen, Huy H and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
  archivePrefix={arXiv},
  arxivId={2111.12888},
  url={https://arxiv.org/abs/2111.12888},
  year={2021}
}
@INPROCEEDINGS{Nguyen2021EstMaskWearing,
  author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
  booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
  title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
  year={2021},
  pages={1-8},
  url={https://ieeexplore.ieee.org/document/9667046},
  doi={10.1109/FG52635.2021.9667046}
}
======================== DATA STRUCTURE ==================================
./NFM
├── dataset
│   ├── train.csv: annotations for the train set
│   ├── test.csv: annotations for the test set
└── README_v1.0.md
We use the same structure for the two CSV files (train.csv and test.csv). Both have the same columns:

1st column: video_id (the source video can be found by appending this value to the link https://www.youtube.com/watch?v=)
2nd column: frame_id (the index of the frame extracted from the source video)
3rd column: timestamp in milliseconds (the timestamp of the frame extracted from the source video)
4th column: label (for each annotated face, one of three labels attached to its bounding box: 'Mask'/'No-Mask'/'Unknown')
5th column: left
6th column: top
7th column: right
8th column: bottom

The four coordinates (left, top, right, bottom) denote a face's bounding box.
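The column layout above can be consumed with the standard csv module. The snippet below uses a tiny inline stand-in for train.csv; note that whether the real files carry a header row is an assumption here, as is the sample data:

```python
import csv
import io

# Inline stand-in for train.csv, following the documented column order.
sample = io.StringIO(
    "video_id,frame_id,timestamp,label,left,top,right,bottom\n"
    "abc123,10,4000,Mask,100,50,180,140\n"
)

faces = []
for row in csv.DictReader(sample):
    # Reconstruct the source video URL from video_id, per the README.
    url = "https://www.youtube.com/watch?v=" + row["video_id"]
    # Bounding box width/height from the four coordinates.
    w = int(row["right"]) - int(row["left"])
    h = int(row["bottom"]) - int(row["top"])
    faces.append((url, row["label"], w, h))
print(faces)
```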
============================== COPYING ================================
This repository is made available under the Creative Commons Attribution License (CC BY).
For the full Attribution 4.0 International (CC BY 4.0) license, please see https://creativecommons.org/licenses/by/4.0/
THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
====================== ACKNOWLEDGEMENTS ================================
This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.
This dataset is based on videos from Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac
This metadata record describes a series of tabular datasets containing metrics used to characterize periods of anomalously low storage for select large reservoirs across the conterminous United States for the climate years (April 1 – March 31) 1981 to 2020. These data support the associated Simeone and others (2024) publication. The reservoirs in this dataset are those included in the ResOpsUS dataset with sufficient data during the period of interest. The metrics include reservoir storage percentiles, identified low-storage anomaly events, annual low storages, low-storage anomaly statistics for each event, and trends in reservoir metrics.

This data release contains the following files. One version of each of the first three files exists per low-storage anomaly method: the variable threshold method (weibull_jd), the fixed threshold method (weibull_site), and the operating-curve method (operating curve); substitute the corresponding method string for "weibull_jd" in the file names below.

1) percentiles_1981_2020.zip: percentile zip files for the period 1981-2020. This zip file contains the reservoir storage, inflow, and outflow percentiles as individual csv files for each reservoir (for example, res_ops_XXX.csv, where XXX is the ResOpsUS reservoir identification number), including percentiles from each method. The metadata contains column details for an example version of these files.

2) reservoir_1981_2020_weibull_jddrought_properties.csv: a csv file containing summaries of each low-storage anomaly event for each reservoir and threshold. One of these files exists for each of the three low-storage anomaly methods; the three files carry different labels corresponding to the three methods.

3) reservoir_1981_2020weibull_jdcomplete_annual_stats.csv: a csv file containing the annual low-storage anomaly statistics for each climate year, threshold, and reservoir. One of these files exists for each of the three low-storage anomaly methods; the three files carry different labels corresponding to the three methods.

4) reservoir_1981_2020weibull_jd*_trends.csv: a csv file containing data on trends in low-storage anomaly characteristics for selected reservoirs during the primary period of interest, 1981 to 2020.

5) reservoir_metadata.csv: a csv file containing metadata for each reservoir.
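A minimal sketch of unpacking the per-reservoir percentile files: the zip layout follows the res_ops_XXX.csv naming described above, but the column names used here are illustrative assumptions only (the real columns are documented in the release metadata), and the toy archive is built in memory for demonstration.

```python
import csv
import io
import zipfile

# Build a toy archive mimicking percentiles_1981_2020.zip, one CSV per
# reservoir (res_ops_XXX.csv, XXX = ResOpsUS id). Columns are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("res_ops_101.csv", "date,storage_percentile\n1981-04-01,12.5\n")
    z.writestr("res_ops_102.csv", "date,storage_percentile\n1981-04-01,63.0\n")

# Read every per-reservoir file into a dict keyed by reservoir id.
tables = {}
with zipfile.ZipFile(buf) as z:
    for name in z.namelist():
        res_id = name.removeprefix("res_ops_").removesuffix(".csv")
        with z.open(name) as f:
            tables[res_id] = list(csv.DictReader(io.TextIOWrapper(f)))
print(sorted(tables))
```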
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Global Groundwater Withdrawals Peak Over the 21st Century
The large ensemble dataset contains groundwater related model outputs from 900 scenarios modeled using Global Change Analysis Model (GCAM). The scenario ensemble members include five Shared Socioeconomic Pathways (SSPs), four Representative Concentration Pathways (RCPs), five global climate model outputs, three groundwater depletion limits, two surface water storage expansion regimes, and two historical groundwater depletion trends.
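As a consistency check on the ensemble size implied by the dimensions above: the full cross product of the listed factors exceeds 900, so not every SSP-RCP pairing can be modeled. The figure of 15 valid SSP-RCP pairs is an inference from the arithmetic, not a statement in this record:

```python
# Full cross product of the listed scenario dimensions:
# 5 SSPs x 4 RCPs x 5 GCMs x 3 depletion limits x 2 storage regimes x 2 trends
full = 5 * 4 * 5 * 3 * 2 * 2

# The stated 900 runs are consistent with 15 (of the 20 possible) SSP-RCP
# pairs being modeled -- an inference, not stated in the record.
consistent = 15 * 5 * 3 * 2 * 2

print(full, consistent)  # 1200 900
```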
Journal Article
Niazi, H., Wild, T.B., Turner, S.W.D., Graham, N.T., Hejazi, M., Msangi, S., Kim, S., Lamontagne, J.R., & Zhao, M. (2024). Global peak water limit of future groundwater withdrawals. Nature Sustainability, 7(4), 413–422. https://doi.org/10.1038/s41893-024-01306-w
Read full-text here: https://rdcu.be/dFpb5
Data Repository
This data repository is to be used in combination with the main meta-repository containing all scripts and files for reproducing the experiment as well as the analysis and post-processing of the model outputs.
Scripts and smaller files are provided in the GitHub meta-repository whereas larger files are provided in this data repository. Please complete the repository by placing the files as described hereunder. Please find the GitHub meta-repository here: https://github.com/JGCRI/niazi-etal_2024_nature-sustainability
Descriptions of files:
gcam-5.7z contains the GCAM version used to simulate 900 scenarios of plausible futures. The model folder contains all necessary input files to reproduce the simulations.
The model is to be used in combination with the meta-repository to set up batch runs on a cluster.
Please navigate to the model/ folder for other scenario-specific and model setup folders and files. gcam-5 is to be extracted in the same directory (./model/gcam-5/).
First-time users of GCAM should follow the guidance on the GCAM wiki to set up the model and for background knowledge.
crop_yeild.7z: This file contains inputs related to climate impacts on crop yields. This is to be downloaded and extracted in the model/combined_impacts/ folder.
outputs-all.7z: Key model outputs queried and collated from the 900 GCAM runs, explained hereunder. The files can be downloaded individually (.csv files) or all at once in .7z format (outputs-all.7z). These files are to be placed in the model/outputs folder of the meta-repository.
ag_prod_all_GW_scenarios.csv - Agricultural production across all scenarios for 2050 and 2100 (tonnes)
prices_water_withdrawal_all.csv - Water prices across all scenarios and years ($/km3)
global_irrigated_prod_by_crop.csv - All irrigated agricultural production for each crop across all scenarios and years (tonnes)
surface_water_production_all.csv - Runoff across all scenarios and years (km3)
groundwater_production_FINAL.csv - Groundwater withdrawals across all scenarios and years (km3)
water_withdrawals_desal_all.csv - Water withdrawals from desalination plants across all scenarios and years (km3)
Short introduction to the study
Using 900 GCAM runs, this study finds that global groundwater withdrawals are expected to peak around mid-century and then decline through the rest of the 21st century, exposing about half of the global population, living in one-third of basins, to groundwater stress; the cost and availability of surface water storage is the most significant driver of future groundwater withdrawals. This first robust, quantitative confirmation of a peak-and-decline pattern for groundwater (previously known only for fossil fuels and minerals) raises concerns for basins heavily dependent on groundwater.
Contact
Please reach out to Hassan Niazi at hassan.niazi@pnnl.gov for any questions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
DATA & FILE OVERVIEW
File List:
/BiomassData.csv: The yield, soil carbon sequestration, field area, and locations for fields identified as historically abandoned marginal land, with crop growth simulated using the SALUS biogeochemical crop model.
/ParametersWithSources.xlsx: Economic, environmental, and efficiency parameters used to parametrize the MILP model used in the study; source data for Supplementary Table 1.
/RefineryData.csv: The potential locations and CO2 transportation costs for biorefineries.
/DepotData.csv: The potential locations for preprocessing depots.
/figdata_Fig3a.csv: The results used to generate Figure 3.
/figdata_Fig3b_1.csv: The results used to generate Figure 3.
/figdata_Fig3b_2.csv: The results used to generate Figure 3.
/figdata_Fig4.csv: The results used to generate Figure 4.
/figdata_Fig5.csv: The technology matrix used to generate Figure 5.
/figdata_Fig6.csv: The results used to generate Figure 6.
/*_map.pdf: Map files for supplementary figures.
/*_geopdf.pdf: Map files with georeferenced information.
/SupplementaryFigureData.xlsx: Tabular data for supplementary plots.
/fields.gpkg: The shape file used to plot the map files.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
The dataset includes images featuring crowds of 0 to 5,000 people, captured across a diverse range of scenes and settings. Each image in the dataset is accompanied by a corresponding JSON file containing detailed labeling information for each person in the crowd, supporting crowd counting and classification.
Types of crowds in the dataset: 0-1000, 1000-2000, 2000-3000, 3000-4000 and 4000-5000
This dataset provides a valuable resource for researchers and developers working on crowd counting technology, enabling them to train and evaluate their algorithms with a wide range of crowd sizes and scenarios. It can also be used for benchmarking and comparison of different crowd counting algorithms, as well as for real-world applications such as public safety and security, urban planning, and retail analytics.
Leave a request on https://trainingdata.pro/datasets to learn about the price and buy the dataset
keywords: crowd counting, crowd density estimation, people counting, crowd analysis, image annotation, computer vision, deep learning, object detection, object counting, image classification, dense regression, crowd behavior analysis, crowd tracking, head detection, crowd segmentation, crowd motion analysis, image processing, machine learning, artificial intelligence, ai, human detection, crowd sensing, image dataset, public safety, crowd management, urban planning, event planning, traffic management
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Apples are typically stored under low temperature and controlled atmosphere conditions to ensure a year-round supply of high-quality fruit for the consumer. During storage, losses in quality and quantity occur due to spoilage by postharvest pathogens. One important postharvest pathogen of apple is Botrytis cinerea, a broad-host necrotroph with a large arsenal of infection strategies, able to infect over 1,400 different plant species. We studied the apple-B. cinerea interaction to get a better understanding of the defense response in apple. We conducted an RNAseq experiment in which the transcriptome of inoculated and non-inoculated (control and mock) apples was analyzed at 0, 1, 12, and 28 h post inoculation. Our results show extensive reprogramming of the apple's transcriptome, with about 28.9% of expressed genes exhibiting significant differential regulation in the inoculated samples. We demonstrate the transcriptional activation of pathogen-triggered immunity and a reprogramming of the fruit's metabolism, including a clear transcriptional activation of secondary metabolism and a correlation between the early transcriptional activation of the mevalonate pathway and reduced susceptibility, expressed as a reduction in resulting lesion diameters. This pathway produces the building blocks for terpenoids, a large class of compounds with diverging functions including defense. 1-MCP and hot water dip treatments were used to further evidence the key role of terpenoids in the defense and to demonstrate that ethylene modulates this response.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Labelling strategies in mass spectrometry (MS)-based proteomics enhance sample throughput by enabling the acquisition of multiplexed samples within a single run. However, contemporary experiments often involve increasingly complex designs, where the number of samples exceeds the capacity of a single run, resulting in a complex correlation structure that must be addressed for accurate statistical inference and reliable biomarker discovery. To this end, we introduce msqrob2TMT, a suite of mixed model-based workflows specifically designed for differential abundance analysis in labelled MS-based proteomics data. msqrob2TMT accommodates both sample-specific and feature-specific (e.g., peptide or protein) covariates, facilitating inference in experiments with arbitrarily complex designs and allowing for explicit correction of feature-specific covariates. We benchmark our innovative workflows against state-of-the-art tools, including DEqMS, MSstatsTMT, and msTrawler, using two spike-in studies. Our findings demonstrate that msqrob2TMT offers greater flexibility, improved modularity, and enhanced performance, particularly through the application of robust ridge regression. Finally, we demonstrate the practical relevance of msqrob2TMT in a real mouse study, highlighting its capacity to effectively account for the complex correlation structure in the data.
Vandenbulcke S, Vanderaa C, Crook O, Martens L, Clement L. Msqrob2TMT: Robust linear mixed models for inferring differential abundant proteins in labeled experiments with arbitrarily complex design. Mol Cell Proteomics. 2025;24(7):101002.
Also available as a preprint
Vandenbulcke, S., Vanderaa, C., Crook, O., Martens, L. & Clement, L. msqrob2TMT: robust linear mixed models for inferring differential abundant proteins in labelled experiments with arbitrarily complex design. bioRxiv 2024.03.29.587218 (2024) doi:10.1101/2024.03.29.587218
This repository provides the data required to reproduce the results shown in the msqrob2TMT study. Data are organised in two main parts: input data and processed data.
The input data consist of data generated by others that we used for our analyses. Files are organised using three prefixes, one for each data set.
This data set has been published by Huang et al. 2020 and has been downloaded from the MassIVE repository (RMSV000000265). It contains 2 files:

spikein1_psms.txt: a table with identified and quantified peptide-to-spectrum matches (FTP link: ftp://massive.ucsd.edu/x01/RMSV000000265/2020-06-08_huang704_4336d436/quant/161117_SILAC_HeLa_UPS1_TMT10_5Mixtures_3TechRep_UPSdB_Multiconsensus_PD22_Intensity_03_with_FDR_control_PSMs.txt)
spikein1_annotations.csv: the associated sample annotations (FTP link: ftp://massive.ucsd.edu/v02/MSV000084264/metadata/SpikeIn5mix_PD_annotation.csv)

This data set has been published by O'Brien et al. 2024 and has been downloaded from a private Google Cloud Storage. It contains 3 files:

spikein2_psms.csv: a table with identified and quantified peptide-to-spectrum matches (link)
spikein2_annotations.csv: a table with the associated sample annotations (link)
spilein2_covariateFile.csv: a file required to run the msTrawler method (link)

The data for the mouse study has been published by Plubell et al. 2017 and has been downloaded from the MassIVE RMSV000000264.7 reanalysis repository:

mouse_psms.txt: a table with identified and quantified peptide-to-spectrum matches (FTP link: ftp://massive.ucsd.edu/x01/RMSV000000264/2020-06-07_huang704_518429df/181017_Plubell_mouse_sh_lo_LF_HF_diet_adipocytes_3TMT10_HpH_Fusion_PD22_multi_01_PSMs.txt)
mouse_annotations.csv: the associated sample annotations (FTP link: ftp://massive.ucsd.edu/x01/RMSV000000264/2020-06-07_huang704_518429df/metadata/mouse3mix_PD_annotation.csv)

We generated the processed data during our analyses; they are provided in the processed.zip file. Each file is prefixed with the name of the data set it relates to. Here is a comprehensive list:
mouse_model_MsstatsTMT.rds: a data.frame containing the MSstatsTMT statistical inference results for the mouse dataset.
mouse_model_msqrob2tmt.rds: a data.frame containing the msqrob2TMT statistical inference results for the mouse dataset, where proteins were summarised within fraction.
mouse_model_msqrob2tmt_mixture.rds: a data.frame containing the msqrob2TMT statistical inference results for the mouse dataset, where proteins were summarised within mixture.
spikein1_input_deqms.rds: a data.frame containing the spikein1 data after PSM filtering, ready for analysis by DEqMS.
spikein1_input_msTrawler.txt: a tabular text file containing the spikein1 data after PSM filtering, ready for analysis by msTrawler.
spikein1_input_msqrob2tmt.rds: a QFeatures object containing the spikein1 data after PSM filtering, ready for analysis by msqrob2.
spikein1_input_msstatstmt.rds: a data.frame containing the spikein1 data after PSM filtering, ready for analysis by MSstatsTMT.
spikein1_model_DEqMS.rds: a data.frame containing the DEqMS statistical inference results for the spikein1 dataset.
spikein1_model_MsstatsTMT.rds: a data.frame containing the MSstatsTMT statistical inference results for the spikein1 dataset.
spikein1_model_compare_preprocessing.rds: a data.frame containing MSstatsTMT and msqrob2TMT statistical inference results for the spikein1 dataset under different processing workflows carried out by MSstatsTMT.
spikein1_model_msTrawler.rds: a data.frame containing the msTrawler statistical inference results for the spikein1 dataset.
spikein1_model_msqrob2tmt.rds: a data.frame containing the msqrob2TMT statistical inference results for the spikein1 dataset.
spikein2_input.rds: a data.frame containing the spikein2 data after running the custom preprocessing pipeline by O'Brien et al. 2024.
spikein2_input_preprocessed.rds: a data.frame containing the spikein2 data after running the custom preprocessing workflow by O'Brien et al. 2024 and the preprocessing workflow by msTrawler.
spikein2_model_DEqMS.rds: a data.frame containing the DEqMS statistical inference results for the spikein2 dataset.
spikein2_model_msqrob2tmt.rds: a data.frame containing the msqrob2TMT statistical inference results for the spikein2 dataset.
spikein2_model_MSstatsTMT.rds: a data.frame containing the MSstatsTMT statistical inference results for the spikein2 dataset.
spikein2_model_msTrawler.rds: a data.frame containing the msTrawler statistical inference results for the spikein2 dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Documented March 19, 2023
!!!NEW!!!
GeoDAR reservoirs were registered to the drainage network! Please see the auxiliary data "GeoDAR-TopoCat" at https://zenodo.org/records/7750736. "GeoDAR-TopoCat" contains the drainage topology (reaches and upstream/downstream relationships) and catchment boundary for each reservoir in GeoDAR, based on the algorithm used for Lake-TopoCat (doi:10.5194/essd-15-3483-2023).
Documented April 1, 2022
Citation
Wang, J., Walter, B. A., Yao, F., Song, C., Ding, M., Maroof, A. S., Zhu, J., Fan, C., McAlister, J. M., Sikder, M. S., Sheng, Y., Allen, G. H., Crétaux, J.-F., and Wada, Y.: GeoDAR: georeferenced global dams and reservoirs database for bridging attributes and geolocations. Earth System Science Data, 14, 1869–1899, 2022, https://doi.org/10.5194/essd-14-1869-2022.
Please cite the reference above (which was fully peer-reviewed), NOT the preprint version. Thank you.
Contact
Dr. Jida Wang, jidawang@ksu.edu, gdbruins@ucla.edu
Data description and components
Data folder “GeoDAR_v10_v11” (.zip) contains two consecutive, peer-reviewed versions (v1.0 and v1.1) of the Georeferenced global Dams And Reservoirs (GeoDAR) dataset:
As by-products of GeoDAR harmonization, folder “GeoDAR_v10_v11” also contains:
Attribute description
Attribute | Description and values

v1.0 dams (file name: GeoDAR_v10_dams; format: comma-separated values (csv) and point shapefile)

id_v10 | Dam ID for GeoDAR version 1.0 (type: integer). Note this is not the same as the International Code in ICOLD WRD but is linked to the International Code via encryption.
lat | Latitude of the dam point in decimal degree (type: float) based on datum World Geodetic System (WGS) 1984.
lon | Longitude of the dam point in decimal degree (type: float) on WGS 1984.
geo_mtd | Georeferencing method (type: text). Unique values include “geo-matching CanVec”, “geo-matching LRD”, “geo-matching MARS”, “geo-matching NID”, “geo-matching ODC”, “geo-matching ODM”, “geo-matching RSB”, “geocoding (Google Maps)”, and “Wada et al. (2017)”. Refer to Table 2 in Wang et al. (2022) for abbreviations.
qa_rank | Quality assurance (QA) ranking (type: text). Unique values include “M1”, “M2”, “M3”, “C1”, “C2”, “C3”, “C4”, and “C5”. The QA ranking provides a general measure for our georeferencing quality. Refer to Supplementary Tables S1 and S3 in Wang et al. (2022) for more explanation.
rv_mcm | Reservoir storage capacity in million cubic meters (type: float). Values are only available for large dams in Wada et al. (2017). Capacity values of other WRD records are not released due to ICOLD’s proprietary restriction. Also see Table S4 in Wang et al. (2022).
val_scn | Validation result (type: text). Unique values include “correct”, “register”, “mismatch”, “misplacement”, and “Google Maps”. Refer to Table 4 in Wang et al. (2022) for explanation.
val_src | Primary validation source (type: text). Values include “CanVec”, “Google Maps”, “JDF”, “LRD”, “MARS”, “NID”, “NPCGIS”, “NRLD”, “ODC”, “ODM”, “RSB”, and “Wada et al. (2017)”. Refer to Table 2 in Wang et al. (2022) for abbreviations.
qc | Roles and name initials of co-authors/participants during data quality control (QC) and validation. Name initials are given to each assigned dam or region and are listed generally in chronological order for each role. Collation and harmonization of large dams in Wada et al. (2017) (see Table S4 in Wang et al. (2022)) were performed by JW, and this information is not repeated in the qc attribute for a reduced file size. Although we tried to track the name initials thoroughly, the lists may not be always exhaustive, and other undocumented adjustments and corrections were most likely performed by JW.

v1.1 dams (file name: GeoDAR_v11_dams; format: comma-separated values (csv) and point shapefile)

id_v11 | Dam ID for GeoDAR version 1.1 (type: integer). Note this is not the same as the International Code in ICOLD WRD but is linked to the International Code via encryption.
id_v10 | v1.0 ID of this dam/reservoir (as in id_v10) if it is also included in v1.0 (type: integer).
id_grd_v13 | GRanD ID of this dam if also included in GRanD v1.3 (type: integer).
lat | Latitude of the dam point in decimal degree (type: float) on WGS 1984. Value may be different from that in v1.0.
lon | Longitude of the dam point in decimal degree (type: float) on WGS 1984. Value may be different from that in v1.0.
geo_mtd | Same as the value of geo_mtd in v1.0 if this dam is included in v1.0.
qa_rank | Same as the value of qa_rank in v1.0 if this dam is included in v1.0.
val_scn | Same as the value of val_scn in v1.0 if this dam is included in v1.0.
val_src | Same as the value of val_src in v1.0 if this dam is included in v1.0.
rv_mcm_v10 | Same as the value of rv_mcm in v1.0 if this dam is included in v1.0.
rv_mcm_v11 | Reservoir storage capacity in million cubic meters (type: float). Due to ICOLD’s proprietary restriction, provided values are limited to dams in Wada et al. (2017) and GRanD v1.3. If a dam is in both Wada et al. (2017) and GRanD v1.3, the value from the latter (if valid) takes precedence.
har_src | Source(s) to harmonize the dam points. Unique values include “GeoDAR v1.0 alone”, “GRanD v1.3 and GeoDAR 1.0”, “GRanD v1.3 and other ICOLD”, and “GRanD v1.3 alone”. Refer to Table 1 in Wang et al. (2022) for more details.
pnt_src | Source(s) of the dam point spatial coordinates. Unique values include “GeoDAR v1.0”, “original
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Corresponding peer-reviewed publication
This dataset corresponds to all input and output files that were used in the study reported in:
Wade, J., David, C.H., Collins, E.L., Denbina, M., Cerbelaud, A., Tom, M., Reager, J.T., Frasson, R.P.M., Famiglietti, J.S., Lee, T., Gierach, M.M. (In Review), Intrinsic spatial scales of river stores and fluxes and their relative contributions to the global water cycle.
When making use of any of the files in this dataset, please cite both the aforementioned article and the dataset herein.
Summary
The Earth’s rivers vary in size across several orders of magnitude. Yet, the relative significance of small upstream reaches compared to large downstream rivers in the global water cycle remains unclear, challenging the determination of adequate spatial resolution for observations. Using monthly simulations of river stores and fluxes from the MeanDRS river routing dataset, we sample global rivers by a range of estimated river width thresholds to investigate the intrinsic spatial scales of the global river water cycle. We frame these scale-dependent river dynamics in terms of observational capabilities, assessing how the size of rivers that can be resolved influences our ability to capture key global hydrologic stores and fluxes.
We aim to answer two questions:
What is the intrinsic spatial resolution of global river dynamics?
How can the spatial scale of river processes be used to inform efficient monitoring and modeling strategies of global river stores and fluxes?
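The width-threshold sampling behind these questions can be sketched numerically: given per-reach widths and discharges to the ocean, compute the fraction of total discharge captured when only reaches wider than an observable threshold are resolved. This is a minimal illustration with made-up numbers, not MeanDRS code:

```python
def fraction_captured(widths_m, q_km3yr, threshold_m):
    """Fraction of total discharge to the ocean carried by reaches
    at least as wide as the observable width threshold."""
    total = sum(q_km3yr)
    resolved = sum(q for w, q in zip(widths_m, q_km3yr) if w >= threshold_m)
    return resolved / total

# Three hypothetical coastal reaches: most discharge exits via wide rivers.
widths = [50, 150, 400]   # estimated widths (m)
q_out = [1.0, 2.0, 7.0]   # discharge to the ocean (km3/yr)
share = fraction_captured(widths, q_out, threshold_m=100)  # 0.9
```

Sweeping `threshold_m` over a range of widths and plotting `share` yields the kind of scale-dependence curve the study aggregates per region.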
Data sources
The following sources were used to produce files in this dataset:
Mean Discharge Runoff and Storage (MeanDRS) dataset (version v0.4) available under a CC BY-NC-SA 4.0 license. https://zenodo.org/records/10013744. DOI: 10.5281/zenodo.10013744; 10.1038/s41561-024-01421-5
MERIT-Basins (version 1.0) derived from MERIT-Hydro (version 0.7) available under a CC BY-NC-SA 4.0 license. https://www.reachhydro.org/home/params/merit-basins
Software
The software that was used to produce files in this dataset is available at https://github.com/jswade/meandrs-width-sampling.
Data Products
The following files represent the primary outputs of the analysis. Each file class generally has 61 files, corresponding to the 61 global hydrologic regions (region ii).
Riv_coast.zip contains shapefiles of corrected and uncorrected MeanDRS river reaches that intersect with the global coast and are inferred to drain to the ocean.
· riv_coast.zip
o cor: riv_coast_pfaf_ii_COR.shp
o uncor: riv_coast_pfaf_ii_UNCOR.shp
Qout_rivwidth.zip contains csv files of the aggregate river discharge to the ocean (km3/yr) under each tested river width sampling scenario for each of the 61 global hydrologic regions.
· Qout_rivwidth.zip: Qout_pfaf_ii_rivwidth.csv
V_rivwidth_low.zip contains csv files of the aggregate river storage (km3) for the low residence time scenario under each tested river width sampling scenario for each of the 61 global hydrologic regions.
· V_rivwidth_low.zip: V_pfaf_ii_rivwidth_low.csv
V_rivwidth_nrm.zip contains csv files of the aggregate river storage (km3) for the normal (medium) residence time scenario under each tested river width sampling scenario for each of the 61 global hydrologic regions.
· V_rivwidth_nrm.zip: V_pfaf_ii_rivwidth_nrm.csv
V_rivwidth_hig.zip contains csv files of the aggregate river storage (km3) for the high residence time scenario under each tested river width sampling scenario for each of the 61 global hydrologic regions.
· V_rivwidth_hig.zip: V_pfaf_ii_rivwidth_hig.csv
Largest_rivs.zip contains files related to our analysis of the relative contributions of discharge to the ocean from the 10 largest global river basins.
· largest_rivs.zip
o cat: cat_dis_top10_nxx.shp – dissolved catchments of reaches draining from the 10 largest basins
o csv: Q_df_top10.csv – total discharge contributed by each basin
o riv: riv_top10_nxx.shp – river reaches that drain the 10 largest basins
Smallest_rivs.zip contains files related to our analysis of the relative contributions of discharge to the ocean from global rivers narrower than 100 m.
· smallest_rivs.zip
o cat: cat_pfaf_pfaf_ii_small_100m.shp – dissolved catchments of narrow reaches draining to the ocean for each region ii
o csv: Q_df_top10.csv – total discharge to the ocean from each narrow river reach
o riv: riv_pfaf_ii_small_100m.shp – river reaches narrower than 100 m that drain to the ocean for each region ii
Global_summary.zip contains files related to the global aggregation of our region-specific river width sampling estimates for discharge to the ocean and river storage.
· global_summary.zip
o Qout_rivwidth: global summary files for discharge to the ocean (km3/yr) under river width sampling
o V_rivwidth_low: global summary files for total river storage (km3) for the low residence time scenario under river width sampling
o V_rivwidth_nrm: global summary files for total river storage (km3) for the normal (medium) residence time scenario under river width sampling
o V_rivwidth_hig: global summary files for total river storage (km3) for the high residence time scenario under river width sampling
o cat_small_gl: cat_dis_global_small_100m.shp – global dissolved catchments contributing to all rivers narrower than 100 m that drain to the ocean
Rivwidth_sens.zip contains files related to our supplemental analysis of the sensitivity of our width estimation approach to choice of input discharge dataset. Here, we compute estimated river widths using 3 versions of MeanDRS discharge outputs (VIC, CLSM, NOAH) and compare the results of river width sampling from those runs to that of the primary analysis. The file formats and explanations follow those presented above, with added information for the land surface model used to generate those discharge simulations.
· Rivwidth_sens.zip
o riv_coast
o Qout_rivwidth_VIC
o Qout_rivwidth_CLSM
o Qout_rivwidth_NOAH
o V_rivwidth_low_VIC
o V_rivwidth_nrm_VIC
o V_rivwidth_hig_VIC
o V_rivwidth_low_CLSM
o V_rivwidth_nrm_CLSM
o V_rivwidth_hig_CLSM
o V_rivwidth_low_NOAH
o V_rivwidth_nrm_NOAH
o V_rivwidth_hig_NOAH
o global_summary_VIC
o global_summary_CLSM
o global_summary_NOAH
Cor_sens.zip contains files related to our supplemental analysis of the sensitivity to using corrected ensemble MeanDRS discharge and volume simulations as opposed to uncorrected ensemble simulations. Here, we repeat our primary analysis using only uncorrected simulations throughout, rather than performing river width sampling using corrected simulations. The file formats and explanations follow those presented above, with the files using uncorrected ensemble (ENS) discharge and storage values in contrast to the primary analysis.
· Cor_sens.zip
o Qout_rivwidth_ENS
o V_rivwidth_low_ENS
o V_rivwidth_nrm_ENS
o V_rivwidth_hig_ENS
o global_summary_ENS
Width_val.zip contains files related to our supplemental validation of river widths estimated from MeanDRS discharge simulations through comparison with optical measurements of widths from the Global River Widths from Landsat (GRWL) Database (Allen & Pavelsky, 2018).
· Width_val.zip: width_validation_pfaf_ii.csv
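Because each file class is split into 61 per-region files (region ii), a global summary is essentially a sum over regions at each width threshold. The sketch below mimics that aggregation with two tiny in-memory "regional CSVs"; the column names are assumptions for illustration, not the dataset's actual headers:

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature of two regional Qout_pfaf_ii_rivwidth.csv files:
# one row per sampled width threshold, aggregate discharge for that region.
pfaf_11 = "width_threshold_m,Qout_km3yr\n0,100\n100,80\n"
pfaf_12 = "width_threshold_m,Qout_km3yr\n0,50\n100,30\n"

# Sum regional discharge at each threshold to form a global summary.
global_qout = defaultdict(float)
for text in (pfaf_11, pfaf_12):
    for row in csv.DictReader(io.StringIO(text)):
        global_qout[int(row["width_threshold_m"])] += float(row["Qout_km3yr"])
# global_qout holds {0: 150.0, 100: 110.0}
```

The same pattern applies to the V_rivwidth_* storage files, swapping the discharge column for a storage column.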
Known bugs in this dataset or the associated manuscript
No bugs have been identified at this time.
References
Allen, G. H., & Pavelsky, T. M. (2018). Global extent of rivers and streams. Science, 361(6402), 585-588. https://doi.org/10.1126/science.aat0636
Collins, E. L., David, C. H., Riggs, R., Allen, G. H., Pavelsky, T. M., Lin, P., Pan, M., Yamazaki, D., Meentemeyer, R. K., & Sanchez, G. M. (2024). Global patterns in river water storage dependent on residence time. Nature Geoscience, 1–7. https://doi.org/10.1038/s41561-024-01421-5
Lin, P., Pan, M., Beck, H. E., Yang, Y., Yamazaki, D., Frasson, R., David, C. H., Durand, M., Pavelsky, T. M., Allen, G. H., Gleason, C. J., & Wood, E. F. (2019). Global Reconstruction of Naturalized River Flows at 2.94 Million Reaches. Water Resources Research, 55(8), 6499–6516. https://doi.org/10.1029/2019WR025287
Yang, Y., Pan, M., Lin, P., Beck, H. E., Zeng, Z., Yamazaki, D., David, C. H., Lu, H., Yang, K., Hong, Y., & Wood, E. F. (2021). Global Reach-Level 3-Hourly River Flood Reanalysis (1980–2019). Bulletin of the American Meteorological Society, 102(11), E2086–E2105. https://doi.org/10.1175/BAMS-D-20-0057.1
The datapreview extension for CKAN enhances data accessibility by providing a proxy to retrieve and format data from local storage or remote URLs for previewing in applications like Recline. It addresses performance and file size limitations found in similar solutions, offering a streamlined way to preview CSV and XLS files within the CKAN environment by leveraging the ckanext-archiver extension. This extension provides a local implementation of data proxy functionality, aiming to improve the efficiency of data previewing, especially for larger datasets.
Key Features:
Data Proxy Functionality: Serves as a proxy for retrieving data from local or remote sources, formatting it into a JSON dictionary suitable for data preview tools.
CSV/XLS Parsing: Parses CSV and XLS files to extract data for preview, enabling users to quickly inspect data content without downloading the entire file.
File Size Limit Configuration: Allows administrators to configure a maximum file size limit for remote downloads and in-memory processing, preventing server overload when handling large datasets.
Local Archive Cache Utilization: Integrates with ckanext-archiver to prioritize retrieving data from the local archive cache, reducing reliance on remote sources and improving retrieval speed if files have already been archived.
Technical Integration: The datapreview extension integrates with CKAN by adding a new controller that handles data proxy requests. It relies on the resource ID rather than a URL, which differs from the original dataproxy implementation. The extension also depends on ckanext-archiver for accessing cached resources and messytables for handling CSV and Excel file parsing. To enable the extension, it must be added to the ckan.plugins property in the CKAN configuration file.
Benefits & Impact: The datapreview extension improves the performance and scalability of data previewing within CKAN. By using a local archive cache and allowing configuration of file size limits, it addresses the limitations of the original dataproxy implementation. It also enables the previewing of larger files than might otherwise be possible. On data.gov.uk, the extension helps users quickly view data before deciding to download it, which enhances the overall user experience.
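Enabling the extension follows the standard CKAN plugin mechanism via ckan.plugins. A minimal configuration sketch is shown below; the plugin name and the size-limit option name are assumptions for illustration, not taken from this description (check the extension's own README for the real names):

```ini
[app:main]
; Enable the preview proxy alongside the archiver extension it depends on
; (plugin names here are assumed).
ckan.plugins = datapreview archiver
; Hypothetical option: maximum file size (bytes) for remote download
; and in-memory parsing.
ckan.datapreview.max_file_size = 5242880
```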
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The TBX11K dataset is a large dataset containing 11000 chest x-ray images. It's the only TB dataset that I know of that includes TB bounding boxes. This allows both classification and detection models to be trained.
However, it can be mentally tiring to get started with this dataset. It includes many xml, json and txt files that you need to sift through to try to understand what everything means, how it all fits together and how to extract the bounding box coordinates.
Here I've simplified the dataset. Now there's just one csv file, one folder containing the training images and one folder containing the test images.
Paper: Rethinking Computer-aided Tuberculosis Diagnosis
Original TBX11K dataset on Kaggle
1- Please start by reading the paper. It will help you understand what everything means.
2- The original dataset was split into train and validation sets. This split is shown in the 'source' column in the data.csv file.
3- The test images are stored in the folder called "test". There are no labels for these images and I've not included them in data.csv.
4- Each bounding box is on a separate row. Therefore, the file names in the "fname" column are not unique. For example, if an image has two bounding boxes then the file name for that image will appear twice in the "fname" column.
5- The original dataset has a folder named "extra" that contains data from other TB datasets. I've not included that folder here.
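Since each bounding box sits on its own row, collecting all boxes for one image is a simple group-by on "fname". A stdlib sketch with made-up rows; the bounding-box column names are assumptions, not the dataset's actual header:

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows mimicking data.csv: one bounding box per row, so an
# image with two boxes appears twice in the "fname" column.
rows = io.StringIO(
    "fname,xmin,ymin,xmax,ymax\n"
    "tb0001.png,10,20,110,220\n"
    "tb0001.png,30,40,90,160\n"
    "tb0002.png,5,5,50,50\n"
)

# Group boxes by image so each file name maps to its list of boxes.
boxes_by_image = defaultdict(list)
for r in csv.DictReader(rows):
    boxes_by_image[r["fname"]].append(
        tuple(int(r[k]) for k in ("xmin", "ymin", "xmax", "ymax")))
# tb0001.png maps to two boxes; tb0002.png to one
```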
Many thanks to the team that created the TBX11K dataset and generously made it publicly available.
# TBX11K dataset
@inproceedings{liu2020rethinking,
title={Rethinking computer-aided tuberculosis diagnosis},
author={Liu, Yun and Wu, Yu-Huan and Ban, Yunfeng and Wang, Huifang and Cheng, Ming-Ming},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2646--2655},
year={2020}
}
The U.S. Geological Survey (USGS) Water Resources Mission Area (WMA) is working to address the need to understand where the Nation is experiencing water shortages or surpluses relative to the demand for water by delivering routine assessments of water supply and demand and an understanding of the natural and human factors affecting the balance between supply and demand. A key part of these national assessments is identifying long-term trends in water availability, including groundwater and surface water quantity, quality, and use. This data release contains Mann-Kendall monotonic trend analyses for 18 observed annual and monthly streamflow metrics at 6,347 U.S. Geological Survey streamgages located in the conterminous United States, Alaska, Hawaii, and Puerto Rico. Streamflow metrics include annual mean flow, maximum 1-day and 7-day flows, minimum 7-day and 30-day flows, and the date of the center of volume (the date on which 50% of the annual flow has passed by a gage), along with the mean flow for each month of the year. Annual streamflow metrics are computed from mean daily discharge records at U.S. Geological Survey streamgages that are publicly available from the National Water Information System (NWIS). Trend analyses are computed using annual streamflow metrics computed through climate year 2022 (April 2022 - March 2023) for low-flow metrics and water year 2022 (October 2021 - September 2022) for all other metrics. Trends at each site are available for up to four different periods: (i) the longest possible period that meets completeness criteria at each site, (ii) 1980-2020, (iii) 1990-2020, (iv) 2000-2020. Annual metric time series analyzed for trends must have 80 percent complete records during fixed periods. In addition, each of these time series must have 80 percent complete records during their first and last decades. 
All longest possible period time series must be at least 10 years long and have annual metric values for at least 80% of the years running from 2013 to 2022. This data release provides the following five CSV output files along with a model archive: (1) streamflow_trend_results.csv - contains test results of all trend analyses with each row representing one unique combination of (i) NWIS streamgage identifiers, (ii) metric (computed using Oct 1 - Sep 30 water years except for low-flow metrics computed using climate years (Apr 1 - Mar 31)), (iii) trend periods of interest (longest possible period through 2022, 1980-2020, 1990-2020, 2000-2020) and (iv) records containing either the full trend period or only a portion of the trend period following substantial increases in cumulative upstream reservoir storage capacity. This is an output from the final process step (#5) of the workflow. (2) streamflow_trend_trajectories_with_confidence_bands.csv - contains annual trend trajectories estimated using Theil-Sen regression, which estimates the median of the probability distribution of a metric for a given year, along with 90 percent confidence intervals (5th and 95th percentile values). This is an output from the final process step (#5) of the workflow. (3) streamflow_trend_screening_all_steps.csv - contains the screening results of all 7,873 streamgages initially considered as candidate sites for trend analysis and identifies the screens that prevented some sites from being included in the Mann-Kendall trend analysis. (4) all_site_year_metrics.csv - contains annual time series values of streamflow metrics computed from mean daily discharge data at 7,873 candidate sites. This is an output of Process Step 1 in the workflow. (5) all_site_year_filters.csv - contains information about the completeness and quality of daily mean discharge at each streamgage during each year (water year, climate year, and calendar year). 
This is also an output of Process Step 1 in the workflow and is combined with all_site_year_metrics.csv in Process Step 2. In addition, a .zip file contains a model archive for reproducing the trend results using R 4.4.1 statistical software. See the README file contained in the model archive for more information. Caution must be exercised when utilizing monotonic trend analyses conducted over periods of up to several decades (and in some places longer ones) due to the potential for confounding deterministic gradual trends with multi-decadal climatic fluctuations. In addition, trend results are available for post-reservoir construction periods within the four trend periods described above to avoid including abrupt changes arising from the construction of larger reservoirs in periods for which gradual monotonic trends are computed. Other abrupt changes, such as changes to water withdrawals and wastewater return flows, or episodic disturbances with multi-year recovery periods, such as wildfires, are not evaluated. Sites with pronounced abrupt changes or other non-monotonic trajectories of change may require more sophisticated trend analyses than those presented in this data release.
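The two estimators named above can be sketched compactly. This is a minimal illustration on a toy annual series, not the USGS model archive: the real workflow (R 4.4.1) additionally handles ties, serial correlation, and significance testing:

```python
from itertools import combinations

def mann_kendall_s(series):
    """Mann-Kendall S statistic: number of increasing pairs minus number
    of decreasing pairs; S > 0 suggests an upward monotonic trend."""
    return sum((b > a) - (b < a) for a, b in combinations(series, 2))

def theil_sen_slope(years, values):
    """Theil-Sen estimator: the median of all pairwise slopes."""
    slopes = sorted((v2 - v1) / (y2 - y1)
                    for (y1, v1), (y2, v2) in combinations(zip(years, values), 2))
    n = len(slopes)
    return slopes[n // 2] if n % 2 else (slopes[n // 2 - 1] + slopes[n // 2]) / 2

# Toy annual-mean-flow series rising steadily by 2 units per year.
years = [2000, 2001, 2002, 2003, 2004]
flows = [10, 12, 14, 16, 18]
s = mann_kendall_s(flows)              # 10: all 10 pairs are increasing
slope = theil_sen_slope(years, flows)  # 2.0
```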
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seagrass ecosystems provide an array of ecosystem services ranging from habitat provision to erosion control. From a climate change and eutrophication mitigation perspective, the ecosystem services include burial and storage of carbon and nutrients in the sediments. Eelgrass (Zostera marina) is the most abundant seagrass species along the Danish coasts, and while its function as a carbon and nutrient sink has been documented in some areas, the spatial variability of these functions, and the drivers behind them, are not well understood. Here we present the first nationwide study on eelgrass sediment stock of carbon (Cstock), nitrogen (Nstock), and phosphorus (Pstock). Stocks were measured in the top 10 cm of eelgrass meadows spanning semi-enclosed estuaries (inner and outer fjords) to open coasts. Further, we assessed environmental factors (level of exposure, sediment properties, level of eutrophication) from each area to evaluate their relative importance as drivers of the spatial pattern in the respective stocks. We found large spatial variability in sediment stocks, representing 155–4413 g C m-2, 24–448 g N m-2, and 7–34 g P m-2. Cstock and Nstock were significantly higher in inner fjords compared to outer fjords and open coasts. Cstock, Nstock, and Pstock showed a significantly positive relationship with the silt-clay content in the sediments. Moreover, Cstock was also significantly higher in more eutrophied areas with high concentrations of nutrients and chlorophyll a (chl a) in the water column. Conversely, silt-clay content was not related to nutrients or chl a, suggesting a spatial dependence of the importance of these factors in driving stock sizes and implying that local differences in sediment properties and eutrophication level should be included when evaluating the storage capacity of carbon, nitrogen, and phosphorus in Danish eelgrass meadows. 
These insights provide guidance to managers in selecting priority areas for carbon and nutrient storage for climate- and eutrophication mitigation initiatives.
Board-based software tools for managing collaborative work (e.g. Trello or Microsoft Planner) are highly configurable information systems. Their structure is based on boards that contain cards organized in lists. This structure allows users to organize a wide variety of formal or informal information and work processes in a very flexible way. However, this flexibility means that in every situation the user is required to make decisions to design a new board from scratch, which is not a straightforward task, especially if performed by non-technical users.
We have carried out a study following an inductive approach consisting of analyzing 91 Trello board designs from board templates proposed by Trello users (see trello-scrapping.csv), which cover a wide variety of domains and use cases. From this analysis we characterize the following 8 patterns that are commonly used in board designs and are applicable to all board-based tools:
Information or resources lifecycle
Ordered Information
Kanban
Process Tasks
Assigned Information
Categorized Information
Assigned Tasks
Categorized Tasks
About the analysis performed
For the sake of the verifiability of the analysis performed, the sources used for the analysis and its details are also available at this repository:
trello-scrapping.csv: This csv file contains the actual scrape of www.trello.com/templates on the date the paper was submitted (4th December 2020). This scrape returns the whole list of templates that can be used in the cited workstream collaborative tool. The 91 templates analyzed in the paper were obtained with a similar scrape about a year earlier (approximately February 2019). In this file you can examine the full set of 230 templates created in Trello (including the previous 91), with their names, links and descriptions among other elements. We have added two new columns (not obtained from the scrape) in which we have classified the templates into their corresponding pattern, as we did in "trello templates clasification.xlsx" with only the first 91 templates when writing the paper. In these two columns we distinguish between the 91 templates considered in the paper (previously classified and isolated in "trello templates clasification.xlsx") and the new templates obtained in the second scrape after submitting the paper.
trello templates clasification.xlsx: This file contains the classification of the 91 analyzed Trello templates, divided into three sheets. The first sheet shows the raw classification, with one row for each template and an "X" in the column of the pattern(s) in which it is classified. The other sheets contain summary tables showing the distribution of templates across our proposed patterns (how many templates there are for each pattern) and the multiple pattern combinations (when a template contains more than one pattern).
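The summary tables described above amount to two counts over the raw "X"-matrix: templates per pattern, and templates combining more than one pattern. A stdlib sketch with made-up template names and marks (not the actual classification data):

```python
# Hypothetical miniature of the raw-classification sheet: one row per
# template, an "X" in each pattern column it belongs to.
patterns = ["Kanban", "Ordered Information", "Assigned Tasks"]
raw = [
    {"template": "Sprint board", "Kanban": "X", "Ordered Information": "", "Assigned Tasks": "X"},
    {"template": "Editorial calendar", "Kanban": "", "Ordered Information": "X", "Assigned Tasks": ""},
    {"template": "CRM pipeline", "Kanban": "X", "Ordered Information": "", "Assigned Tasks": ""},
]

# Templates per pattern (a template with several marks counts in each).
per_pattern = {p: sum(row[p] == "X" for row in raw) for p in patterns}
# Templates that combine more than one pattern.
multi = sum(sum(row[p] == "X" for p in patterns) > 1 for row in raw)
# per_pattern is {"Kanban": 2, "Ordered Information": 1, "Assigned Tasks": 1}; multi is 1
```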
Various population statistics, including structured demographics data.
This dataset contains detailed information on a wide variety of vegetables from different regions across the world. Each entry includes data on the vegetable's category, color, seasonality, origin, nutritional value, pricing, availability, shelf life, storage requirements, growing conditions, health benefits, and common varieties. The dataset is structured to facilitate research and data analysis, offering insights into agricultural trends, nutritional science, and market dynamics. Ideal for use in academic research, market analysis, and agricultural studies.
Various economic indicators.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One of the cognitive abilities most affected by substance abuse is decision-making. Behavioral tasks such as the Iowa Gambling Task (IGT) provide a means to measure the learning process involved in decision-making. To comprehend this process, three hypotheses have emerged: (1) participants prioritize gains over losses, (2) they exhibit insensitivity to losses, and (3) the capacity of operational storage or working memory comes into play. A dynamic model was developed to examine these hypotheses, simulating sensitivity to gains and losses. The Linear Operator model served as the learning rule, wherein net gains depend on the ratio of gains to losses, weighted by the sensitivity to both. The study further proposes a comparison between the performance of simulated agents and that of substance abusers (n = 20) and control adults (n = 20). The findings indicate that as the memory factor increases, along with high sensitivity to losses and low sensitivity to gains, agents prefer advantageous alternatives, particularly those with a lower frequency of punishments. Conversely, when sensitivity to gains increases and the memory factor decreases, agents prefer disadvantageous alternatives, especially those that result in larger losses. Human participants confirmed the agents’ performance, particularly when contrasting optimal and sub-optimal outcomes. In conclusion, we emphasize the importance of evaluating the parameters of the linear operator model across diverse clinical and community samples.
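The linear-operator learning rule described above can be sketched in a few lines. The memory-weighted update is the standard linear-operator form; the net-gain function is one plausible parameterization of "the ratio of gains to losses, weighted by the sensitivity to both" (the exact functional form used in the paper is an assumption here):

```python
def linear_operator_update(value, outcome, memory):
    """Linear-operator rule: the updated value estimate is a
    memory-weighted average of the old estimate and the new outcome."""
    return memory * value + (1.0 - memory) * outcome

def net_gain(gain, loss, w_gain, w_loss):
    """Hypothetical net outcome in [0, 1]: the sensitivity-weighted
    gain as a share of total weighted gains plus losses."""
    weighted_g = w_gain * gain
    weighted_l = w_loss * loss
    return weighted_g / (weighted_g + weighted_l)

# High memory plus high loss sensitivity damps the impact of a large win,
# mirroring the agents that came to prefer advantageous alternatives.
v = linear_operator_update(0.5, net_gain(100, 50, w_gain=0.2, w_loss=0.8), 0.9)
```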
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing; it is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. The Ag Data Commons needs to anticipate the size and nature of the data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey on which an internal report is based. While the report was for internal use, the survey and the resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to ask all members of their unit to respond individually, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47 respondents who indicated they had more than 10 to 100 TB, or over 100 TB, of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used the actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data were considered inactive, or archival. To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals covered by a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file that includes the raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
This is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes the raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
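The per-person storage calculation described above (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched as follows. This is an illustrative sketch only: the function name and the example range values are assumptions for demonstration, not figures taken from the survey data.

```python
# Sketch of the per-person storage calculation described above:
# take the high end of the reported storage range (in TB) and divide
# by the number of individuals covered by the response (G = 1 for an
# individual response). Example values below are illustrative.

def per_person_tb(range_high_tb: float, group_size: int = 1) -> float:
    """High end of the reported range divided by the group size G."""
    if group_size < 1:
        raise ValueError("group size must be at least 1")
    return range_high_tb / group_size

# A group response covering 4 scientists with a range topping out at
# 100 TB is treated as 100 / 4 = 25 TB per person; an individual
# response uses the high end of the range directly.
print(per_person_tb(100, 4))  # 25.0
print(per_person_tb(10))      # 10.0
```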