95 datasets found

Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47 respondents who indicated they had more than 10 TB and up to 100 TB, or over 100 TB, of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
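The per-person storage calculation described in this entry (high end of the reported range divided by 1 for an individual response, or by G for a group response) can be sketched as below. This is only an illustration: the function name, the parameter names, and the use of terabytes as the unit are assumptions, not part of the published dataset.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """Return the high end of a reported storage range divided by the
    number of people the response covers (1 for an individual response,
    G for a group response)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical example values, not taken from the survey data.
individual = per_person_storage_tb(10.0)            # individual response
group = per_person_storage_tb(100.0, group_size=4)  # group response, G = 4
print(individual, group)
```

Using the high end of each range gives an upper-bound style estimate, which is presumably why Big Data users were followed up for actual values.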
Large and Long-Range Graph Dataset
ieee-dataport.org
Updated Sep 18, 2025
Cite
shuo wang (2025). Large and Long-Range Graph Dataset [Dataset]. https://ieee-dataport.org/documents/large-and-long-range-graph-dataset
Explore at:
Dataset updated
Sep 18, 2025
Authors
shuo wang
Description
PCQM-Contact (CC BY 4.0)
Data from: Ab Initio Potential Energy Surface for NaCl–H2 with Correct...
acs.figshare.com
datasetcatalog.nlm.nih.gov
zip
Updated Jan 25, 2024
Cite
Priyanka Pandey; Chen Qu; Apurba Nandi; Qi Yu; Paul L. Houston; Riccardo Conte; Joel M. Bowman (2024). Ab Initio Potential Energy Surface for NaCl–H2 with Correct Long-Range Behavior [Dataset]. http://doi.org/10.1021/acs.jpca.3c07687.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.jpca.3c07687.s001
Dataset updated
Jan 25, 2024
Dataset provided by
ACS Publications
Authors
Priyanka Pandey; Chen Qu; Apurba Nandi; Qi Yu; Paul L. Houston; Riccardo Conte; Joel M. Bowman
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
We report a full-dimensional ab initio potential energy surface for NaCl–H2 based on precise fitting of a large data set of CCSD(T)/aug-cc-pVTZ energies. A major goal of this fit is to describe the very long-range interaction accurately. This is done in this instance via the dipole–quadrupole interaction. The NaCl dipole and the H2 quadrupole are available through previous works over a large range of internuclear distances. We use these to obtain exact effective charges on each atom. Diffusion Monte Carlo calculations are done for the ground vibrational state using the new potential.
Large Dataset of Generalization Patterns in the Number Game
dataverse.harvard.edu
search.dataone.org
Updated Aug 10, 2018
Cite
Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/A8ZWLF
Dataset updated
Aug 10, 2018
Dataset provided by
Harvard Dataverse
Authors
Eric J. Bigelow; Steven T. Piantadosi
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
272,700 two-alternative forced choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of numbers of length 1 to 4 from the range 1 to 100 (e.g. {12, 16}), and asked what other numbers were likely to belong to that set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. “even numbers,” “powers of two”) and distance-based (e.g. numbers near 50) generalization. This data set is available for further analysis of these simple and intuitive inferences, development of hands-on modeling instruction, and attempts to understand how probability and rules interact in human cognition.
Long-term climatic data for cities in Asia
kaggle.com
zip
Updated Mar 18, 2024
Cite
Rahdan. M. ArioB (2024). Long-term climatic data for cities in Asia [Dataset]. https://www.kaggle.com/datasets/mohammadrahdanmofrad/long-term-climatic-data-for-cities-in-asia
Explore at:
Available download formats: zip (38203945 bytes)
Dataset updated
Mar 18, 2024
Authors
Rahdan. M. ArioB
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
Asia
Description
This dataset provides long-term climate data for large Asian cities with populations over 500,000. It includes data on cloud cover, temperature range, number of frost days, potential evapotranspiration, precipitation, minimum temperature, mean temperature, maximum temperature, relative humidity, and number of wet days. The dataset covers 831 cities.

Columns:

ID

Date

Latitude

Longitude

cld: Cloud cover (%)

dtr: Temperature range (°C)

frs: Number of frost days

pet: Potential evapotranspiration (mm)

pre: Precipitation (mm)

tmn: Minimum temperature (°C)

tmp: Mean temperature (°C)

tmx: Maximum temperature (°C)

vap: Relative humidity (%)

wet: Number of wet days
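
As a reading aid, the short variable codes listed above can be kept in a small lookup table. The sketch below simply restates the column list from this entry; the names `CLIMATE_COLUMNS` and `describe` are hypothetical helpers, and the actual file layout of the download is not confirmed here.

```python
# Variable codes and units exactly as listed in the dataset description.
CLIMATE_COLUMNS = {
    "cld": ("Cloud cover", "%"),
    "dtr": ("Temperature range", "°C"),
    "frs": ("Number of frost days", None),
    "pet": ("Potential evapotranspiration", "mm"),
    "pre": ("Precipitation", "mm"),
    "tmn": ("Minimum temperature", "°C"),
    "tmp": ("Mean temperature", "°C"),
    "tmx": ("Maximum temperature", "°C"),
    "vap": ("Relative humidity", "%"),
    "wet": ("Number of wet days", None),
}


def describe(code: str) -> str:
    """Return a human-readable label for a variable code, with unit if any."""
    name, unit = CLIMATE_COLUMNS[code]
    return f"{name} ({unit})" if unit else name


print(describe("pre"))
print(describe("frs"))
```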

Inspiration:
Are you interested in predicting the future weather conditions in your city or one of the 831 cities in our climate dataset? Our climate dataset contains data on various climate metrics, including temperature, precipitation, cloud cover, wind speed, and humidity. This data can be used to train a machine learning model that can predict future weather conditions with high accuracy. Imagine using a machine learning model to predict the weather in your city for the next week, month, or year. This information could be used to make decisions about planning, adaptation, and risk mitigation.

Please note:
This dataset contains satellite-derived climate data from the website https://crudata.uea.ac.uk. Satellite data are measured using sensors that may be subject to error. Therefore, it is possible that these data may differ from ground-based observations, which are typically used to generate real-world data. This difference is generally greater in remote areas and regions with high cloud.
Big Free-Tailed Bat Range - CWHR M041 [ds1836]
catalog.data.gov
data.cnra.ca.gov
Updated Jul 24, 2025
Cite
California Department of Fish and Wildlife (2025). Big Free-Tailed Bat Range - CWHR M041 [ds1836] [Dataset]. https://catalog.data.gov/dataset/big-free-tailed-bat-range-cwhr-m041-ds1836-b6ec5
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
California Department of Fish and Wildlife
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.
Large-Blotched Ensatina Range - CWHR A012B [ds2847]
catalog.data.gov
data.cnra.ca.gov
Updated Jul 24, 2025
Cite
California Department of Fish and Wildlife (2025). Large-Blotched Ensatina Range - CWHR A012B [ds2847] [Dataset]. https://catalog.data.gov/dataset/large-blotched-ensatina-range-cwhr-a012b-ds2847-ed46b
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
California Department of Fish and Wildlife
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.
🏃🏻‍♂️ Long-distance running dataset
kaggle.com
zip
Updated Mar 7, 2024
Cite
mexwell (2024). 🏃🏻‍♂️ Long-distance running dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/long-distance-running-dataset
Explore at:
Available download formats: zip (393989255 bytes)
Dataset updated
Mar 7, 2024
Authors
mexwell
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
About

This dataset contains 10,703,690 records of running training during 2019 and 2020, from 36,412 athletes from around the world. The records were obtained through web scraping of a large social network for athletes on the internet.

The data with the athletes' activities are contained in dataframe objects (tabular data) and saved in the Parquet file format using the Pandas library, part of the Python ecosystem for data science. Each Pandas dataframe contains the following data (as different columns) for each athlete (as different rows). The first word identifies the name of the column in the dataframe:

- datetime: date of the running activity;
- athlete: a computer-generated ID for the athlete (integer);
- distance: distance of running (floating-point number, in kilometers);
- duration: duration of running (floating-point number, in minutes);
- gender: gender (string 'M' or 'F');
- age_group: age interval (one of the strings '18 - 34', '35 - 54', or '55 +');
- country: country of origin of the athlete (string);
- major: marathon(s) and year(s) the athlete ran (comma-separated list of strings).

For convenience, we created files with the athletes' activities data sampled at different frequencies: day 'd', week 'w', month 'm', and quarter 'q' (i.e., there are files with the distance and duration of running accumulated at each day, week, month, and quarter) for each year, 2019 and 2020. Accordingly, the files are named 'run_ww_yyyy_f.parquet', where 'yyyy' is '2019' or '2020' and 'f' is 'd', 'w', 'm' or 'q' (without quotes). The dataset also contains the stringency indexes of different governments for the COVID-19 pandemic. These data are saved as text files and were obtained from https://ourworldindata.org/covid-stringency-index.
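
The file-naming pattern described above can be expanded into the full list of expected file names. This is a minimal sketch that assumes the 'ww' part of 'run_ww_yyyy_f.parquet' is literal, since only 'yyyy' and 'f' are described as placeholders; `run_filename` and `all_run_filenames` are hypothetical helper names.

```python
# Sampling frequencies and years as stated in the dataset description.
FREQUENCIES = {"d": "day", "w": "week", "m": "month", "q": "quarter"}
YEARS = (2019, 2020)


def run_filename(year: int, freq: str) -> str:
    """Build a file name following the 'run_ww_yyyy_f.parquet' pattern.

    Assumption: 'ww' is a literal part of the name.
    """
    if year not in YEARS:
        raise ValueError(f"year must be one of {YEARS}")
    if freq not in FREQUENCIES:
        raise ValueError(f"freq must be one of {sorted(FREQUENCIES)}")
    return f"run_ww_{year}_{freq}.parquet"


def all_run_filenames() -> list[str]:
    """List one file name per (year, frequency) combination."""
    return [run_filename(y, f) for y in YEARS for f in FREQUENCIES]


for name in all_run_filenames():
    print(name)
```

Each resulting name could then be passed to a Parquet reader such as `pandas.read_parquet`, provided the downloaded archive actually uses these names.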

Acknowledgement

Photo by sporlab on Unsplash
Big Brown Bat Range - CWHR M032 [ds1828]
catalog.data.gov
data.ca.gov
Updated Jul 24, 2025
Cite
California Department of Fish and Wildlife (2025). Big Brown Bat Range - CWHR M032 [ds1828] [Dataset]. https://catalog.data.gov/dataset/big-brown-bat-range-cwhr-m032-ds1828-09a43
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
California Department of Fish and Wildlife
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.
Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning...
data.niaid.nih.gov
Updated Aug 10, 2022
Cite
Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6624080
Explore at:
Dataset updated
Aug 10, 2022
Dataset provided by
University of Cincinnati
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset:

N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109

Abstract

The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description

The dataset comprises a total of 52,984 Tweet IDs (that correspond to the same number of Tweets) about online learning that were posted on Twitter from 9th November 2021 to 13th July 2022. The earliest date was selected as 9th November 2021, as the Omicron variant was detected for the first time in a sample that was collected on this date. 13th July 2022 was the most recent date as per the time of data collection and publication of this dataset.

The dataset consists of 9 .txt files. An overview of these dataset files along with the number of Tweet IDs and the date range of the associated tweets is as follows. Table 1 shows the list of all the synonyms or terms that were used for the dataset development.

Filename: TweetIDs_November_2021.txt (No. of Tweet IDs: 1283, Date Range of the associated Tweet IDs: November 1, 2021 to November 30, 2021)

Filename: TweetIDs_December_2021.txt (No. of Tweet IDs: 10545, Date Range of the associated Tweet IDs: December 1, 2021 to December 31, 2021)

Filename: TweetIDs_January_2022.txt (No. of Tweet IDs: 23078, Date Range of the associated Tweet IDs: January 1, 2022 to January 31, 2022)

Filename: TweetIDs_February_2022.txt (No. of Tweet IDs: 4751, Date Range of the associated Tweet IDs: February 1, 2022 to February 28, 2022)

Filename: TweetIDs_March_2022.txt (No. of Tweet IDs: 3434, Date Range of the associated Tweet IDs: March 1, 2022 to March 31, 2022)

Filename: TweetIDs_April_2022.txt (No. of Tweet IDs: 3355, Date Range of the associated Tweet IDs: April 1, 2022 to April 30, 2022)

Filename: TweetIDs_May_2022.txt (No. of Tweet IDs: 3120, Date Range of the associated Tweet IDs: May 1, 2022 to May 31, 2022)

Filename: TweetIDs_June_2022.txt (No. of Tweet IDs: 2361, Date Range of the associated Tweet IDs: June 1, 2022 to June 30, 2022)

Filename: TweetIDs_July_2022.txt (No. of Tweet IDs: 1057, Date Range of the associated Tweet IDs: July 1, 2022 to July 13, 2022)
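
The per-file counts listed above can be checked against the stated total of 52,984 Tweet IDs. The sketch below only restates the numbers from this description; the dictionary name and the `share_of_total` helper are hypothetical.

```python
# Tweet ID counts per file, copied from the list in the description.
TWEET_ID_COUNTS = {
    "TweetIDs_November_2021.txt": 1283,
    "TweetIDs_December_2021.txt": 10545,
    "TweetIDs_January_2022.txt": 23078,
    "TweetIDs_February_2022.txt": 4751,
    "TweetIDs_March_2022.txt": 3434,
    "TweetIDs_April_2022.txt": 3355,
    "TweetIDs_May_2022.txt": 3120,
    "TweetIDs_June_2022.txt": 2361,
    "TweetIDs_July_2022.txt": 1057,
}

total = sum(TWEET_ID_COUNTS.values())


def share_of_total(filename: str) -> float:
    """Fraction of all Tweet IDs contained in the given file."""
    return TWEET_ID_COUNTS[filename] / total


print(total)  # 52984, matching the total stated in the description
print(max(TWEET_ID_COUNTS, key=TWEET_ID_COUNTS.get))
```

The nine counts sum to the stated total, and January 2022 holds the largest share of the Tweet IDs.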

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. To hydrate this dataset, the Hydrator application may be used.

Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development

Terminology: List of synonyms and terms

COVID-19: Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus

online learning: online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures
Data from: Chemical Descriptors for a Large-Scale Study on Drop-Weight...
acs.figshare.com
xls
Updated Jun 3, 2023
Cite
Frank W. Marrs; Jack V. Davis; Alexandra C. Burch; Geoffrey W. Brown; Nicholas Lease; Patricia L. Huestis; Marc J. Cawkwell; Virginia W. Manner (2023). Chemical Descriptors for a Large-Scale Study on Drop-Weight Impact Sensitivity of High Explosives [Dataset]. http://doi.org/10.1021/acs.jcim.2c01154.s002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1021/acs.jcim.2c01154.s002
Dataset updated
Jun 3, 2023
Dataset provided by
ACS Publications
Authors
Frank W. Marrs; Jack V. Davis; Alexandra C. Burch; Geoffrey W. Brown; Nicholas Lease; Patricia L. Huestis; Marc J. Cawkwell; Virginia W. Manner
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The drop-weight impact test is an experiment that has been used for nearly 80 years to evaluate handling sensitivity of high explosives. Although the results of this test are known to have large statistical uncertainties, it is one of the most common tests due to its accessibility and modest material requirements. In this paper, we compile a large data set of drop-weight impact sensitivity test results (mainly performed at Los Alamos National Laboratory), along with a compendium of molecular and chemical descriptors for the explosives under test. These data consist of over 500 unique explosives, over 1000 repeat tests, and over 100 descriptors, for a total of about 1500 observations. We use random forest methods to estimate a model of explosive handling sensitivity as a function of chemical and molecular properties of the explosives under test. Our model predicts well across a wide range of explosive types, spanning a broad range of explosive performance and sensitivity. We find that properties related to explosive performance, such as heat of explosion, oxygen balance, and functional group, are highly predictive of explosive handling sensitivity. Yet, models that omit many of these properties still perform well. Our results suggest that there is not one or even several factors that explain explosive handling sensitivity, but that there are many complex, interrelated effects at play.
California Giant Salamander Range - CWHR A004 [ds1133]
catalog.data.gov
data.cnra.ca.gov
Updated Jul 24, 2025
Cite
California Department of Fish and Wildlife (2025). California Giant Salamander Range - CWHR A004 [ds1133] [Dataset]. https://catalog.data.gov/dataset/california-giant-salamander-range-cwhr-a004-ds1133-ed51b
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.
ECMWF Reanalysis v5
ecmwf.int
application/x-grib
Updated Dec 31, 1969
Cite
European Centre for Medium-Range Weather Forecasts (1969). ECMWF Reanalysis v5 [Dataset]. https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
Explore at:
Available download formats: application/x-grib (1 dataset)
Dataset updated
Dec 31, 1969
Dataset authored and provided by
European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
land and oceanic climate variables. The data cover the Earth on a 31km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.
Fused Image dataset for convolutional neural Network-based crack Detection...
data.niaid.nih.gov
zenodo.org
Updated Apr 20, 2023
Cite
Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6383043
Explore at:
Dataset updated
Apr 20, 2023
Authors
Shanglian Zhou; Carlos Canchila; Wei Song
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
House Price Prediction Dataset
kaggle.com
zip
Updated Sep 21, 2024
Cite
Zafar (2024). House Price Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/zafarali27/house-price-prediction-dataset
Explore at:
Available download formats: zip (29372 bytes)
Dataset updated
Sep 21, 2024
Authors
Zafar
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
House Price Prediction Dataset.

The dataset contains 2000 rows of house-related data, representing various features that could influence house prices. Below, we discuss key aspects of the dataset, which include its structure, the choice of features, and potential use cases for analysis.

1. Dataset Features

The dataset is designed to capture essential attributes for predicting house prices, including:

Area: Square footage of the house, which is generally one of the most important predictors of price.

Bedrooms & Bathrooms: The number of rooms in a house significantly affects its value. Homes with more rooms tend to be priced higher.

Floors: The number of floors in a house could indicate a larger, more luxurious home, potentially raising its price.

Year Built: The age of the house can affect its condition and value. Newly built houses are generally more expensive than older ones.

Location: Houses in desirable locations such as downtown or urban areas tend to be priced higher than those in suburban or rural areas.

Condition: The current condition of the house is critical, as well-maintained houses (in 'Excellent' or 'Good' condition) will attract higher prices compared to houses in 'Fair' or 'Poor' condition.

Garage: Availability of a garage can increase the price due to added convenience and space.

Price: The target variable, representing the sale price of the house, used to train machine learning models to predict house prices based on the other features.

2. Feature Distributions

Area Distribution: The area of the houses in the dataset ranges from 500 to 5000 square feet, which allows analysis across different types of homes, from smaller apartments to larger luxury houses.

Bedrooms and Bathrooms: The number of bedrooms varies from 1 to 5, and bathrooms from 1 to 4. This variance enables analysis of homes with different sizes and layouts.

Floors: Houses in the dataset have between 1 and 3 floors. This feature could be useful for identifying the influence of multi-level homes on house prices.

Year Built: The dataset contains houses built from 1900 to 2023, giving a wide range of house ages to analyze the effects of new vs. older construction.

Location: There is a mix of urban, suburban, downtown, and rural locations. Urban and downtown homes may command higher prices due to proximity to amenities.

Condition: Houses are labeled as 'Excellent', 'Good', 'Fair', or 'Poor'. This feature helps model the price differences based on the current state of the house.

Price Distribution: Prices range between $50,000 and $1,000,000, offering a broad spectrum of property values. This range makes the dataset appropriate for predicting a wide variety of housing prices, from affordable homes to luxury properties.

3. Correlation Between Features

A key area of interest is the relationship between various features and house price:

Area and Price: Typically, a strong positive correlation is expected between the size of the house (Area) and its price. Larger homes are likely to be more expensive.

Location and Price: Location is another major factor. Houses in urban or downtown areas may show a higher price on average compared to suburban and rural locations.

Condition and Price: The condition of the house should show a positive correlation with price. Houses in better condition should be priced higher, as they require less maintenance and repair.

Year Built and Price: Newer houses might command a higher price due to better construction standards, modern amenities, and less wear-and-tear, but some older homes in good condition may retain historical value.

Garage and Price: A house with a garage may be more expensive than one without, as it provides extra storage or parking space.
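
The expected relationship between Area and Price could be checked with a Pearson correlation coefficient. The following is a minimal sketch, not the dataset author's method. It assumes the rows have been loaded as dictionaries keyed by the `Area` and `Price` feature names from the description; `pearson_r` is an illustrative helper, and the sample rows are hypothetical values invented for the example, not values from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows for illustration only; these are NOT values from the dataset.
sample_rows = [
    {"Area": 800, "Price": 120_000},
    {"Area": 1500, "Price": 210_000},
    {"Area": 2400, "Price": 330_000},
    {"Area": 3600, "Price": 480_000},
]

areas = [row["Area"] for row in sample_rows]
prices = [row["Price"] for row in sample_rows]
r_area_price = pearson_r(areas, prices)
print(f"Pearson r (Area vs. Price): {r_area_price:.3f}")
```

A value near +1 would be consistent with the strong positive correlation the description expects; on the real data the value may differ. Categorical features such as Location or Condition would need to be encoded numerically before a correlation like this could be computed.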

4. Potential Use Cases

The dataset is well-suited for various machine learning and data analysis applications, including:

House Price Prediction: Using regression techniques, this dataset can be used to build a model to predict house prices based on the available features.

Feature Importance Analysis: By using techniques such as feature importance ranking, data scientists can determine which features (e.g., location, area, or condition) have the greatest impact on house prices.

Clustering: Clustering techniques like k-means could help identify patterns in the data, such as grouping houses into segments based on their characteristics (e.g., luxury homes, affordable homes).

Market Segmentation: The dataset can be used to perform segmentation by location, price range, or house type to analyze trends in specific sub-markets, like luxury vs. affordable housing.

Time-Based Analysis: By studying how house prices vary with the year built or the age of the house, analysts can derive insights into the trends of older vs. newer homes.

5. Limitations and ...
Twitter Conversations about the COVID-19 Omicron Variant: A Large Scale...
zenodo.org
dataverse.harvard.edu
txt
Updated Jul 25, 2022
Cite
Nirmalya Thakur (2022). Twitter Conversations about the COVID-19 Omicron Variant: A Large Scale Dataset of more than 500,000 Tweets [Dataset]. http://doi.org/10.5281/zenodo.6893676
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.5281/zenodo.6893676
Dataset updated
Jul 25, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset:

N. Thakur and C.Y. Han, “An Exploratory Study of Tweets about the SARS-CoV-2 Omicron Variant: Insights from Sentiment Analysis, Language Interpretation, Source Tracking, Type Classification, and Embedded URL Detection,” Journal of COVID, 2022, Volume 5, Issue 3, pp. 1026-1049

Abstract

This open-access dataset is one of the salient contributions of the above-mentioned paper. It presents a total of 522,886 Tweet IDs of the same number of Tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description

The Tweet IDs are presented in 7 different .txt files based on the timelines of the associated tweets. The data collection followed a keyword-based approach: tweets containing the "omicron" keyword were filtered, collected, and added to this dataset. The dataset files are described below.

Filename: TweetIDs_November.txt (No. of Tweet IDs: 16471, Date Range of the Tweet IDs: November 24, 2021 to November 30, 2021)

Filename: TweetIDs_December.txt (No. of Tweet IDs: 99288, Date Range of the Tweet IDs: December 1, 2021 to December 31, 2021)

Filename: TweetIDs_January.txt (No. of Tweet IDs: 92860, Date Range of the Tweet IDs: January 1, 2022 to January 31, 2022)

Filename: TweetIDs_February.txt (No. of Tweet IDs: 89080, Date Range of the Tweet IDs: February 1, 2022 to February 28, 2022)

Filename: TweetIDs_March.txt (No. of Tweet IDs: 97844, Date Range of the Tweet IDs: March 1, 2022 to March 31, 2022)

Filename: TweetIDs_April.txt (No. of Tweet IDs: 91587, Date Range of the Tweet IDs: April 1, 2022 to April 20, 2022)

Filename: TweetIDs_May.txt (No. of Tweet IDs: 35756, Date Range of the Tweet IDs: May 1, 2022 to May 12, 2022)

In the above table, the last date for May is May 12 as it was the most recent date at the time of data collection and dataset upload. The dataset would be updated soon to incorporate more recent tweets.
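
The per-file counts listed above can be checked against the stated total of 522,886 Tweet IDs. The following is a minimal sketch: the counts are copied from the file list, while `count_ids` is an illustrative helper that assumes one Tweet ID per line, which the description does not state explicitly.

```python
# Per-file Tweet ID counts, copied from the file list above.
listed_counts = {
    "TweetIDs_November.txt": 16471,
    "TweetIDs_December.txt": 99288,
    "TweetIDs_January.txt": 92860,
    "TweetIDs_February.txt": 89080,
    "TweetIDs_March.txt": 97844,
    "TweetIDs_April.txt": 91587,
    "TweetIDs_May.txt": 35756,
}

# Total stated in the abstract.
STATED_TOTAL = 522_886


def count_ids(lines):
    """Count non-blank lines, assuming one Tweet ID per line (an assumption)."""
    return sum(1 for line in lines if line.strip())


computed_total = sum(listed_counts.values())
print(f"sum of listed counts: {computed_total}")
print(f"matches stated total: {computed_total == STATED_TOTAL}")
```

After downloading a file, passing an open file handle to `count_ids` gives a count that could be compared with the listed figure for that file.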

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.
Comprehensive Median Household Income and Distribution Dataset for Grass...
neilsberg.com
Updated Jan 11, 2024
Cite
Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Grass Range, MT: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd9e83ad-b041-11ee-aaca-3860777c1fe6/
Explore at:
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Montana, Grass Range
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the median household income in Grass Range. It can be utilized to understand the trend in median household income and to analyze the income distribution in Grass Range by household type, size, and across various income brackets.

Content

This dataset includes the following datasets, when applicable:

Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Grass Range, MT Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)

Median Household Income Variation by Family Size in Grass Range, MT: Comparative analysis across 7 household sizes

Income Distribution by Quintile: Mean Household Income in Grass Range, MT

Grass Range, MT households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Grass Range median household income.
Goodness-of-fit filtering in classical metric multidimensional scaling with...
tandf.figshare.com
pdf
Updated Jun 1, 2023
Cite
Jan Graffelman (2023). Goodness-of-fit filtering in classical metric multidimensional scaling with large datasets [Dataset]. http://doi.org/10.6084/m9.figshare.11389830.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.11389830.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
Authors
Jan Graffelman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Metric multidimensional scaling (MDS) is a widely used multivariate method with applications in almost all scientific disciplines. Eigenvalues obtained in the analysis are usually reported in order to calculate the overall goodness-of-fit of the distance matrix. In this paper, we refine MDS goodness-of-fit calculations, proposing additional point and pairwise goodness-of-fit statistics that can be used to filter poorly represented observations in MDS maps. The proposed statistics are especially relevant for large data sets that contain outliers, with typically many poorly fitted observations, and are helpful for improving MDS output and emphasizing the most important features of the dataset. Several goodness-of-fit statistics are considered, and both Euclidean and non-Euclidean distance matrices are considered. Some examples with data from demographic, genetic and geographic studies are shown.
Twitter Dataset on the 2022 MonkeyPox Outbreak
kaggle.com
zip
Updated Nov 16, 2022
Cite
Nirmalya Thakur, PhD (2022). Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. https://www.kaggle.com/datasets/thakurnirmalya/monkeypox2022tweets
Explore at:
Available download formats: zip (4397490 bytes)
Dataset updated
Nov 16, 2022
Authors
Nirmalya Thakur, PhD
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087

Abstract

The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 571,831 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description

The dataset consists of a total of 571,831 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 11th November 2022 (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The following represents the details of these dataset files.

Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the associated Tweet IDs: May 7, 2022, to May 21, 2022)

Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the associated Tweet IDs: May 21, 2022, to May 27, 2022)

Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the associated Tweet IDs: May 27, 2022, to June 5, 2022)

Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the associated Tweet IDs: June 5, 2022, to June 11, 2022)

Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 46718, Date Range of the associated Tweet IDs: June 12, 2022, to June 30, 2022)

Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the associated Tweet IDs: July 1, 2022, to July 23, 2022)

Filename: TweetIDs_Part7.txt (No. of Tweet IDs: 105890, Date Range of the associated Tweet IDs: July 24, 2022, to July 31, 2022)

Filename: TweetIDs_Part8.txt (No. of Tweet IDs: 93959, Date Range of the associated Tweet IDs: August 1, 2022, to August 9, 2022)

Filename: TweetIDs_Part9.txt (No. of Tweet IDs: 50832, Date Range of the associated Tweet IDs: August 10, 2022, to August 24, 2022)

Filename: TweetIDs_Part10.txt (No. of Tweet IDs: 39042, Date Range of the associated Tweet IDs: August 25, 2022, to September 19, 2022)

Filename: TweetIDs_Part11.txt (No. of Tweet IDs: 12341, Date Range of the associated Tweet IDs: September 20, 2022, to October 9, 2022)

Filename: TweetIDs_Part12.txt (No. of Tweet IDs: 15404, Date Range of the associated Tweet IDs: October 10, 2022, to November 11, 2022)

Please note: The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.
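
Because the files contain only Tweet IDs, a hydration workflow usually starts by reading the IDs and grouping them into batches. The following is a minimal sketch of that preparation step only. It does not call the Hydrator application or any Twitter API. It assumes one Tweet ID per line (not stated in the description), and the batch size of 100 is a hypothetical default chosen for illustration; `read_tweet_ids` and `batched` are illustrative helper names.

```python
from itertools import islice


def read_tweet_ids(lines):
    """Yield Tweet IDs from an iterable of text lines, skipping blank lines.

    Assumes one Tweet ID per line (an assumption about the file layout).
    """
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batched(ids, batch_size=100):
    """Yield lists of at most batch_size IDs (batch_size is a hypothetical default)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical IDs for illustration only; these are NOT IDs from the dataset.
example_lines = ["1001\n", "\n", "1002\n", "1003\n"]
example_batches = list(batched(read_tweet_ids(example_lines), batch_size=2))
print(example_batches)
```

With a downloaded file, an open file handle for `TweetIDs_Part1.txt` could be passed to `read_tweet_ids` in place of `example_lines`. Keeping the IDs as strings avoids any change to the identifiers when they are handed to a hydration tool.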
Dataset for Grass Range, MT Census Bureau Income Distribution by Gender
neilsberg.com
Updated Jan 9, 2024
Cite
Neilsberg Research (2024). Dataset for Grass Range, MT Census Bureau Income Distribution by Gender [Dataset]. https://www.neilsberg.com/research/datasets/b3b45159-abcb-11ee-8b96-3860777c1fe6/
Explore at:
Dataset updated
Jan 9, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Montana, Grass Range
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates Grass Range household income by gender. It can be used to understand the gender-based income distribution in Grass Range.

Content

This dataset includes the following datasets, when applicable:

Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Grass Range, MT annual median income by work experience and sex dataset: Aged 15+, 2010-2022 (in 2022 inflation-adjusted dollars)

Grass Range, MT annual income distribution by work experience and gender dataset (Number of individuals ages 15+ with income, 2021)

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Grass Range income distribution by gender.

Cite

Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Explore at:

Dataset updated

Apr 21, 2025

Dataset provided by

Agricultural Research Service (https://www.ars.usda.gov/)

Description

The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.

Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
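
The per-person calculation described above (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched as follows. This is an illustration under stated assumptions, not the working group's actual analysis code: `per_person_storage_tb` is a hypothetical helper name, the unit is assumed to be TB, and the example inputs are invented rather than taken from the survey data.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """Per-person storage need: high end of the reported range divided by G.

    group_size is G, the number of individuals covered by a group response;
    it is 1 for an individual response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical responses for illustration only; NOT values from the survey data.
individual = per_person_storage_tb(range_high_tb=1.0)
group = per_person_storage_tb(range_high_tb=10.0, group_size=5)
print(f"individual response: {individual} TB per person")
print(f"group response (G=5): {group} TB per person")
```

Using the high end of each range gives an upper estimate per person. For Big Data users, the description states that actual reported or estimated values were used instead of the range's high end.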
