95 datasets found
  1. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    • Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
    • Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
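The per-person calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the range labels and their upper bounds in `RANGE_HIGH_END_TB` are hypothetical stand-ins (the real answer options are in Appendix A), and dividing a Big Data user's follow-up value by the group size is an assumption rather than something the description states.

```python
from typing import Optional

# Hypothetical storage ranges and their high ends in TB. The survey's actual
# answer options are listed in Appendix A and may differ from these.
RANGE_HIGH_END_TB = {
    "under 1 TB": 1.0,
    "1 to 10 TB": 10.0,
    "more than 10 to 100 TB": 100.0,
}


def per_person_storage_tb(
    range_label: str,
    group_size: int = 1,
    reported_tb: Optional[float] = None,
) -> float:
    """Estimate storage per person for one survey response.

    group_size is 1 for an individual response, or G for a group response.
    reported_tb is the follow-up value (actual or estimated) available for
    Big Data users; when given, it replaces the high end of the range.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if reported_tb is not None:
        total_tb = reported_tb  # assumption: also divided by the group size
    else:
        total_tb = RANGE_HIGH_END_TB[range_label]
    return total_tb / group_size


individual = per_person_storage_tb("1 to 10 TB")
group_of_four = per_person_storage_tb("1 to 10 TB", group_size=4)
big_data_pair = per_person_storage_tb(
    "more than 10 to 100 TB", group_size=2, reported_tb=30.0
)
```

Using the high end of each range gives an upper-bound estimate, which is why the follow-up values matter most for the widest ranges.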

  2. Large and Long-Range Graph Dataset

    • ieee-dataport.org
    Updated Sep 18, 2025
    Cite
    shuo wang (2025). Large and Long-Range Graph Dataset [Dataset]. https://ieee-dataport.org/documents/large-and-long-range-graph-dataset
    Dataset updated
    Sep 18, 2025
    Authors
    shuo wang
    Description

    PCQM-Contact (CC BY 4.0)

  3. Data from: Ab Initio Potential Energy Surface for NaCl–H2 with Correct...

    • acs.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jan 25, 2024
    Cite
    Priyanka Pandey; Chen Qu; Apurba Nandi; Qi Yu; Paul L. Houston; Riccardo Conte; Joel M. Bowman (2024). Ab Initio Potential Energy Surface for NaCl–H2 with Correct Long-Range Behavior [Dataset]. http://doi.org/10.1021/acs.jpca.3c07687.s001
    Available download formats: zip
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    ACS Publications
    Authors
    Priyanka Pandey; Chen Qu; Apurba Nandi; Qi Yu; Paul L. Houston; Riccardo Conte; Joel M. Bowman
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We report a full-dimensional ab initio potential energy surface for NaCl–H2 based on precise fitting of a large data set of CCSD(T)/aug-cc-pVTZ energies. A major goal of this fit is to describe the very long-range interaction accurately. This is done in this instance via the dipole–quadrupole interaction. The NaCl dipole and the H2 quadrupole are available through previous works over a large range of internuclear distances. We use these to obtain exact effective charges on each atom. Diffusion Monte Carlo calculations are done for the ground vibrational state using the new potential.

  4. Large Dataset of Generalization Patterns in the Number Game

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 10, 2018
    Cite
    Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF
    Dataset updated
    Aug 10, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Eric J. Bigelow; Steven T. Piantadosi
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    272,700 two-alternative forced choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of numbers of length 1 to 4 from the range 1 to 100 (e.g. {12, 16}), and asked what other numbers were likely to belong to that set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. “even numbers,” “powers of two”) and distance-based (e.g. numbers near 50) generalization. This data set is available for further analysis of these simple and intuitive inferences, development of hands-on modeling instruction, and attempts to understand how probability and rules interact in human cognition.

  5. Long-term climatic data for cities in Asia

    • kaggle.com
    zip
    Updated Mar 18, 2024
    Cite
    Rahdan. M. ArioB (2024). Long-term climatic data for cities in Asia [Dataset]. https://www.kaggle.com/datasets/mohammadrahdanmofrad/long-term-climatic-data-for-cities-in-asia
    Available download formats: zip (38203945 bytes)
    Dataset updated
    Mar 18, 2024
    Authors
    Rahdan. M. ArioB
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    Asia
    Description

    This dataset provides long-term climate data for large Asian cities with populations over 500,000. It includes data on cloud cover, temperature range, number of frost days, potential evapotranspiration, precipitation, minimum temperature, mean temperature, maximum temperature, relative humidity, and number of wet days, and covers 831 cities.

    Columns:

    • ID
    • Date
    • Latitude
    • Longitude
    • cld: Cloud cover (%)
    • dtr: Temperature range (°C)
    • frs: Number of frost days
    • pet: Potential evapotranspiration (mm)
    • pre: Precipitation (mm)
    • tmn: Minimum temperature (°C)
    • tmp: Mean temperature (°C)
    • tmx: Maximum temperature (°C)
    • vap: Relative humidity (%)
    • wet: Number of wet days

    Inspiration:
    Are you interested in predicting the future weather conditions in your city or one of the 831 cities in our climate dataset? Our climate dataset contains data on various climate metrics, including temperature, precipitation, cloud cover, wind speed, and humidity. This data can be used to train a machine learning model that can predict future weather conditions with high accuracy. Imagine using a machine learning model to predict the weather in your city for the next week, month, or year. This information could be used to make decisions about planning, adaptation, and risk mitigation.

    Please note:
    This dataset contains satellite-derived climate data from the website https://crudata.uea.ac.uk. Satellite data are measured using sensors that may be subject to error. Therefore, it is possible that these data may differ from ground-based observations, which are typically used to generate real-world data. This difference is generally greater in remote areas and regions with high cloud.

  6. Big Free-Tailed Bat Range - CWHR M041 [ds1836]

    • catalog.data.gov
    • data.cnra.ca.gov
    • +5 more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Fish and Wildlife (2025). Big Free-Tailed Bat Range - CWHR M041 [ds1836] [Dataset]. https://catalog.data.gov/dataset/big-free-tailed-bat-range-cwhr-m041-ds1836-b6ec5
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Fish and Wildlife
    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  7. Large-Blotched Ensatina Range - CWHR A012B [ds2847]

    • catalog.data.gov
    • data.cnra.ca.gov
    • +5 more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Fish and Wildlife (2025). Large-Blotched Ensatina Range - CWHR A012B [ds2847] [Dataset]. https://catalog.data.gov/dataset/large-blotched-ensatina-range-cwhr-a012b-ds2847-ed46b
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Fish and Wildlife
    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  8. 🏃🏻‍♂️ Long-distance running dataset

    • kaggle.com
    zip
    Updated Mar 7, 2024
    Cite
    mexwell (2024). 🏃🏻‍♂️ Long-distance running dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/long-distance-running-dataset
    Available download formats: zip (393989255 bytes)
    Dataset updated
    Mar 7, 2024
    Authors
    mexwell
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    About

    This dataset contains 10,703,690 records of running training during 2019 and 2020, from 36,412 athletes from around the world. The records were obtained through web scraping of a large social network for athletes on the internet.

    The data with the athletes' activities are contained in dataframe objects (tabular data) and saved in the Parquet file format using the Pandas library, part of the Python ecosystem for data science. Each Pandas dataframe contains the following data (as different columns) for each athlete (as different rows); the first word identifies the name of the column in the dataframe:

    • datetime: date of the running activity
    • athlete: a computer-generated ID for the athlete (integer)
    • distance: distance of running (floating-point number, in kilometers)
    • duration: duration of running (floating-point number, in minutes)
    • gender: gender (string 'M' or 'F')
    • age_group: age interval (one of the strings '18 - 34', '35 - 54', or '55 +')
    • country: country of origin of the athlete (string)
    • major: marathon(s) and year(s) the athlete ran (comma-separated list of strings)

    For convenience, we created files with the athletes' activities data sampled at different frequencies: day 'd', week 'w', month 'm', and quarter 'q' (i.e., there are files with the distance and duration of running accumulated at each day, week, month, and quarter) for each year, 2019 and 2020. Accordingly, the files are named 'run_ww_yyyy_f.parquet', where 'yyyy' is '2019' or '2020' and 'f' is 'd', 'w', 'm' or 'q' (without quotes). The dataset also contains data with different governments' stringency indexes for the COVID-19 pandemic. These data are saved as text files and were obtained from https://ourworldindata.org/covid-stringency-index.
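The naming scheme described above can be expanded into the full list of expected activity files. The following is a minimal sketch: the helper name `activity_filenames` is hypothetical, and the code only builds file names from the stated pattern without checking that the files exist in the archive.

```python
from itertools import product

# Years and sampling frequencies as stated in the dataset description.
YEARS = ("2019", "2020")
FREQUENCIES = {"d": "day", "w": "week", "m": "month", "q": "quarter"}


def activity_filenames() -> list[str]:
    """Return every name matching the 'run_ww_yyyy_f.parquet' pattern."""
    return [
        f"run_ww_{year}_{freq}.parquet"
        for year, freq in product(YEARS, FREQUENCIES)
    ]


filenames = activity_filenames()
for name in filenames:
    print(name)
```

Each name could then be passed to `pandas.read_parquet`, assuming pandas and a Parquet engine such as pyarrow are installed.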

    Acknowledgement

    Photo by sporlab on Unsplash

  9. Big Brown Bat Range - CWHR M032 [ds1828]

    • catalog.data.gov
    • data.ca.gov
    • +4 more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Fish and Wildlife (2025). Big Brown Bat Range - CWHR M032 [ds1828] [Dataset]. https://catalog.data.gov/dataset/big-brown-bat-range-cwhr-m032-ds1828-09a43
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Fish and Wildlife
    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  10. Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning...

    • data.niaid.nih.gov
    Updated Aug 10, 2022
    + more versions
    Cite
    Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6624080
    Dataset updated
    Aug 10, 2022
    Dataset provided by
    University of Cincinnati
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109

    Abstract

    The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description

    The dataset comprises a total of 52,984 Tweet IDs (that correspond to the same number of Tweets) about online learning that were posted on Twitter from 9th November 2021 to 13th July 2022. The earliest date was selected as 9th November 2021, as the Omicron variant was detected for the first time in a sample that was collected on this date. 13th July 2022 was the most recent date as per the time of data collection and publication of this dataset.

    The dataset consists of 9 .txt files. An overview of these dataset files along with the number of Tweet IDs and the date range of the associated tweets is as follows. Table 1 shows the list of all the synonyms or terms that were used for the dataset development.

    Filename: TweetIDs_November_2021.txt (No. of Tweet IDs: 1283, Date Range of the associated Tweet IDs: November 1, 2021 to November 30, 2021)

    Filename: TweetIDs_December_2021.txt (No. of Tweet IDs: 10545, Date Range of the associated Tweet IDs: December 1, 2021 to December 31, 2021)

    Filename: TweetIDs_January_2022.txt (No. of Tweet IDs: 23078, Date Range of the associated Tweet IDs: January 1, 2022 to January 31, 2022)

    Filename: TweetIDs_February_2022.txt (No. of Tweet IDs: 4751, Date Range of the associated Tweet IDs: February 1, 2022 to February 28, 2022)

    Filename: TweetIDs_March_2022.txt (No. of Tweet IDs: 3434, Date Range of the associated Tweet IDs: March 1, 2022 to March 31, 2022)

    Filename: TweetIDs_April_2022.txt (No. of Tweet IDs: 3355, Date Range of the associated Tweet IDs: April 1, 2022 to April 30, 2022)

    Filename: TweetIDs_May_2022.txt (No. of Tweet IDs: 3120, Date Range of the associated Tweet IDs: May 1, 2022 to May 31, 2022)

    Filename: TweetIDs_June_2022.txt (No. of Tweet IDs: 2361, Date Range of the associated Tweet IDs: June 1, 2022 to June 30, 2022)

    Filename: TweetIDs_July_2022.txt (No. of Tweet IDs: 1057, Date Range of the associated Tweet IDs: July 1, 2022 to July 13, 2022)
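As a quick consistency check on the file list above, the per-file counts can be summed and compared with the stated total of 52,984 Tweet IDs. The dictionary name `TWEET_ID_COUNTS` is only illustrative; the file names and counts are copied from the list.

```python
# Number of Tweet IDs per file, as listed in the dataset description.
TWEET_ID_COUNTS = {
    "TweetIDs_November_2021.txt": 1283,
    "TweetIDs_December_2021.txt": 10545,
    "TweetIDs_January_2022.txt": 23078,
    "TweetIDs_February_2022.txt": 4751,
    "TweetIDs_March_2022.txt": 3434,
    "TweetIDs_April_2022.txt": 3355,
    "TweetIDs_May_2022.txt": 3120,
    "TweetIDs_June_2022.txt": 2361,
    "TweetIDs_July_2022.txt": 1057,
}

STATED_TOTAL = 52_984  # total given in the Data Description

total = sum(TWEET_ID_COUNTS.values())
busiest_file = max(TWEET_ID_COUNTS, key=TWEET_ID_COUNTS.get)

print(f"{len(TWEET_ID_COUNTS)} files, {total} Tweet IDs in total")
print(f"Matches stated total: {total == STATED_TOTAL}")
print(f"Largest file: {busiest_file}")
```

The nine counts add up to the stated total, and January 2022 accounts for the largest share of Tweet IDs.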

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.

    Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development

    Terminology | List of synonyms and terms
    COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus
    online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures

  11. Data from: Chemical Descriptors for a Large-Scale Study on Drop-Weight...

    • acs.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Frank W. Marrs; Jack V. Davis; Alexandra C. Burch; Geoffrey W. Brown; Nicholas Lease; Patricia L. Huestis; Marc J. Cawkwell; Virginia W. Manner (2023). Chemical Descriptors for a Large-Scale Study on Drop-Weight Impact Sensitivity of High Explosives [Dataset]. http://doi.org/10.1021/acs.jcim.2c01154.s002
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Frank W. Marrs; Jack V. Davis; Alexandra C. Burch; Geoffrey W. Brown; Nicholas Lease; Patricia L. Huestis; Marc J. Cawkwell; Virginia W. Manner
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The drop-weight impact test is an experiment that has been used for nearly 80 years to evaluate handling sensitivity of high explosives. Although the results of this test are known to have large statistical uncertainties, it is one of the most common tests due to its accessibility and modest material requirements. In this paper, we compile a large data set of drop-weight impact sensitivity test results (mainly performed at Los Alamos National Laboratory), along with a compendium of molecular and chemical descriptors for the explosives under test. These data consist of over 500 unique explosives, over 1000 repeat tests, and over 100 descriptors, for a total of about 1500 observations. We use random forest methods to estimate a model of explosive handling sensitivity as a function of chemical and molecular properties of the explosives under test. Our model predicts well across a wide range of explosive types, spanning a broad range of explosive performance and sensitivity. We find that properties related to explosive performance, such as heat of explosion, oxygen balance, and functional group, are highly predictive of explosive handling sensitivity. Yet, models that omit many of these properties still perform well. Our results suggest that there is not one or even several factors that explain explosive handling sensitivity, but that there are many complex, interrelated effects at play.

  12. California Giant Salamander Range - CWHR A004 [ds1133]

    • catalog.data.gov
    • data.cnra.ca.gov
    • +4 more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Fish and Wildlife (2025). California Giant Salamander Range - CWHR A004 [ds1133] [Dataset]. https://catalog.data.gov/dataset/california-giant-salamander-range-cwhr-a004-ds1133-ed51b
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Fish and Wildlife (https://wildlife.ca.gov/)
    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  13. ECMWF Reanalysis v5

    • ecmwf.int
    application/x-grib
    Updated Dec 31, 1969
    + more versions
    Cite
    European Centre for Medium-Range Weather Forecasts (1969). ECMWF Reanalysis v5 [Dataset]. https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
    Available download formats: application/x-grib (1 datasets)
    Dataset updated
    Dec 31, 1969
    Dataset authored and provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    land and oceanic climate variables. The data cover the Earth on a 31km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.

  14. Fused Image dataset for convolutional neural Network-based crack Detection...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 20, 2023
    Cite
    Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6383043
    Dataset updated
    Apr 20, 2023
    Authors
    Shanglian Zhou; Carlos Canchila; Wei Song
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

    The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

    If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

    In addition, an image dataset for crack classification has also been published at [6].

    References:

    [1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

    [2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

    [3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

    [4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

    [5] Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

    [6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78

  15. House Price Prediction Dataset

    • kaggle.com
    zip
    Updated Sep 21, 2024
    Cite
    Zafar (2024). House Price Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/zafarali27/house-price-prediction-dataset
    Available download formats: zip (29372 bytes)
    Dataset updated
    Sep 21, 2024
    Authors
    Zafar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    House Price Prediction Dataset.

    The dataset contains 2000 rows of house-related data, representing various features that could influence house prices. Below, we discuss key aspects of the dataset, which include its structure, the choice of features, and potential use cases for analysis.

    1. Dataset Features

    The dataset is designed to capture essential attributes for predicting house prices, including:

    • Area: Square footage of the house, which is generally one of the most important predictors of price.
    • Bedrooms & Bathrooms: The number of rooms in a house significantly affects its value. Homes with more rooms tend to be priced higher.
    • Floors: The number of floors in a house could indicate a larger, more luxurious home, potentially raising its price.
    • Year Built: The age of the house can affect its condition and value. Newly built houses are generally more expensive than older ones.
    • Location: Houses in desirable locations such as downtown or urban areas tend to be priced higher than those in suburban or rural areas.
    • Condition: The current condition of the house is critical, as well-maintained houses (in 'Excellent' or 'Good' condition) will attract higher prices compared to houses in 'Fair' or 'Poor' condition.
    • Garage: Availability of a garage can increase the price due to added convenience and space.
    • Price: The target variable, representing the sale price of the house, used to train machine learning models to predict house prices based on the other features.

    2. Feature Distributions

    • Area Distribution: The area of the houses in the dataset ranges from 500 to 5000 square feet, which allows analysis across different types of homes, from smaller apartments to larger luxury houses.
    • Bedrooms and Bathrooms: The number of bedrooms varies from 1 to 5, and bathrooms from 1 to 4. This variance enables analysis of homes with different sizes and layouts.
    • Floors: Houses in the dataset have between 1 and 3 floors. This feature could be useful for identifying the influence of multi-level homes on house prices.
    • Year Built: The dataset contains houses built from 1900 to 2023, giving a wide range of house ages to analyze the effects of new vs. older construction.
    • Location: There is a mix of urban, suburban, downtown, and rural locations. Urban and downtown homes may command higher prices due to proximity to amenities.
    • Condition: Houses are labeled as 'Excellent', 'Good', 'Fair', or 'Poor'. This feature helps model the price differences based on the current state of the house.
    • Price Distribution: Prices range between $50,000 and $1,000,000, offering a broad spectrum of property values. This range makes the dataset appropriate for predicting a wide variety of housing prices, from affordable homes to luxury properties.
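The ranges described above can be turned into a simple sanity check on individual rows. The following is a minimal sketch, not part of the dataset itself: the column names (`Area`, `YearBuilt`, and so on) and the exact spelling of the category labels are assumptions made for illustration, and should be adjusted to match the actual CSV header.

```python
# Numeric ranges as stated in the dataset description.
# Column names are hypothetical; check them against the real file.
NUMERIC_RANGES = {
    "Area": (500, 5000),
    "Bedrooms": (1, 5),
    "Bathrooms": (1, 4),
    "Floors": (1, 3),
    "YearBuilt": (1900, 2023),
    "Price": (50_000, 1_000_000),
}

# Category labels as described; capitalization is an assumption.
CATEGORIES = {
    "Location": {"Urban", "Suburban", "Downtown", "Rural"},
    "Condition": {"Excellent", "Good", "Fair", "Poor"},
}


def find_violations(row):
    """Return the names of fields in `row` that fall outside the described ranges."""
    bad = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if not (low <= row[name] <= high):
            bad.append(name)
    for name, allowed in CATEGORIES.items():
        if row[name] not in allowed:
            bad.append(name)
    return bad


# Two made-up rows: one consistent with the description, one not.
good_row = {
    "Area": 1800, "Bedrooms": 3, "Bathrooms": 2, "Floors": 2,
    "YearBuilt": 1995, "Price": 350_000,
    "Location": "Suburban", "Condition": "Good",
}
bad_row = dict(good_row, Area=120, Condition="Ruined")

print(find_violations(good_row))
print(find_violations(bad_row))
```

A check like this is only a first pass: it confirms that values sit inside the advertised ranges, not that the data are realistic.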

    3. Correlation Between Features

    A key area of interest is the relationship between various features and house price:

    • Area and Price: Typically, a strong positive correlation is expected between the size of the house (Area) and its price. Larger homes are likely to be more expensive.
    • Location and Price: Location is another major factor. Houses in urban or downtown areas may show a higher price on average compared to suburban and rural locations.
    • Condition and Price: The condition of the house should show a positive correlation with price. Houses in better condition should be priced higher, as they require less maintenance and repair.
    • Year Built and Price: Newer houses might command a higher price due to better construction standards, modern amenities, and less wear-and-tear, but some older homes in good condition may retain historical value.
    • Garage and Price: A house with a garage may be more expensive than one without, as it provides extra storage or parking space.

    4. Potential Use Cases

    The dataset is well-suited for various machine learning and data analysis applications, including:

    • House Price Prediction: Using regression techniques, this dataset can be used to build a model to predict house prices based on the available features.
    • Feature Importance Analysis: By using techniques such as feature importance ranking, data scientists can determine which features (e.g., location, area, or condition) have the greatest impact on house prices.
    • Clustering: Clustering techniques like k-means could help identify patterns in the data, such as grouping houses into segments based on their characteristics (e.g., luxury homes, affordable homes).
    • Market Segmentation: The dataset can be used to perform segmentation by location, price range, or house type to analyze trends in specific sub-markets, like luxury vs. affordable housing.
    • Time-Based Analysis: By studying how house prices vary with the year built or the age of the house, analysts can derive insights into the trends of older vs. newer homes.

    5. Limitations and ...

  16. Twitter Conversations about the COVID-19 Omicron Variant: A Large Scale...

    • zenodo.org
    • dataverse.harvard.edu
    • +1more
    txt
    Updated Jul 25, 2022
    + more versions
    Cite
    Nirmalya Thakur (2022). Twitter Conversations about the COVID-19 Omicron Variant: A Large Scale Dataset of more than 500,000 Tweets [Dataset]. http://doi.org/10.5281/zenodo.6893676
    Available download formats: txt
    Dataset updated
    Jul 25, 2022
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur and C.Y. Han, “An Exploratory Study of Tweets about the SARS-CoV-2 Omicron Variant: Insights from Sentiment Analysis, Language Interpretation, Source Tracking, Type Classification, and Embedded URL Detection,” Journal of COVID, 2022, Volume 5, Issue 3, pp. 1026-1049

    Abstract

    This open-access dataset is one of the salient contributions of the above-mentioned paper. It presents a total of 522,886 Tweet IDs of the same number of Tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description

    The Tweet IDs are presented in 7 different .txt files based on the timelines of the associated tweets. The data collection followed a keyword-based approach and tweets comprising the "omicron" keyword were filtered, collected, and added to this dataset. The following is the description of these dataset files.

    • Filename: TweetIDs_November.txt (No. of Tweet IDs: 16471, Date Range of the Tweet IDs: November 24, 2021 to November 30, 2021)
    • Filename: TweetIDs_December.txt (No. of Tweet IDs: 99288, Date Range of the Tweet IDs: December 1, 2021 to December 31, 2021)
    • Filename: TweetIDs_January.txt (No. of Tweet IDs: 92860, Date Range of the Tweet IDs: January 1, 2022 to January 31, 2022)
    • Filename: TweetIDs_February.txt (No. of Tweet IDs: 89080, Date Range of the Tweet IDs: February 1, 2022 to February 28, 2022)
    • Filename: TweetIDs_March.txt (No. of Tweet IDs: 97844, Date Range of the Tweet IDs: March 1, 2022 to March 31, 2022)
    • Filename: TweetIDs_April.txt (No. of Tweet IDs: 91587, Date Range of the Tweet IDs: April 1, 2022 to April 20, 2022)
    • Filename: TweetIDs_May.txt (No. of Tweet IDs: 35756, Date Range of the Tweet IDs: May 1, 2022 to May 12, 2022)

    In the list above, the last date for May is May 12 because it was the most recent date at the time of data collection and dataset upload. The dataset will be updated soon to incorporate more recent tweets.
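The per-file counts listed above can be cross-checked against the total of 522,886 Tweet IDs stated in the abstract. The short sketch below only uses the filenames and counts given in this description; it does not read the actual files.

```python
# Tweet ID counts per file, copied from the dataset description.
FILE_COUNTS = {
    "TweetIDs_November.txt": 16471,
    "TweetIDs_December.txt": 99288,
    "TweetIDs_January.txt": 92860,
    "TweetIDs_February.txt": 89080,
    "TweetIDs_March.txt": 97844,
    "TweetIDs_April.txt": 91587,
    "TweetIDs_May.txt": 35756,
}

total = sum(FILE_COUNTS.values())
print(total)  # 522886, matching the total stated in the abstract
```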

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.

  17. Comprehensive Median Household Income and Distribution Dataset for Grass...

    • neilsberg.com
    Updated Jan 11, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Grass Range, MT: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd9e83ad-b041-11ee-aaca-3860777c1fe6/
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montana, Grass Range
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the median household income in Grass Range. It can be utilized to understand the trend in median household income and to analyze the income distribution in Grass Range by household type, size, and across various income brackets.

    Content

    The dataset will have the following datasets when applicable

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • Grass Range, MT Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)
    • Median Household Income Variation by Family Size in Grass Range, MT: Comparative analysis across 7 household sizes
    • Income Distribution by Quintile: Mean Household Income in Grass Range, MT
    • Grass Range, MT households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Grass Range median household income. You can refer to the same here

  18. Goodness-of-fit filtering in classical metric multidimensional scaling with...

    • tandf.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Jan Graffelman (2023). Goodness-of-fit filtering in classical metric multidimensional scaling with large datasets [Dataset]. http://doi.org/10.6084/m9.figshare.11389830.v1
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Jan Graffelman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metric multidimensional scaling (MDS) is a widely used multivariate method with applications in almost all scientific disciplines. Eigenvalues obtained in the analysis are usually reported in order to calculate the overall goodness-of-fit of the distance matrix. In this paper, we refine MDS goodness-of-fit calculations, proposing additional point and pairwise goodness-of-fit statistics that can be used to filter poorly represented observations in MDS maps. The proposed statistics are especially relevant for large data sets that contain outliers, with typically many poorly fitted observations, and are helpful for improving MDS output and emphasizing the most important features of the dataset. Several goodness-of-fit statistics are considered, and both Euclidean and non-Euclidean distance matrices are considered. Some examples with data from demographic, genetic and geographic studies are shown.
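As a rough illustration of how eigenvalues feed an overall goodness-of-fit figure, the sketch below uses one common textbook convention: the share of the positive eigenvalues captured by the first k dimensions. This is an assumption made for illustration; it is not taken from the paper, whose point and pairwise statistics are not reproduced here. The function name and the example eigenvalues are hypothetical.

```python
def overall_goodness_of_fit(eigenvalues, k=2):
    """Share of the positive eigenvalue sum captured by the k largest eigenvalues.

    Non-positive eigenvalues (which can arise with non-Euclidean distance
    matrices) are ignored under this convention.
    """
    positive = sorted((v for v in eigenvalues if v > 0), reverse=True)
    if not positive:
        raise ValueError("at least one positive eigenvalue is required")
    return sum(positive[:k]) / sum(positive)


# Made-up eigenvalues from a hypothetical MDS run.
example_eigenvalues = [6.0, 3.0, 1.0, 0.0, -0.5]
print(overall_goodness_of_fit(example_eigenvalues, k=2))
```

Other conventions exist (for example, using absolute values of all eigenvalues in the denominator), so the result should be read alongside whichever definition a given analysis adopts.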

  19. Twitter Dataset on the 2022 MonkeyPox Outbreak

    • kaggle.com
    zip
    Updated Nov 16, 2022
    Cite
    Nirmalya Thakur, PhD (2022). Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. https://www.kaggle.com/datasets/thakurnirmalya/monkeypox2022tweets
    Available download formats: zip (4397490 bytes)
    Dataset updated
    Nov 16, 2022
    Authors
    Nirmalya Thakur, PhD
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087

    Abstract

    The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 571,831 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description

    The dataset consists of a total of 571,831 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 11th November (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The following represents the details of these dataset files.

    • Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the associated Tweet IDs: May 7, 2022, to May 21, 2022)
    • Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the associated Tweet IDs: May 21, 2022, to May 27, 2022)
    • Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the associated Tweet IDs: May 27, 2022, to June 5, 2022)
    • Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the associated Tweet IDs: June 5, 2022, to June 11, 2022)
    • Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 46718, Date Range of the associated Tweet IDs: June 12, 2022, to June 30, 2022)
    • Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the associated Tweet IDs: July 1, 2022, to July 23, 2022)
    • Filename: TweetIDs_Part7.txt (No. of Tweet IDs: 105890, Date Range of the associated Tweet IDs: July 24, 2022, to July 31, 2022)
    • Filename: TweetIDs_Part8.txt (No. of Tweet IDs: 93959, Date Range of the associated Tweet IDs: August 1, 2022, to August 9, 2022)
    • Filename: TweetIDs_Part9.txt (No. of Tweet IDs: 50832, Date Range of the associated Tweet IDs: August 10, 2022, to August 24, 2022)
    • Filename: TweetIDs_Part10.txt (No. of Tweet IDs: 39042, Date Range of the associated Tweet IDs: August 25, 2022, to September 19, 2022)
    • Filename: TweetIDs_Part11.txt (No. of Tweet IDs: 12341, Date Range of the associated Tweet IDs: September 20, 2022, to October 9, 2022)
    • Filename: TweetIDs_Part12.txt (No. of Tweet IDs: 15404, Date Range of the associated Tweet IDs: October 10, 2022, to November 11, 2022)

    Please note: The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.
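Before hydration, the Tweet IDs typically need to be read from the .txt files and grouped into batches. The sketch below shows one way to do that with the standard library only. It assumes each file holds one Tweet ID per line, which is not confirmed by the description, and the batch size of 100 is a placeholder to be matched to whatever hydration tool or API is used. The helper names `read_tweet_ids` and `batched` are hypothetical, and no hydration call is made.

```python
from itertools import islice


def read_tweet_ids(lines):
    """Yield non-empty, whitespace-stripped IDs from an iterable of text lines.

    Assumes one Tweet ID per line (an assumption about the file layout).
    """
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batched(ids, size=100):
    """Yield consecutive lists of at most `size` IDs."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for the contents of one file: 250 made-up IDs plus a blank line.
sample_lines = [f"{1000 + i}\n" for i in range(250)] + ["\n"]
batch_sizes = [len(batch) for batch in batched(read_tweet_ids(sample_lines))]
print(batch_sizes)  # [100, 100, 50]
```

With a real file, `sample_lines` would be replaced by an open file handle, for example `open("TweetIDs_Part1.txt", encoding="utf-8")`.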

  20. Dataset for Grass Range, MT Census Bureau Income Distribution by Gender

    • neilsberg.com
    Updated Jan 9, 2024
    Cite
    Neilsberg Research (2024). Dataset for Grass Range, MT Census Bureau Income Distribution by Gender [Dataset]. https://www.neilsberg.com/research/datasets/b3b45159-abcb-11ee-8b96-3860777c1fe6/
    Dataset updated
    Jan 9, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montana, Grass Range
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates Grass Range household income by gender. It can be utilized to understand the gender-based income distribution in Grass Range.

    Content

    The dataset will have the following datasets when applicable

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • Grass Range, MT annual median income by work experience and sex dataset: Aged 15+, 2010-2022 (in 2022 inflation-adjusted dollars)
    • Grass Range, MT annual income distribution by work experience and gender dataset (Number of individuals ages 15+ with income, 2021)

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Grass Range income distribution by gender. You can refer to the same here

Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description

The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.
Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

• Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
• Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
• Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
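The per-person storage rule described above (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be written out as a short calculation. This is a minimal sketch: the function name and the example values are hypothetical, and it covers only the range-based rule, not the actual or estimated values used for Big Data users.

```python
def per_person_storage_tb(range_high_tb, group_size=1):
    """Per-person storage estimate in TB.

    range_high_tb: high end of the storage range selected in the survey.
    group_size: 1 for an individual response, or G for a group response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical responses, both selecting a range whose high end is 10 TB.
individual = per_person_storage_tb(10)            # individual response
group = per_person_storage_tb(10, group_size=4)   # group response, G = 4
print(individual, group)  # 10.0 2.5
```

Using the high end of each range makes the estimate deliberately conservative, which is one reason the description notes that wide ranges could have significant implications.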
