100+ datasets found

Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB, or over 100 TB, of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
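The per person storage rule described for this survey (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched as follows. This is a minimal illustration, not the working group's actual calculation; the function name and the example values are assumptions.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """Per person storage estimate in TB.

    range_high_tb: high end of the reported storage range (hypothetical input).
    group_size: G, the number of individuals covered by the response;
                1 for an individual response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# An individual reporting a range whose high end is 100 TB
individual = per_person_storage_tb(100)
# A group response covering 8 people with the same reported range
group = per_person_storage_tb(100, group_size=8)
print(individual, group)
```

Using the high end of each range gives an upper-bound style estimate, which is why the description notes that Big Data users were asked for actual values instead.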
Figure 2 raw data
data.researchdatafinder.qut.edu.au
Updated Apr 20, 2024
Cite
(2024). Figure 2 raw data [Dataset]. https://data.researchdatafinder.qut.edu.au/dataset/a-method-for6/resource/3c144d68-d2da-4dc9-a494-2902a2c402eb
Dataset updated
Apr 20, 2024
License
http://researchdatafinder.qut.edu.au/display/n16172
Description
Figure 2 raw data. QUT Research Data Repository dataset resource, available for download.
Public Availability of Published Research Data in High-Impact Journals
plos.figshare.com
xls
Updated May 30, 2023
Cite
Alawi A. Alsheikh-Ali; Waqas Qureshi; Mouaz H. Al-Mallah; John P. A. Ioannidis (2023). Public Availability of Published Research Data in High-Impact Journals [Dataset]. http://doi.org/10.1371/journal.pone.0024357
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0024357
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Alawi A. Alsheikh-Ali; Waqas Qureshi; Mouaz H. Al-Mallah; John P. A. Ioannidis
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: There is increasing interest to make primary data from published research publicly available. We aimed to assess the current status of making research data available in highly-cited journals across the scientific literature.

Methods and Results: We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.

Conclusion: A substantial proportion of original research papers published in high-impact journals are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals. This empiric evaluation highlights opportunities for improvement.
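The percentages quoted in this abstract follow directly from the counts it reports. The short check below recomputes them; the helper name `pct` and the rounding to whole percent are assumptions made for illustration.

```python
def pct(part: int, whole: int) -> int:
    """Percentage of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


journals_total, journals_with_policy = 50, 44
papers_total = 500
papers_no_policy = 149
papers_with_policy = papers_total - papers_no_policy   # 351
papers_not_adhering = 208
papers_adhering = papers_with_policy - papers_not_adhering  # 143
papers_full_raw_data = 47

print(pct(journals_with_policy, journals_total))     # journals with a data statement
print(pct(papers_no_policy, papers_total))           # papers under no policy
print(pct(papers_not_adhering, papers_with_policy))  # non-adherence among covered papers
print(pct(papers_full_raw_data, papers_total))       # papers depositing full raw data
```

Note that the 59% figure uses the 351 policy-covered papers as its denominator, while the 30% and 9% figures use all 500 papers.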
Survey data of "Mapping Research Output to the Sustainable Development Goals...
zenodo.org
bin, pdf, zip
Updated Jul 22, 2024
Cite
Maurice Vanderfeesten; Eike Spielberg; Yassin Gunes (2024). Survey data of "Mapping Research Output to the Sustainable Development Goals (SDGs)" [Dataset]. http://doi.org/10.5281/zenodo.3813230
Available download formats: bin, zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.3813230
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Maurice Vanderfeesten; Eike Spielberg; Yassin Gunes
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains information on what papers and concepts researchers find relevant to map domain specific research output to the 17 Sustainable Development Goals (SDGs).

Sustainable Development Goals are the 17 global challenges set by the United Nations. Within each of the goals, specific targets and indicators are defined to monitor progress toward reaching those goals by 2030. In an effort to capture how research is contributing to move the needle on those challenges, we earlier made an initial classification model that makes it possible to quickly identify what research output is related to what SDG. (This Aurora SDG dashboard is the initial outcome as proof of practice.)

In order to validate our current classification model (on soundness/precision and completeness/recall), and receive input for improvement, a survey has been conducted to capture expert knowledge from senior researchers in their research domain related to the SDG. The survey was open to the world, but mainly distributed to researchers from the Aurora Universities Network. The survey was open from October 2019 till January 2020, and captured data from 244 respondents in Europe and North America.

17 surveys were created from a single template, where the content was made specific for each SDG. The content of each survey, such as a random set of publications, was ingested via a data provisioning server, which had collected research output metadata for each SDG at an earlier stage. It took on average 1 hour for a respondent to complete the survey. The outcome of the survey data can be used for validating current and optimizing future SDG classification models for mapping research output to the SDGs.

The survey contains the following questions (see inside dataset for exact wording):

Are you familiar with this SDG?

Respondents could only proceed if they were familiar with the targets and indicators of this SDG. The goal of this question was to weed out unknowledgeable respondents and to increase the quality of the survey data.

Suggest research papers that are relevant for this SDG (upload list)

This question, to provide a list, was put first to reduce influence from the other questions. The goal of this question was to measure the completeness/recall of the papers in the result set of our current classification model. (To lower the bar, these lists could be provided either by uploading a file from a reference manager (preferred) in .ris or BibTeX format, or as a list of titles. This heterogeneous input was later processed by hand into a uniform format.)

Select research papers that are relevant for this SDG (radio buttons: accept, reject)

A randomly selected set of 100 papers was injected in the survey, out of the full list of thousands of papers in the result set of our current classification model. Goal of this question was to measure the soundness/precision of our current classification model.

Select and Suggest Keywords related to SDG (checkboxes: accept | text field: suggestions)

The survey was injected with the top 100 most frequent keywords that appeared in the metadata of the papers in the result set of the current classification model. Respondents could select relevant keywords we found, and add ones in a blank text field. The goal of this question was to get suggestions for keywords we can use to increase the recall of relevant papers in a new classification model.

Suggest SDG related glossaries with relevant keywords (text fields: url)

Open text field to add URL to lists with hundreds of relevant keywords related to this SDG. Goal of this question was to get suggestions for keywords we can use to increase the recall of relevant papers in a new classification model.

Select and Suggest Journals fully related to SDG (checkboxes: accept | text field: suggestions)

The survey was injected with the top 100 most frequent journals that appeared in the metadata of the papers in the result set of the current classification model. Respondents could select relevant journals we found, and add ones in a blank text field. Goal of this question was to get suggestions for complete journals we can use to increase the recall of relevant papers in a new classification model.

Suggest improvements for the current queries (text field: suggestions per target)

We showed respondents the queries we used in our current classification model next to each of the targets within the goal. Open text fields were presented so they could change, add, re-order, or delete something (keywords, boolean operators, etc.) in the query to improve it in their opinion. The goal of this question was to get suggestions we can use to increase the recall and precision of relevant papers in a new classification model.
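The questions above target two measures of the classification model: soundness/precision (from the accept/reject judgments on sampled papers) and completeness/recall (from the papers respondents suggested). A minimal sketch of how such measures could be computed is given below. The function names, the identifiers, and the data layout are assumptions for illustration, not the dataset's actual scripts.

```python
def precision(judgments: list[bool]) -> float:
    """Share of sampled papers that respondents accepted (True = accepted)."""
    return sum(judgments) / len(judgments)


def recall(suggested: set[str], result_set: set[str]) -> float:
    """Share of respondent-suggested papers that the model's result set already contains."""
    return len(suggested & result_set) / len(suggested)


# Hypothetical accept/reject judgments on four sampled papers
sample_judgments = [True, True, False, True]
# Hypothetical paper identifiers suggested by respondents vs. found by the model
suggested_ids = {"p1", "p2", "p3", "p4"}
model_ids = {"p1", "p2", "p9"}

print(precision(sample_judgments))       # 0.75
print(recall(suggested_ids, model_ids))  # 0.5
```

Because the accept/reject sample is drawn at random from the result set, the precision value is an estimate for the full result set rather than an exact figure.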

In the dataset root you'll find the following folders and files:

/00-survey-input/

This contains the survey questions for all the individual SDGs. It also contains lists of EIDs categorised to the SDGs we used to make randomized selections from to present to the respondents.

/01-raw-data/

This contains the raw survey output. (Excluding privacy sensitive information for public release.) This data needs to be combined with the data on the provisioning server to make sense.

/02-aggregated-data/

This is where individual responses are aggregated. The survey data is also combined with the data from the provisioning server; the responses of all SDG surveys are combined, aggregated, and split per question type.

/03-scripts/

This contains scripts to split data, and to add descriptive metadata for text analysis in a later stage.

/04-processed-data/

This is the main final result that can be used for further analysis. Data is split by SDG into subdirectories, in there you'll find files per question type containing the aggregated data of the respondents.

/images/

Images of the results used in this README.md.

LICENSE.md

terms and conditions for reusing this data.

README.md

Description of the dataset; each sub-folder contains a README.md file to further describe its content.

In /04-processed-data/ you'll find the following files in each SDG sub-folder:

SDG-survey-questions.pdf

This file contains the survey questions

SDG-survey-questions.doc

This file contains the survey questions

SDG-survey-respondents-per-sdg.csv

Basic information about the survey and responses

SDG-survey-city-heatmap.csv

Origin of the respondents per SDG survey

SDG-survey-suggested-publications.txt

Formatted list of research papers researchers have uploaded or listed that they want to see back in the result set for this SDG.

SDG-survey-suggested-publications-with-eid-match.csv

Same as above, only matched with an EID. EIDs are matched by Elsevier's internal fuzzy matching algorithm. Only papers with high confidence are shown with a match of an EID, referring to a record in Scopus.

SDG-survey-selected-publications-accepted.csv

Based on our previous result set of papers, researchers were presented random samples; they selected papers they believe represent this SDG. (TRUE=accepted)

SDG-survey-selected-publications-rejected.csv

Based on our previous result set of papers, researchers were presented random samples; they selected papers they believe do not represent this SDG. (FALSE=rejected)

SDG-survey-selected-keywords.csv

Based on our previous result set of papers, we presented researchers the keywords that are in the metadata of those papers; they selected keywords they believe represent this SDG.

SDG-survey-unselected-keywords.csv

As "selected-keywords", this is the list of keywords that respondents have not selected to represent this SDG.

SDG-survey-suggested-keywords.csv

List of keywords researchers suggest to use to find papers related to this SDG

SDG-survey-glossaries.csv

List of glossaries, containing keywords, researchers suggest to use to find papers related to this SDG

SDG-survey-selected-journals.csv

Based on our previous result set of papers, we presented researchers the journals that are in the metadata of those papers; they selected journals they believe represent this SDG.

SDG-survey-unselected-journals.csv

As "selected-journals", this is the list of journals
TagX Data collection for AI/ ML training | LLM data | Data collection for AI...
datarade.ai
.json, .csv, .xls
Updated Jun 18, 2021
Cite
TagX (2021). TagX Data collection for AI/ ML training | LLM data | Data collection for AI development & model finetuning | Text, image, audio, and document data [Dataset]. https://datarade.ai/data-products/data-collection-and-capture-services-tagx
Available download formats: .json, .csv, .xls
Dataset updated
Jun 18, 2021
Dataset authored and provided by
TagX
Area covered
Russian Federation, Belize, Benin, Djibouti, Equatorial Guinea, Qatar, Iceland, Saudi Arabia, Antigua and Barbuda, Colombia
Description
We offer comprehensive data collection services that cater to a wide range of industries and applications. Whether you require image, audio, or text data, we have the expertise and resources to collect and deliver high-quality data that meets your specific requirements. Our data collection methods include manual collection, web scraping, and other automated techniques that ensure accuracy and completeness of data.

Our team of experienced data collectors and quality assurance professionals ensure that the data is collected and processed according to the highest standards of quality. We also take great care to ensure that the data we collect is relevant and applicable to your use case. This means that you can rely on us to provide you with clean and useful data that can be used to train machine learning models, improve business processes, or conduct research.

We are committed to delivering data in the format that you require. Whether you need raw data or a processed dataset, we can deliver the data in your preferred format, including CSV, JSON, or XML. We understand that every project is unique, and we work closely with our clients to ensure that we deliver the data that meets their specific needs. So if you need reliable data collection services for your next project, look no further than us.
UC_vs_US Statistic Analysis.xlsx
figshare.com
xlsx
Updated Jul 9, 2020
Cite
F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.23644/uu.12631628.v1
Dataset updated
Jul 9, 2020
Dataset provided by
Utrecht University
Authors
F. (Fabiano) Dalpiaz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:

A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: User Stories or Use Cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade L/M/H, where H is greater than or equal to 80, M is between 65 included and 80 excluded, L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
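The grade categorisation in column F can be expressed as a small function. This is a sketch based only on the thresholds stated above; the function name and the use of None for empty cells are assumptions.

```python
from typing import Optional


def grade_category(points: Optional[float]) -> Optional[str]:
    """Map an exam grade (points out of 100) to L/M/H.

    H: points >= 80; M: 65 <= points < 80; L: otherwise.
    None stands in for an empty cell (the subject did not take the exam).
    """
    if points is None:
        return None
    if points >= 80:
        return "H"
    if points >= 65:
        return "M"
    return "L"


for grade in (92, 80, 79.5, 65, 64, None):
    print(grade, grade_category(grade))
```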

Tagging scheme:

Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

All the calculations and information provided in the following sheets originate from that raw data.

Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

Sheet 3 (Size-Ratio):

The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between student and expert model.

Sheet 4 (Overall):

Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
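The two ratios defined for Sheet 4 can be written out directly from the tag counts. The sketch below follows the definitions as stated in this description; the function names and the example counts are hypothetical.

```python
def correctness(al: int, wr: int, so: int, om: int) -> float:
    """AL / (AL + OM + SO + WR), as defined in the description."""
    return al / (al + om + so + wr)


def completeness(al: int, wr: int, om: int) -> float:
    """(AL + WR) / (AL + WR + OM), as defined in the description."""
    return (al + wr) / (al + wr + om)


# Hypothetical tag counts for one student model
al, wr, so, om = 6, 2, 1, 3
print(correctness(al, wr, so, om))        # 6 / 12 = 0.5
print(round(completeness(al, wr, om), 3)) # 8 / 11, about 0.727
```

Missing concepts (MI) do not appear in either formula as described, so they are not parameters here.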

For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

Sheet 5 (By-Notation):

Model correctness and model completeness are compared by notation - UC, US.

Sheet 6 (By-Case):

Model correctness and model completeness are compared by case - SIM, HOS, IFA.

Sheet 7 (By-Process):

Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

Sheet 8 (By-Grade):

Model correctness and model completeness are compared by the exam grades, converted to categorical values High, Low, and Medium.
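The effect size reported at the bottom of each sheet is Hedges' g, which the authors obtained from an online tool. For readers who want to recompute it from the raw data, a minimal sketch using the commonly cited small-sample correction is given below. It is an assumption that this matches the tool's exact variant, and the function name and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_a: list[float], group_b: list[float]) -> float:
    """Hedges' g: standardized mean difference with small-sample correction.

    Uses the pooled sample standard deviation and the approximate
    correction factor 1 - 3 / (4 * (n_a + n_b) - 9).
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction


# Hypothetical correctness scores for two groups (e.g. UC vs. US)
print(round(hedges_g([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]), 4))
```

The sign of the result depends on the order of the two groups, so it is worth stating which group is subtracted from which when reporting the value.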
Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw...
plos.figshare.com
ai
Updated Jun 1, 2023
Cite
Heather A. Piwowar (2023). Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data [Dataset]. http://doi.org/10.1371/journal.pone.0018657
Available download formats: ai
Unique identifier
https://doi.org/10.1371/journal.pone.0018657
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Heather A. Piwowar
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%–35% in 2007–2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.
Health Research evidence-raw Data.sav
datasetcatalog.nlm.nih.gov
Updated Jul 23, 2024
Cite
Mongi, Richard John; Kagoma, Pius; Kalolo, Albino (2024). Health Research evidence-raw Data.sav [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001354774
Dataset updated
Jul 23, 2024
Authors
Mongi, Richard John; Kagoma, Pius; Kalolo, Albino
Description
This research intended to analyze the current usage of health research evidence in health planning, determinants, and readiness to use knowledge translation tools among planning teams in Tanzania. Specifically, the study aims to 1) analyze the current usage of health research evidence among planning team members at the regional and council levels, 2) analyze the capability for the use of health research evidence among planning team members at regional and council levels, 3) analyze the opportunities for the use of health research evidence among health planning members at regional and council levels, 4) identify the motivations for the use of health research evidence among health planning team members at regional and council levels, and 5) assess the readiness of the planning team members to use knowledge translation tools. The study employed an exploratory mixed-method study design. It was conducted in nine (9) regions and eighteen (18) Councils of Tanzania Mainland involving the health planning team members.
RAW Data Systematic Review.xlsx
figshare.com
xlsx
Updated Feb 11, 2019
Cite
Lukas Lange-Drenth (2019). RAW Data Systematic Review.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.7701014.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.7701014.v1
Dataset updated
Feb 11, 2019
Dataset provided by
Figshare (http://figshare.com/)
Authors
Lukas Lange-Drenth
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The search of databases resulted in 7314 citations
Figure 1 raw data
data.researchdatafinder.qut.edu.au
Updated Jul 6, 2022
Cite
(2022). Figure 1 raw data [Dataset]. https://data.researchdatafinder.qut.edu.au/dataset/a-method-for6/resource/dad525d4-ac75-4479-a62e-953117053772
Dataset updated
Jul 6, 2022
License
http://researchdatafinder.qut.edu.au/display/n16172
Description
Figure 1 raw data. QUT Research Data Repository dataset resource, available for download.
Raw Data Logistics
explore.openaire.eu
dataverse.harvard.edu
Updated Jan 1, 2024
Cite
Waiphot Kulachai (2024). Raw Data Logistics [Dataset]. http://doi.org/10.7910/dvn/dclhah
Unique identifier
https://doi.org/10.7910/dvn/dclhah
Dataset updated
Jan 1, 2024
Authors
Waiphot Kulachai
Description
Raw data of the research.
Dataset with determinants or factors influencing graduate economics student...
search.dataone.org
data.niaid.nih.gov
Updated Nov 3, 2023
Cite
Zurika Robinson; Thea Uys (2023). Dataset with determinants or factors influencing graduate economics student preparation and success in an online environment [Dataset]. http://doi.org/10.5061/dryad.bvq83bkgd
Unique identifier
https://doi.org/10.5061/dryad.bvq83bkgd
Dataset updated
Nov 3, 2023
Dataset provided by
Dryad Digital Repository
Authors
Zurika Robinson; Thea Uys
Time period covered
Jan 1, 2023
Description
The data relates to the paper that analyses the determinants or factors that best explain student research skills and success in the honours research report module during the COVID-19 pandemic in 2021. The data used have been gathered through an online survey created on the Qualtrics software package. The research questions were developed from demographic factors and subject knowledge including assignments to supervisor influence and other factors in terms of experience or belonging that played a role (see anonymous link at https://unisa.qualtrics.com/jfe/form/SV_86OZZOdyA5sBurY). An SMS was sent to all students of the 2021 module group to make them aware of the survey. They were under no obligation to complete it and all information was regarded as anonymous. We received 39 responses. The raw data from the survey was processed through the SPSS statistical software package. The data file contains the demographics, frequencies, descriptives, and open questions processed. The study...
Household Health Survey 2012-2013, Economic Research Forum (ERF)...
datacatalog.ihsn.org
catalog.ihsn.org
Updated Jun 26, 2017
Cite
Kurdistan Regional Statistics Office (KRSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://datacatalog.ihsn.org/catalog/6937
Explore at:
Dataset updated
Jun 26, 2017
Dataset provided by
Kurdistan Regional Statistics Office (KRSO)
Central Statistical Organization (CSO)
Economic Research Forum
Time period covered
2012 - 2013
Area covered
Iraq
Description
Abstract

The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). Under the cooperation between the CSO and the World Bank (WB), the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.

The survey has six main objectives. These objectives are:

Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.

Provide a comprehensive data system to assess household social and economic conditions and prepare indicators related to human development.

Provide data that meet the needs and requirements of national accounts.

Provide detailed indicators on consumption expenditure that support decision-making related to production, consumption, export and import.

Provide detailed indicators on the sources of household and individual income.

Provide data necessary for formulation of a new consumer price index number.

The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

Geographic coverage

National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

Kind of data

Sample survey data [ssd]

Sampling procedure

----> Design:

The sample size was 25488 households for the whole of Iraq: 216 households in each of the 118 districts, in 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.
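The design figures quoted above are mutually consistent, which a short calculation makes explicit. The sketch below uses only numbers stated in the description; the variable names are illustrative.

```python
# Figures quoted in the sampling design description.
DISTRICTS = 118
HOUSEHOLDS_PER_CLUSTER = 9
HOUSEHOLDS_PER_DISTRICT = 216

# Clusters per district follow from the two per-district figures.
clusters_per_district = HOUSEHOLDS_PER_DISTRICT // HOUSEHOLDS_PER_CLUSTER
total_clusters = DISTRICTS * clusters_per_district
total_households = total_clusters * HOUSEHOLDS_PER_CLUSTER

print(clusters_per_district, total_clusters, total_households)
```

The derived 24 clusters per district matches the 24 sample points per stratum mentioned under the sampling stages.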

----> Sample frame:

The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size, to reach 2832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster. The total sample size was thus 25488 households distributed across the governorates, 216 households in each district.

----> Sampling Stages:

In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit urban/rural breakdown and geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units, so a sampling unit may be selected more than once; where necessary, two or more clusters may be drawn from the same enumeration unit.
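The two-stage selection described above can be illustrated with a minimal sketch. This is not the survey's actual procedure or code: the block sizes are made-up values, the function names are hypothetical, and the implicit stratification by geography is omitted. It only shows the general technique of systematic sampling with probability proportional to size (stage 1) followed by systematic equal-probability sampling (stage 2).

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Stage 1: pick n unit indices with probability proportional to size.

    A unit larger than the sampling interval can be picked more than once.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    picks, idx = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative size range contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        picks.append(idx)
    return picks


def systematic_equal(units, n, rng):
    """Stage 2: pick n units at a fixed step from a random start."""
    step = len(units) / n
    start = rng.random() * step
    return [units[min(int(start + k * step), len(units) - 1)] for k in range(n)]


rng = random.Random(2012)                    # fixed seed for repeatability
# Hypothetical household counts for 60 blocks in one district.
block_sizes = [rng.randint(40, 400) for _ in range(60)]
blocks = systematic_pps(block_sizes, 24, rng)
# Households in the first selected block, numbered 0..size-1 for illustration.
households = systematic_equal(list(range(block_sizes[blocks[0]])), 9, rng)
print(len(blocks), len(households))
```

When one block dominates a small district, for example `systematic_pps([1000, 10], 3, rng)`, the large block is returned more than once, which corresponds to the repeated selection the description mentions for small districts.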

Mode of data collection

Face-to-face [f2f]

Research instrument

----> Preparation:

The questionnaire of the 2006 survey was adopted as the basis for the 2012 questionnaire, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, leading to the final version used in the actual survey.

----> Questionnaire Parts:

The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job

Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days

Part 4: Diary of Daily Expenditures: The diary of expenditure is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), during 7 days. Two pages were allocated for recording the expenditures of each day, so the roster consists of 14 pages.

Cleaning operations

----> Raw Data:

Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local Supervisor: Checks to make sure that questions have been correctly completed.
3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

----> Harmonized Data:

The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.

The harmonization process starts with raw data files received from the Statistical Office.

A program is generated for each dataset to create harmonized variables.

Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

Response rate

The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).
Raw data of Articles that were produced from PubMed search using the...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Nov 9, 2023
Cite
Webb, Jason (2023). Raw data of Articles that were produced from PubMed search using the included search string within the Journals listed in the Google metrics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000964710
Explore at:
Dataset updated
Nov 9, 2023
Authors
Webb, Jason
Description
Stigmatizing language or non-person-centered language (PCL) has been shown to impact patients negatively, especially in the case of obesity. This has led many associations, such as the American Medical Association (AMA) and the International Committee of Medical Journal Editors (ICMJE) to enact guidelines prohibiting the use of stigmatizing language in medical research. In 2018, the AMA adopted PCL guidelines, including a specific obesity amendment that all researchers should adhere to. Our primary objective was to determine if PCL guidelines specific to obesity have been properly obeyed in the most interacted with sports medicine journals. We searched within PubMed for obesity-related articles between 2019 and 2022 published in the top ten most interacted sports medicine journals based on Google Metrics data. A predetermined list of stigmatizing and non-PCL terms/language was searched within each article.
Raw data, R scripts and R datasets for statistical analyses from the...
b2find.eudat.eu
researchdata.tuwien.ac.at
Updated Nov 1, 2024
Cite
(2024). Raw data, R scripts and R datasets for statistical analyses from the research article 'Advancing Glycyrrhiza glabra L. cultivation and hairy root transformation and elicitation for future metabolite overexpression' [Dataset]. https://b2find.eudat.eu/dataset/e72e47cf-8a10-590d-99fb-843a87ce24cb
Explore at:
Dataset updated
Nov 1, 2024
Description
This dataset was created during the research carried out for the PhD of Negin Afsharzadeh and the subsequent manuscript arising from this research. The main purpose of this dataset is to create a record of the raw data that was used in the analyses in the manuscript.

This dataset includes:

raw data generated from experiments stored in an Excel spreadsheet, with each sheet corresponding to a specific experiment or part of an experiment (Afsharzadeh_et_al_2024.xlsx)

R script used to analyse the raw data in the software, R (Afsharzadeh_et_al.R)

datasets that were used to analyse the data in the statistical software, R (germindata.txt, light.txt)

Context and methodology

Brief description of experiments: In this study, we aimed to optimize approaches to improve the biotechnological production of important metabolites in G. glabra. The study is made up of four experiments that correspond to particular figures/tables in the manuscript and data, as described below.

Experiment 1: We tested approaches for the cultivation of G. glabra, specifically the breaking of seed dormancy, to ensure timely and efficient seed germination. To do this, we tested the effect of different pretreatments, sterilization treatments and growth media on the germination success of G. glabra. This experiment corresponds to: Manuscript: Table 1 and Figure 1. Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_1'); Afsharzadeh_et_al.R; germindata.txt

Experiment 2 (Table 2): We aimed to optimize the induction of hairy roots in G. glabra. Four strains of R. rhizogenes were tested to identify the most effective strain for inducing hairy root formation, and we tested different tissue explants (cotyledons/hypocotyls) and methods of R. rhizogenes infection (injection or soaking for different durations) in these tissues. This experiment corresponds to: Manuscript: Table 2. Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_2')

Experiment 3 (Figure 2): Eight distinct hairy root lines were established and the growth rate of these lines was measured over 40 days. This experiment corresponds to: Manuscript: Figure 2, Table S2. Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Figure_2')

Experiment 4 (Figure 3): We aimed to test different qualities of light on hairy root cultures in order to induce higher growth and possible enhanced metabolite production. A line with a high growth rate from experiment 3, line S, was selected for growth under different light treatments: red light, blue light, and a combination of blue and red light. To assess the overall impact of these treatments, the growth of line S, as well as the increase in antioxidant capacity and total phenolic content, were tracked over this induction period. This experiment corresponds to: Manuscript: Figure 3, Figure S4. Data: Afsharzadeh_et_al_2024.xlsx (Sheets 'Figure_3_FW', 'Figure_3_FRAP', 'Figure_3_Phenol'); Afsharzadeh_et_al.R; light.txt

Technical details

To work with the .R file and the R datasets, it is necessary to use R: A Language and Environment for Statistical Computing and a package within R, DHARMa. The versions used for the analyses are R version 4.4.1 and DHARMa version 0.4.6. The references for these are:

R Core Team, R: A Language and Environment for Statistical Computing 2024. https://www.R-project.org/

Hartig F, DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models 2022. https://CRAN.R-project.org/package=DHARMa
Survey on data access conditions for sensitive data - Raw Data and...
zenodo.org
bin, pdf
Updated Dec 10, 2024
Cite
Ricarda Braukmann; Deborah Thorpe (2024). Survey on data access conditions for sensitive data - Raw Data and Documentation [Dataset]. http://doi.org/10.5281/zenodo.12805137
Explore at:
Available download formats: pdf, bin
Unique identifier
https://doi.org/10.5281/zenodo.12805137
Dataset updated
Dec 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ricarda Braukmann; Deborah Thorpe
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 10, 2024 - Jul 1, 2024
Description
This upload concerns the survey entitled “Survey on data access conditions for sensitive data”, which was performed by the Dutch Open Data Infrastructure for Social Science and Economic Innovations (ODISSEI) and DANS, the Dutch national centre of expertise and repository for research data. The survey was launched in April 2024 and remained open until July 1st 2024.

The goal of the survey was to gather additional information about access conditions and restrictions used by researchers using the DANS and ODISSEI services. The results of the survey can guide us to identify common conditions that should be included in a standardisation effort. Moreover, the results will improve the available guidance that DANS and ODISSEI can provide for their local research community. A comprehensive analysis of the survey and resulting recommendations will be published separately at a later point in time.

This upload contains:

a PDF with documentation of the survey, specifically the description and question text.

an ODS spreadsheet with the raw data from the survey, which was filled in by 46 participants. Please note that the survey was anonymous; no personal data of the respondents was collected.

The survey was executed using EUSurvey (v1.5.3.1). EUSurvey is open source, built by DG DIGIT and funded under the ISA, ISA2 and Digital Europe Programme (DIGITAL). EUSurvey is published under the EUPL licence and the source code is available from GitHub: https://github.com/EUSurvey.
Growing Roots: Connecting Elderly through Virtual Nature Spaces
lifesciences.datastations.nl
Updated Nov 23, 2021
Cite
J. van Houwelingen-Snippe (2021). Growing Roots: Connecting Elderly through Virtual Nature Spaces [Dataset]. http://doi.org/10.17026/DANS-XHF-D2J5
Explore at:
Available download formats: application/x-spss-syntax(1392), application/x-spss-syntax(1166), pdf(112985), pdf(88808), pdf(85554), pdf(115376), application/x-spss-syntax(1871), pdf(124944), pdf(149620), pdf(118825), pdf(104079), text/x-fixed-field(2546), pdf(125125), pdf(112796), pdf(139122), pdf(105501), text/x-fixed-field(5376), pdf(115989), pdf(156402), pdf(97279), pdf(118664), pdf(131267), text/x-fixed-field(34030), pdf(130032), pdf(104344), application/x-spss-syntax(2356), pdf(355698), pdf(184151), pdf(110738), pdf(108964), text/x-fixed-field(63759), pdf(131882), rar(70246), pdf(147599), pdf(96667), pdf(109741), pdf(157367), pdf(108087), pdf(119244), pdf(292634), pdf(106166), pdf(106327), pdf(106751), pdf(365405), pdf(107570), pdf(114366), pdf(92390), pdf(91227), pdf(291838), rar(757895), pdf(89186), zip(46752), txt(1617), pdf(107854), pdf(112299), tsv(102805), tsv(26464), tsv(6356), tsv(3902), tsv(6044), tsv(3988), tsv(27657)
Unique identifier
https://doi.org/10.17026/DANS-XHF-D2J5
Dataset updated
Nov 23, 2021
Dataset provided by
DANS Data Station Life Sciences
Authors
J. van Houwelingen-Snippe
License
https://doi.org/10.17026/fp39-0x58
Description
This dataset contains all data from the Growing Roots Project, a create health project funded by ZonMW. Within this folder you find raw data acquired in the Growing Roots project.
- DataLaboratoryExperiment.sav is the SPSS file with the raw data of the Laboratory Experiment, which is published: van Houwelingen-Snippe, J., van Rompay, T. J., de Jong, M. D., & Ben Allouch, S. (2020). Does digital nature enhance social aspirations? An experimental study. International journal of environmental research and public health, 17(4), 1454.
- DataQuantitativeStudyMTurk55+.sav is the SPSS file with the raw data of the survey study for adults aged 55 years or older. The article written on this data has been accepted for publication in Journal of Ageing & Society.
- DataSurveyStudyCovid19.sav is the SPSS file with the raw data of the survey study that was conducted during the first lockdown of Covid 19 and has been published: van Houwelingen-Snippe, J., van Rompay, T. J., & Ben Allouch, S. (2020). Feeling connected after experiencing digital nature: A survey study. International journal of environmental research and public health, 17(18), 6879.
- Interview study 2021 transcripts Dutch.rar contains all anonymized transcripts of the interview study that was conducted in 2021 amongst older adults.
- QuantitativeDataInterviewStudy2021.sav contains all raw quantitative data collected during the interview study in 2021.
- Transcripts Focus Groups.rar contains all anonymized transcripts of the focus groups. The article written on this data has been accepted for publication in Journal of Ageing & Society.
Date: 2021-11-22
Date Submitted: 2021-11-22
Raw data of graduates' research
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kusmiati, Mia (2023). Raw data of graduates' research [Dataset]. http://doi.org/10.7910/DVN/ZMZORC
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/ZMZORC
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Kusmiati, Mia
Description
This dataset contains raw data from the responses of 206 alumni to a questionnaire survey conducted over three months in 2022. The research concerns "The Most Favorable Aspect of Achievement The Medical Competence Regarding Patient Management Ability". Some variables observed are medical competence, professional behavior, interpersonal skill, and patient management ability.
Data from: The availability of research data declines rapidly with article...
search.dataone.org
borealisdata.ca
+1more
Updated Mar 16, 2024
Cite
Vines, Timothy H.; Albert, Arianne Y. K.; Andrew, Rose L.; Débarre, Florence; Bock, Dan G.; Franklin, Michelle T.; Gilbert, Kimberly J.; Moore, Jean-Sébastien; Renaut, Sébastien; Rennison, Diana J. (2024). Data from: The availability of research data declines rapidly with article age [Dataset]. http://doi.org/10.5683/SP2/S4AEXZ
Explore at:
Unique identifier
https://doi.org/10.5683/SP2/S4AEXZ
Dataset updated
Mar 16, 2024
Dataset provided by
Borealis
Authors
Vines, Timothy H.; Albert, Arianne Y. K.; Andrew, Rose L.; Débarre, Florence; Bock, Dan G.; Franklin, Michelle T.; Gilbert, Kimberly J.; Moore, Jean-Sébastien; Renaut, Sébastien; Rennison, Diana J.
Description
Abstract

Policies ensuring that research data are available on public archives are increasingly being implemented at the government, funding agency, and journal level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term, and indeed many studies have found that authors are often unable or unwilling to share their data. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested datasets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a dataset being extant fell by 17% per year. In addition, the odds that we could find a working email address for the first, last or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.

Usage notes

Vines_et_al R code: The R code used for the analyses.

CurrentBiologyData: The data used in the analyses. This does not contain any identifying information for any of the manuscripts - please contact the first author for access to the raw data.
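To make the abstract's "odds fell by 17% per year" concrete, the sketch below compounds a constant yearly decline in odds and converts the result to a probability. The starting odds of 4.0 (an 80% chance) are a purely hypothetical value chosen for illustration; they are not taken from the dataset, and the function names are not from the authors' R code.

```python
def odds_after(initial_odds, years, annual_decline=0.17):
    """Odds after `years`, assuming a constant proportional decline per year."""
    return initial_odds * (1 - annual_decline) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1 + odds)


# Hypothetical starting point: odds of 4.0, i.e. an 80% chance the data exist.
for years in (0, 5, 10, 20):
    odds = odds_after(4.0, years)
    print(years, round(odds, 3), round(odds_to_probability(odds), 3))
```

Because the decline applies to odds rather than to probability, the implied probability falls slowly at first when the starting odds are high and faster once the odds approach 1.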
Logan River Observatory: Logan River at the Utah Water Research Laboratory...
hydroshare.org
beta.hydroshare.org
+2more
zip
Updated Sep 2, 2025
Cite
Logan River Observatory (2025). Logan River Observatory: Logan River at the Utah Water Research Laboratory west bridge Aquatic Site (LR_WaterLab_AA) Raw Data [Dataset]. https://www.hydroshare.org/resource/2b3afc29e11c412c84d5c9cd9b6279d6
Explore at:
Available download formats: zip (94.3 MB)
Dataset updated
Sep 2, 2025
Dataset provided by
HydroShare
Authors
Logan River Observatory
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This dataset contains raw data for all of the variables measured for the aquatic site on the Logan River at the Utah Water Research Laboratory west bridge (LR_WaterLab_AA). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.

Cite

Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Explore at:

Dataset updated

Apr 21, 2025

Dataset provided by

Agricultural Research Service (https://www.ars.usda.gov/)

Description

The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
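The per-person calculation described in the methods above (high end of the reported range, divided by 1 for an individual or by G for a group) can be sketched as follows. The function name and the example range are assumptions for illustration; the survey's actual answer ranges and column names are in Appendix A and the released files, not reproduced here.

```python
def per_person_storage_tb(range_high_tb, group_size=1):
    """Per-person storage estimate for a Small Data response.

    range_high_tb: high end of the reported storage range, in TB.
    group_size: G, the number of individuals covered by a group response
                (1 for an individual response).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical range whose high end is 10 TB.
print(per_person_storage_tb(10))       # individual response
print(per_person_storage_tb(10, 4))    # group response covering 4 people
```

Using the high end of each range gives a conservative (upper) estimate; per the description, Big Data users were instead assigned their actual reported or estimated values.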
