The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.
Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
Resources in this dataset:
Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
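The per-person storage calculation described in the survey methodology above (the high end of the reported range divided by 1 for an individual response, or by G for a group response) can be sketched as follows. This is a minimal illustration, not the working group's actual analysis code; the function name and the sample responses are hypothetical and do not come from the released data.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """Return the high end of a reported storage range divided by the
    number of people the response covers (1 for an individual, G for a group)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical responses: (high end of reported range in TB, people covered).
responses = [(1.0, 1), (10.0, 4), (100.0, 8)]

for high_tb, g in responses:
    share = per_person_storage_tb(high_tb, g)
    print(f"{high_tb} TB reported for {g} person(s): {share} TB per person")
```

Using the high end of each range gives a conservative (upper-bound) per-person figure, which is why the source treats Big Data users separately with actual or estimated values.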
https://creativecommons.org/publicdomain/zero/1.0/
PROJECT OBJECTIVE
We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility to systematize the membership roster and generate different reports as per business requirements.
Questions (KPIs)
TASK 1: STANDARDIZING THE DATASET
TASK 2: DATA FORMATTING
TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:
TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:
TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:
Process
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About Dataset
Safa S. Abdul-Jabbar, Alaa K. Farhan
Context
This is the first dataset for various ordinary patients in Iraq. It provides the patients' Complete Blood Count (CBC) test information, which can be used to create a hematology diagnosis/prediction system. The data was collected in 2022 from Al-Zahraa Al-Ahly Hospital. It can be cleaned and analyzed using any programming language because it is provided in an Excel file that can be accessed and manipulated easily. The user just needs to understand how rows and columns are arranged, because the data was collected as images (CBC images) from the laboratories and the extracted data was then stored in an Excel file.
Content
This dataset contains 500 rows. For each row (patient information), there are 21 columns containing CBC test features that can be described as follows:
ID: Patients Identifier
WBC: White Blood Cell, Normal Ranges: 4.0 to 10.0, Unit: 10^9/L.
LYMp: Lymphocytes percentage, which is a type of white blood cell, Normal Ranges: 20.0 to 40.0, Unit: %
MIDp: Indicates the percentage combined value of the other types of white blood cells not classified as lymphocytes or granulocytes, Normal Ranges: 1.0 to 15.0, Unit: %
NEUTp: Neutrophils percentage (neutrophils are a type of white blood cell, or leukocyte), Normal Ranges: 50.0 to 70.0, Unit: %
LYMn: Lymphocytes number (lymphocytes are a type of white blood cell), Normal Ranges: 0.6 to 4.1, Unit: 10^9/L.
MIDn: Indicates the combined number of other white blood cells not classified as lymphocytes or granulocytes, Normal Ranges: 0.1 to 1.8, Unit: 10^9/L.
NEUTn: Neutrophils Number, Normal Ranges: 2.0 to 7.8, Unit: 10^9/L.
RBC: Red Blood Cell, Normal Ranges: 3.50 to 5.50, Unit: 10^12/L
HGB: Hemoglobin, Normal Ranges: 11.0 to 16.0, Unit: g/dL
HCT: Hematocrit, the proportion, by volume, of the blood that consists of red blood cells, Normal Ranges: 36.0 to 48.0, Unit: %
MCV: Mean Corpuscular Volume, Normal Ranges: 80.0 to 99.0, Unit: fL
MCH: Mean Corpuscular Hemoglobin, the average amount of hemoglobin in the average red cell, Normal Ranges: 26.0 to 32.0, Unit: pg
MCHC: Mean Corpuscular Hemoglobin Concentration, Normal Ranges: 32.0 to 36.0, Unit: g/dL
RDWSD: Red Blood Cell Distribution Width, Standard Deviation, Normal Ranges: 37.0 to 54.0, Unit: fL
RDWCV: Red Blood Cell Distribution Width, Coefficient of Variation, Normal Ranges: 11.5 to 14.5, Unit: %
PLT: Platelet Count, Normal Ranges: 100 to 400, Unit: 10^9/L
MPV: Mean Platelet Volume, Normal Ranges: 7.4 to 10.4, Unit: fL
PDW: Platelet Distribution Width, Normal Ranges: 10.0 to 17.0, Unit: %
PCT: Plateletcrit, the proportion, by volume, of the blood that consists of platelets, Normal Ranges: 0.10 to 0.28, Unit: %
PLCR: Platelet Large Cell Ratio, Normal Ranges: 13.0 to 43.0, Unit: %
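The normal ranges listed above lend themselves to a simple screening step. The sketch below flags values that fall outside those ranges for one row; it is only an illustration, assuming each row has already been loaded into a dictionary keyed by the column names above. The `flag_row` helper and the sample row are hypothetical, and the output is not a diagnosis.

```python
# Normal ranges copied from the column descriptions above: (low, high).
NORMAL_RANGES = {
    "WBC": (4.0, 10.0), "LYMp": (20.0, 40.0), "MIDp": (1.0, 15.0),
    "NEUTp": (50.0, 70.0), "LYMn": (0.6, 4.1), "MIDn": (0.1, 1.8),
    "NEUTn": (2.0, 7.8), "RBC": (3.50, 5.50), "HGB": (11.0, 16.0),
    "HCT": (36.0, 48.0), "MCV": (80.0, 99.0), "MCH": (26.0, 32.0),
    "MCHC": (32.0, 36.0), "RDWSD": (37.0, 54.0), "RDWCV": (11.5, 14.5),
    "PLT": (100, 400), "MPV": (7.4, 10.4), "PDW": (10.0, 17.0),
    "PCT": (0.10, 0.28), "PLCR": (13.0, 43.0),
}


def flag_row(row):
    """Return {column: 'low' | 'high'} for values outside the listed range.

    Columns missing from the row are skipped; range bounds count as normal.
    """
    flags = {}
    for name, (low, high) in NORMAL_RANGES.items():
        if name not in row:
            continue
        value = row[name]
        if value < low:
            flags[name] = "low"
        elif value > high:
            flags[name] = "high"
    return flags


# Hypothetical patient row, not taken from the dataset.
sample = {"ID": 1, "WBC": 12.3, "HGB": 9.8, "PLT": 250}
print(flag_row(sample))
```

Treating the bounds as inclusive is an assumption; the source lists the ranges without saying whether the endpoints count as normal.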
Acknowledgements
We thank the entire Al-Zahraa Al-Ahly Hospital team, especially the hospital manager, for cooperating with us in collecting this data while maintaining patients' confidentiality.
In order to test hypotheses about groundwater flow under and into estuaries and the Atlantic Ocean, geophysical surveys, geophysical probing, submarine groundwater sampling, and sediment coring were conducted by U.S. Geological Survey (USGS) scientists at Cape Cod National Seashore (CCNS) from 2004 through 2006. Coastal resource managers at CCNS and elsewhere are concerned about nutrients that are entering coastal waters via submarine groundwater discharge, which are contributing to eutrophication and harmful algal blooms. The research carried out as part of the study described here was designed, in part, to help refine assumptions required by earlier versions of models about the nature of submarine groundwater flow and discharge at CCNS. This study was conducted in four phases, with a variety of field techniques and equipment employed in each phase. Phase 1 consisted of continuous resistivity profiling (CRP) surveys of the entire study area conducted in 2004. Phase 2 consisted of CRP ground-truthing via resistivity probe measurements and submarine groundwater sampling from hydraulically driven piezometers using a barge in the Salt Pond/Nauset Marsh area in 2005. Phase 3 consisted of supplemental detailed CRP surveys in the Salt Pond/Nauset Marsh area in 2006. Finally, Phase 4 consisted of sediment coring and porewater extraction in the Salt Pond/Nauset Marsh area later in 2006 to supplement the 2005 sampling.
This dataset was created by denggui feng613
The annual Retail store data CD-ROM is an easy-to-use tool for quickly discovering retail trade patterns and trends. The current product presents results from the 1999 and 2000 Annual Retail Store and Annual Retail Chain surveys. This product contains numerous cross-classified data tables using the North American Industry Classification System (NAICS). The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square footage (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.
The ITEX experiment at Audkuluheidi was started in 1996 when control and OTC plots 1-5 were set up. In 1997, control and OTC plots 6-10 were set up in the protected area (No Graze). Also in 1997, 10 control plots were set up in the adjacent grazed area (Graze). In 2000, all plots were sampled again. This dataset is in Excel format. For more information, please see the readme file.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The open repository consists of two folders: Dataset and Picture. The Dataset folder contains the file “AWS Dataset Pangandaraan.xlsx”. There are 10 columns, with the first three columns as time attributes and the other six as atmospheric data. Each parameter has 8085 data points, and each parameter has a parameter index that we added at the bottom of its column, including minimum, maximum, and average values.
For further use, the user can choose one or more parameters for calculation or analysis. For example, wind data (speed and direction) can be utilized to calculate waves using the hindcast method. Furthermore, the user can use the filter feature in Excel to extract the exact time range for analyzing various phenomena considered correlated to atmospheric data around Pangandaran, Indonesia.
The second folder, named “Picture,” contains three figures: the monthly distribution of datasets, temporal data, and a wind rose.
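The time-range filtering and the minimum, maximum, and average indices described above can also be reproduced outside Excel. The following is a minimal sketch using only the Python standard library; it assumes the rows have already been exported into dictionaries and that the summary rows at the bottom of each column have been dropped. The field names `time` and `wind_speed`, the `summarize` helper, and the sample values are hypothetical, not the dataset's actual column headers or measurements.

```python
from datetime import datetime
from statistics import mean


def summarize(records, column, start, end):
    """Return min, max, and mean of `column` for records whose `time`
    lies within [start, end], or None if no record matches."""
    values = [r[column] for r in records if start <= r["time"] <= end]
    if not values:
        return None
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Hypothetical observations, for illustration only.
records = [
    {"time": datetime(2020, 1, 1, 0), "wind_speed": 2.0},
    {"time": datetime(2020, 1, 1, 1), "wind_speed": 4.0},
    {"time": datetime(2020, 1, 2, 0), "wind_speed": 9.0},
]

# Keep only observations from 1 January 2020.
result = summarize(
    records, "wind_speed",
    datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 23, 59),
)
print(result)
```

The same function can be called once per atmospheric parameter to rebuild the per-column index for any chosen time window.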
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
The qualitative aspect of health care delivery is one of the major factors in reducing morbidity and mortality in a health care setup. The expanding suburban secondary health care delivery facilities of the Municipal Corporation of Greater Mumbai are an important part of the healthcare backbone of Mumbai, and therefore the quality of care delivered here needed standardization.
Material and Methods
The project was completed over a period of one year, from Jan to Dec 2013, and implemented in three phases. The framework with components and sub-components was developed and formats for data collection were standardized. The benchmarks were based on past performance in the same hospital, and probability was used to develop the normal range. An Excel spreadsheet was developed to facilitate data analysis.
Results
The indicators comprise 3 components: Statutory Requirements, Patient care & Cure and Administrative efficiency. The measurements made pointed to the broad areas needing attention.
Conclusion
The indicators for patient care and monitoring standards can be used as a self-assessment tool by health care setups for standardization and improvement of delivery of health care services.
No description is available. Visit https://dataone.org/datasets/6ffb72520e80a412991cd50d38f324d6 for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File name definitions:
'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s
'...v_175_250...' - dataset for velocity range [175, 250] m/s
'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected
'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart
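As a small illustration of the velocity-range naming above, the sketch below looks up which file-name tag covers a given velocity. The tags and interval bounds are taken from the definitions above; the `dataset_tags_for_velocity` helper is hypothetical, and treating the intervals as closed (so that 175 and 250 m/s match both tags) is an assumption based on the bracket notation.

```python
# File-name tags and their velocity intervals in m/s, as defined above.
DATASET_RANGES = {
    "v_50_175_250_300": [(50, 175), (250, 300)],
    "v_175_250": [(175, 250)],
}


def dataset_tags_for_velocity(velocity):
    """Return every file-name tag whose closed interval(s) contain `velocity`."""
    return [
        tag
        for tag, intervals in DATASET_RANGES.items()
        if any(low <= velocity <= high for low, high in intervals)
    ]


for v in (100, 200, 175, 40):
    print(v, dataset_tags_for_velocity(v))
```

An empty list means the velocity falls outside both datasets.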
Where to find the input (independent) and target (dependent) variable values for each dataset/Excel file?
input values in 'IN' sheet
target values in 'TARGET' sheet
Where to find the results from the best ANN model (for each target/output variable and each velocity range)?
open the corresponding Excel file; the expected (target) vs ANN (output) results are written in the 'TARGET vs OUTPUT' sheet
Check reference below (to be added when the paper is published)
https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset was supplied to the Bioregional Assessment Programme by a third party and is presented here as originally supplied. Metadata was not provided and has been compiled by the Bioregional Assessment Programme based on known details at the time of acquisition.
Mean monthly flow (ML/month) and Annual flow (ML/yr) data at key gauges in the Macalister Irrigation District (MID) as monitored by SRW. The data are provided in MS Excel format in worksheets and charts.
The source data are time-series drainage volume data provided by a third party. Site information and monitored drainage flow data provided by Southern Rural Water are specific to the Macalister Irrigation District.
Time specific data in the range 23/07/1997 to 31/12/2013
This dialogue has been copied from a draft of the BA-GIP report.
A total of 197 river gauges were identified within the model area representing all of the major rivers. Daily gauge level data was sourced from the Victorian Department of Environment, Land, Water and Planning Water Measurement Information System (WMIS, 2015). A list of the river gauges is provided in the report for key river basins.
Only main stems of the major rivers were included in the model. These river reaches were identified using the DEPI hydro25 spatial data set (DEPI, 2014). The river classification was used to vary river incision depth (depth below the ground surface as defined by the digital elevation model) and width attributes. In the absence of recorded stage height information, river classification was used to estimate river stage heights. A total of 22,573 river cells are included in the model. Fifty-one gauges were selected to calibrate the catchment modelling framework in unregulated catchments based on Base Flow Indexes and observed stream flows.
Drainage channels and man-made drainage features in the Macalister Irrigation District (MID) were included in the model based on available drainage network mapping. This information was sourced from Southern Rural Water (SRW) and the DEPI Corporate Spatial Data library. Drainage cells are assigned to the uppermost cells within the model to capture groundwater discharge processes. Drain cells in MODFLOW can only act as groundwater discharge points and as such those cells outside drainage channels will be characterised as having a bed elevation equivalent to ground surface elevation. A total of 410,504 drainage cells are incorporated in the model. Apart from 3 river gauges sourced from the WMIS, SRW also has 15 gauges that monitor drainage from the MID. The measurements commenced between 1997 and 2005. Of the 15 gauges, six were selected to calibrate the catchment modelling framework based on observed discharge.
Victorian Department of Economic Development, Jobs, Transport and Resources (2015) Mean monthly flow & annual flow data - Macalister Irrigation District. Bioregional Assessment Source Dataset. Viewed 05 October 2018, http://data.bioregionalassessments.gov.au/dataset/6ba89d78-1e42-4e02-bd5c-a435ee15bef4.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A range of quarterly Excel spreadsheets and SuperTABLE datacubes. The spreadsheets contain broad level data covering all the major items of the Labour Force Survey in time series format, including seasonally adjusted and trend estimates. The datacubes contain more detailed and cross classified original data than the spreadsheets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and employment type, full-time (FT) and part-time (PT), offering insights into the income landscape within Excel. The dataset can be utilized to gain insights into gender-based income distribution within the Excel population, aiding in data analysis and decision-making.
Key observations
[Figure: Excel, AL gender and employment-based income distribution analysis (Ages 15+)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income brackets:
Variables / Data Columns
Employment type classifications include:
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel median household income by gender. You can refer to the same here
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information from a cohort of 799 patients admitted to hospital for COVID-19, characterized with sociodemographic and clinical data. Retrospectively, from November 2020 to January 2021, data was collected from the medical records of all hospital admissions that occurred from March 1st, 2020, to December 31st, 2020. Analysis of these data can help define the sociodemographic profile, clinical variables and health conditions of patients hospitalized with COVID-19. To this end, this database contains a wide range of variables, such as:
Month of hospitalization
Gender
Age group
Ethnicity
Marital status
Paid work
Admission to clinical ward
Hospitalization in the Intensive Care Unit (ICU)
COVID-19 diagnosis
Number of times hospitalized by COVID-19
Hospitalization time in days
Risk Classification Protocol
Data is presented as a single Excel XLSX file, dataset.xlsx, of clinical and sociodemographic characteristics of hospital admissions for COVID-19: a retrospective cohort of patients in two hospitals in southern Brazil. Researchers interested in studying the data related to patients affected by COVID-19 can extensively explore the variables described here. Approved by the Research Ethics Committee (No. 4.323.917/2020) of the Federal University of Santa Catarina.
Excel Age-Range creator for Office for National Statistics (ONS) Mid year population estimates (MYE) covering each year between 1999 and 2016.
These files take into account the revised estimates for 2002-2010 released in April 2013 down to Local Authority level and the post 2011 estimates based on the Census results. Scotland and Northern Ireland data has not been revised, so Great Britain and United Kingdom totals comprise the original data for these plus revised England and Wales figures.
This Excel based tool enables users to query the single year of age raw data so that any age range can easily be calculated without having to carry out often complex and time-consuming formulas that could also be open to human error. Simply select the lower and upper age range for both males and females and the spreadsheet will return the total population for the range. Please adhere to the terms and conditions of supply contained within the file.
Tip: You can copy and paste the rows you are interested in to another worksheet by using the filters at the top of the columns and then select all by pressing Ctrl+A. Then simply copy and paste the cells to a new location.
ONS Mid year population estimates
Open Excel tool (London Boroughs, Regions and National, 1999-2016)
Also available is a custom-age tool for all geographies in the UK. This full MYE dataset by single year of age (SYA) and gender is available as a Datastore package here.
Ward Level Population estimates
Single year of age population tool for 2002 to 2015 for all wards in London.
New 2014 Ward boundary estimates
Ward boundary changes in May 2014 only affected three London boroughs - Hackney, Kensington and Chelsea, and Tower Hamlets. The estimates between 2001-2013 have been calculated by the GLA by taking the proportion of the old ward that falls within the new ward based on the proportion of population living in each area at the 2011 Census. Therefore, these estimates are purely indicative and are not official statistics and not endorsed by ONS. From 2014 onwards, ONS began publishing official estimates for the new ward boundaries. Download here.
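The age-range query that the Excel tool performs (select a lower and upper age for males and females, then return the total population for the range) amounts to summing single-year-of-age counts. The sketch below illustrates that idea only; it is not the tool's actual formula, and the function name and population figures are hypothetical.

```python
def population_in_age_range(single_year_counts, lower, upper):
    """Sum single-year-of-age counts for ages lower..upper inclusive.

    `single_year_counts` maps age in years to a population count.
    """
    if lower > upper:
        raise ValueError("lower age must not exceed upper age")
    return sum(
        count
        for age, count in single_year_counts.items()
        if lower <= age <= upper
    )


# Hypothetical single-year-of-age counts, for illustration only.
males = {0: 100, 1: 110, 2: 120, 3: 130}
females = {0: 95, 1: 105, 2: 115, 3: 125}

# Separate age ranges can be chosen for males and females.
total = population_in_age_range(males, 1, 2) + population_in_age_range(females, 1, 3)
print(total)
```

Keeping the male and female ranges as separate arguments mirrors the tool's design, where each sex has its own lower and upper bound.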
This is a compilation of logs and data from Well 82-33 in the Roosevelt Hot Springs area in Utah. This well is also in the Utah FORGE study area. The file is in a compressed .zip format and there is a data inventory table (Excel spreadsheet) in the root folder that is a guide to the data that is accessible in subfolders.
The Alaska Geochemical Database Version 2.0 (AGDB2) contains new geochemical data compilations in which each geologic material sample has one "best value" determination for each analyzed species, greatly improving speed and efficiency of use. Like the Alaska Geochemical Database (AGDB) before it, the AGDB2 was created and designed to compile and integrate geochemical data from Alaska in order to facilitate geologic mapping, petrologic studies, mineral resource assessments, definition of geochemical baseline values and statistics, environmental impact assessments, and studies in medical geology. This relational database, created from the Alaska Geochemical Database (AGDB) that was released in 2011, serves as a data archive in support of present and future Alaskan geologic and geochemical projects, and contains data tables in several different formats describing historical and new quantitative and qualitative geochemical analyses. The analytical results were determined by 85 laboratory and field analytical methods on 264,095 rock, sediment, soil, mineral and heavy-mineral concentrate samples. Most samples were collected by U.S. Geological Survey (USGS) personnel and analyzed in USGS laboratories or, under contracts, in commercial analytical laboratories. These data represent analyses of samples collected as part of various USGS programs and projects from 1962 through 2009. In addition, mineralogical data from 18,138 nonmagnetic heavy mineral concentrate samples are included in this database. The AGDB2 includes historical geochemical data originally archived in the USGS Rock Analysis Storage System (RASS) database, used from the mid-1960s through the late 1980s and the USGS PLUTO database used from the mid-1970s through the mid-1990s. All of these data are currently maintained in the National Geochemical Database (NGDB). Retrievals from the NGDB were used to generate most of the AGDB data set.
These data were checked for accuracy regarding sample location, sample media type, and analytical methods used. This arduous process of reviewing, verifying and, where necessary, editing all USGS geochemical data resulted in a significantly improved Alaska geochemical dataset. USGS data that were not previously in the NGDB because the data predate the earliest USGS geochemical databases, or were once excluded for programmatic reasons, are included here in the AGDB2 and will be added to the NGDB. The AGDB2 data provided here are the most accurate and complete to date, and should be useful for a wide variety of geochemical studies. The AGDB2 data provided in the linked database may be updated or changed periodically.
This is a compilation of logs and data from Well 52-21 in the Roosevelt Hot Springs area in Utah. This well is also in the Utah FORGE study area. The file is in a compressed .zip format and there is a data inventory table (Excel spreadsheet) in the root folder that is a guide to the data that is accessible in subfolders.