51 datasets found

Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
+2more
Updated Apr 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description
The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
w
Dataset of book subjects that contain Beginning big data with Power BI and...
workwithdata.com
Updated Nov 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2024). Dataset of book subjects that contain Beginning big data with Power BI and Excel 2013 : big data processing and analysis using Power BI in Excel 2013 [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Beginning+big+data+with+Power+BI+and+Excel+2013+:+big+data+processing+and+analysis+using+Power+BI+in+Excel+2013&j=1&j0=books
Explore at:
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 3 rows and is filtered where the books is Beginning big data with Power BI and Excel 2013 : big data processing and analysis using Power BI in Excel 2013. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
18 excel spreadsheets by species and year giving reproduction and growth...
catalog.data.gov
data.wu.ac.at
Updated Aug 17, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
Explore at:
Dataset updated
Aug 17, 2024
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
f
Correlation Between Cancer Research Trends and Real World Data: An Analysis...
stemfellowship.figshare.com
png
Updated Jan 29, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Peter Chou; Kevin Hong; Chandler Lei; Haolin Zhang (2017). Correlation Between Cancer Research Trends and Real World Data: An Analysis on Altmetric Data [Dataset]. http://doi.org/10.6084/m9.figshare.4595449.v1
Explore at:
pngAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.4595449.v1
Dataset updated
Jan 29, 2017
Dataset provided by
STEM Fellowship Big Data Challenge
Authors
Peter Chou; Kevin Hong; Chandler Lei; Haolin Zhang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The field of cancer research is overall ambiguous to the general population and apart from medical news, not a lot is known of the proceedings. This study aims to provide some clarity towards cancer research, especially towards the correlations between research of different types of cancer. The amount of research papers pertaining to different types of cancers is compared against mortality and diagnosis rates to determine the amount of research attention towards a type of cancer in relation to its overall importance or danger level to the general population. This is achieved through the use of many computational tools such as Python, R, and Microsoft Excel. Python is used to parse through the JSON files and extract the abstract and Altmetric score onto a single CSV file. R is used to iterate through the rows of the CSV files and count the appearance of each type of cancer in the abstract. As well as this, R creates the histograms describing Altmetric scores and file frequency. Microsoft Excel is used to provide further data analysis and find correlations between Altmetrics data and Canadian Cancer Society data. The analysis from these tools revealed that breast cancer was the most researched cancer by a large margin with nearly 1,700 papers. Although there were a large number of cancer research papers, the Altmetric scores revealed that most of these papers did not gain significant attention. By comparing these results to Canadian Cancer Society data, it was uncovered that Breast Cancer was receiving research attention that was not merited. There were four times more breast cancer research papers than the second most researched cancer, prostate cancer. This was despite the fact that breast cancer was fourth in mortality and third in new cases among all cancers. Inversely, lung cancer was underrepresented with only 401 research papers in spite of being the deadliest cancer in Canada.
N
Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...
neilsberg.com
csv, json
Updated Jul 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Jul 24, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Excel, Alabama
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

Key observations

The largest age group in Excel, AL was for the group of age 45 to 49 years years with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over years with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the Excel is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of Excel total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Excel Population by Age. You can refer the same here
Enterprise Survey 2009-2019, Panel Data - Slovenia
microdata.worldbank.org
catalog.ihsn.org
Updated Aug 6, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
Explore at:
Dataset updated
Aug 6, 2020
Dataset provided by
World Bank Grouphttp://www.worldbank.org/
European Investment Bankhttp://eib.org/
European Bank for Reconstruction and Developmenthttp://ebrd.com/
Time period covered
2008 - 2019
Area covered
Slovenia
Description
Abstract

The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

Geographic coverage

National

Analysis unit

The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

Universe

As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.

For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

Response rate

Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.

For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
B
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Additional file 1 of msBiodat analysis tool, big data analysis for...
springernature.figshare.com
bin
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek (2023). Additional file 1 of msBiodat analysis tool, big data analysis for high-throughput experiments [Dataset]. http://doi.org/10.6084/m9.figshare.c.3645041_D1.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3645041_D1.v1
Dataset updated
Jun 4, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Excel spreadsheets. XLSX file containing the data from Sousa Abreu et al. which is used in the example of the article. (XLSX 611 kb)
Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A...
figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heather Cribbs; Gabriel Gardner (2023). Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A Multi-Campus Big Data Study" [Dataset]. http://doi.org/10.6084/m9.figshare.19071578.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19071578.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Heather Cribbs; Gabriel Gardner
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Five files, one of which is a ZIP archive, containing data that support the findings of this study. PDF file "IA screenshots CSU Libraries search config" contains screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for years 2017 - 2019. Excel file "CCIHE2018-PublicDataFile" contains Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses from 2018. CSV file "2017-2019_RAW" contains the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017 - 2019. CSV file "clean_data" contains the cleaned data from Primo Analytics which was used for all subsequent analysis such as charting and import into SPSS for statistical testing. ZIP archive file "NonparametricStatisticalTestsFromSPSS" contains 23 SPSS files [.spv format] reporting the results of testing conducted in SPSS. This archive includes things such as normality check, descriptives, and Kruskal-Wallis H-test results.
w
Data from: New Data Reduction Tools and their Application to The Geysers...
data.wu.ac.at
pdf
Updated Dec 5, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2017). New Data Reduction Tools and their Application to The Geysers Geothermal Field [Dataset]. https://data.wu.ac.at/schema/geothermaldata_org/NjNiMTc2MzQtOWQ5Mi00MjE5LWEwOWQtZDFjMmE5YjcwZWM0
Explore at:
pdfAvailable download formats
Dataset updated
Dec 5, 2017
Area covered
The Geysers, 3296a5bce23af293dbb49a144dfc986f894e7756
Description
Microsoft Excel based (using Visual Basic for Applications) data-reduction and visualization tools have been developed that allow to numerically reduce large sets of geothermal data to any size. The data can be quickly sifted through and graphed to allow their study. The ability to analyze large data sets can yield responses to field management procedures that would otherwise be undetectable. Field-wide trends such as decline rates, response to injection, evolution of superheat, recording instrumentation problems and data inconsistencies can be quickly queried and graphed. The application of these newly developed tools to data from The Geysers geothermal field is illustrated. A copy of these tools may be requested by contacting the authors.
N
Comprehensive Median Household Income and Distribution Dataset for Excel,...
neilsberg.com
Updated Jan 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Excel, AL: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd99ed47-b041-11ee-aaca-3860777c1fe6/
Explore at:
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Alabama, Excel
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the median household income in Excel. It can be utilized to understand the trend in median household income and to analyze the income distribution in Excel by household type, size, and across various income brackets.

Content

The dataset will have the following datasets when applicable

Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Excel, AL Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)

Median Household Income Variation by Family Size in Excel, AL: Comparative analysis across 7 household sizes

Income Distribution by Quintile: Mean Household Income in Excel, AL

Excel, AL households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Excel median household income. You can refer the same here
d
Data from: Improper data practices erode the quality of global ecological...
dataone.org
data.niaid.nih.gov
+1more
Updated Jul 25, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Steven Augustine; Isaac Bailey-Marren; Katherine Charton; Nathan Kiel; Michael Peyton (2025). Improper data practices erode the quality of global ecological databases and impede the progress of ecological research [Dataset]. http://doi.org/10.5061/dryad.wdbrv15w1
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.wdbrv15w1
Dataset updated
Jul 25, 2025
Dataset provided by
Dryad Digital Repository
Authors
Steven Augustine; Isaac Bailey-Marren; Katherine Charton; Nathan Kiel; Michael Peyton
Time period covered
Jan 1, 2023
Description
The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY either lacked applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up onl..., SLA data was downlaoded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae),Â Plantago, Poa, and Quercus species. The data has not been processed in any way, but additional columns have been added to the datset that provide the viewer with information about where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data., , There are two additional documents associated with this publication. One is a word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an excel document that contains the SLA data that was downloaded from TRY and all associated metadata.

Missing data codes: NA and N/A
s
Global Business Intelligence and Analytics Software Market Size, Share,...
skyquestt.com
Updated Apr 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SkyQuest Technology (2024). Global Business Intelligence and Analytics Software Market Size, Share, Growth Analysis, By Type(Big Data Analytics, Business Analytics), By Deployment(On-premises, and Clouds), By Enterprise Size(Large Enterprises, and Small & Medium Enterprises (SMEs)), By End-Use(BFSI, Government) - Industry Forecast 2023-2030 [Dataset]. https://www.skyquestt.com/report/business-intelligence-and-analytics-software-market
Explore at:
Dataset updated
Apr 18, 2024
Dataset authored and provided by
SkyQuest Technology
License
https://www.skyquestt.com/privacy/https://www.skyquestt.com/privacy/
Time period covered
2023 - 2030
Area covered
Global
Description
Global Business Intelligence and Analytics Software Market size was valued at USD 21.56 Billion in 2022 and is poised to grow from USD 23.38 Billion in 2023 to USD 44.78 Billion by 2031, growing at a CAGR of 8.46% in the forecast period (2024-2031).
m
Integrated Cryptocurrency Historical Data for a Predictive Data-Driven...
data.mendeley.com
Updated Oct 29, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abtin Ijadi Maghsoodi (2021). Integrated Cryptocurrency Historical Data for a Predictive Data-Driven Decision-Making Algorithm [Dataset]. http://doi.org/10.17632/37nb83jwtd.1
Explore at:
Unique identifier
https://doi.org/10.17632/37nb83jwtd.1
Dataset updated
Oct 29, 2021
Authors
Abtin Ijadi Maghsoodi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cryptocurrency historical datasets from January 2012 (if available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs) including Yahoo Finance, Cryptodownload, CoinMarketCap, various Kaggle datasets, and multiple APIs. While these datasets used various formats of time (e.g., minutes, hours, days), in order to integrate the datasets days format was used for in this research study. The integrated cryptocurrency historical datasets for 80 cryptocurrencies including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more cryptocurrencies were uploaded in this online Mendeley data repository. Although the primary attribute of including the mentioned cryptocurrencies was the Market Capitalization, a subject matter expert i.e., a professional trader has also guided the initial selection of the cryptocurrencies by analyzing various indicators such as Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, Stochastic Oscillator and Ichimoku Cloud. The primary features of this dataset that were used as the decision-making criteria of the CLUS-MCDA II approach are Timestamps, Open, High, Low, Closed, Volume (Currency), % Change (7 days and 24 hours), Market Cap and Weighted Price values. The available excel and CSV files in this data set are just part of the integrated data and other databases, datasets and API References that was used in this study are as follows: [1] https://finance.yahoo.com/ [2] https://coinmarketcap.com/historical/ [3] https://cryptodatadownload.com/ [4] https://kaggle.com/philmohun/cryptocurrency-financial-data [5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data [6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory [7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD [8] https://min-api.cryptocompare.com/ [9] https://p.nomics.com/cryptocurrency-bitcoin-api [10] https://www.coinapi.io/ [11] https://www.coingecko.com/en/api [12] https://cryptowat.ch/ [13] https://www.alphavantage.co/

This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDAII Project: https://aimaghsoodi.github.io/CLUSMCDA-R-Package/ https://github.com/Aimaghsoodi/CLUS-MCDA-II https://github.com/azadkavian/CLUS-MCDA
S&T Project 19155 Final Model Excel Tool: Econometric Analysis and Cost...
data.usbr.gov
Updated Sep 30, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States Bureau of Reclamation (2021). S&T Project 19155 Final Model Excel Tool: Econometric Analysis and Cost Forecasting for Relining Large Pipes [Dataset]. https://data.usbr.gov/catalog/4614/item/11466
Explore at:
Dataset updated
Sep 30, 2021
Dataset authored and provided by
United States Bureau of Reclamationhttp://www.usbr.gov/
Description
Excel spreadsheet tool that can be used to produce predicted costs for large pipe relining job, based on the project's final regression model.
IT Software Market Analysis Growth, Trends and Regional Forecast 2024-2028
technavio.com
pdf
Updated Aug 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Technavio (2024). IT Software Market Analysis Growth, Trends and Regional Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/it-software-market-industry-analysis
Explore at:
pdfAvailable download formats
Dataset updated
Aug 19, 2024
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice
Time period covered
2024 - 2028
Description
Snapshot img

IT Software Market Size 2024-2028

The IT software market size is forecast to increase by USD 320.5 billion at a CAGR of 7.28% between 2023 and 2028. The market is experiencing significant growth, driven by the expansion of IT infrastructure and the increasing focus of companies on developing innovative software solutions. However, this growth comes with challenges, particularly in the areas of data security and endpoint attacks. As digital assets become more valuable, protecting them from cyber threats is a top priority. Strategic alliances and collaborations are also essential for software companies to stay competitive in the market. Additionally, the market is witnessing a shift towards cloud-based solutions and artificial intelligence integration, further shaping the competitive landscape. The software supply chain is another critical area of concern, as vulnerabilities in this area can lead to serious security breaches. In summary, the market is characterized by the need for advanced software solutions, a heightened focus on data security, and the importance of strategic partnerships.

What will be the Size of the Market During the Forecast Period?

Request Free Sample

The IT software market is evolving with a focus on security standards and malware protection, ensuring businesses safeguard sensitive data from cyber threats. Solutions like PowerStore offer efficient storage for small and medium enterprises (SMEs), enabling seamless integration with IoT (Internet of Things) devices to enhance operational efficiency. Stacklock technology further strengthens cybersecurity by providing advanced protection across software deployments. Development and deployment software solutions streamline the process of building and scaling applications, while on-premise installations ensure data security within enterprise environments. Additionally, managing the raw material supply chain becomes easier with these innovative software tools, optimizing logistics and reducing costs. Together, these technologies empower SMEs to adopt cutting-edge IT solutions while maintaining strong security and operational control.

Market Segmentation

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Type Application software Systems software End-user BFSI Telecommunication Retail Healthcare Others Geography North America Canada US Europe Germany UK France APAC China India Japan South Korea Middle East and Africa South America

By Type Insights

The application software segment is estimated to witness significant growth during the forecast period. In the contemporary business landscape, application software plays a pivotal role in driving efficiency and productivity across various industries. These software solutions cater to diverse functionalities, encompassing productivity, business management, entertainment, and communication. Notably, data protection and network security have emerged as critical areas of focus, given the increasing prevalence of e-commerce and the Internet of Things (IoT). Software applications are extensively employed in sectors such as finance, healthcare, education, retail, and others, to manage and manipulate data effectively. For instance, enterprise resource planning (ERP) and customer relationship management (CRM) systems enable businesses to manage employee and customer databases, ensuring data accuracy and security.

Moreover, individual users can leverage application software like Microsoft Excel to manage and analyze large data volumes, thereby streamlining operations and enhancing decision-making capabilities. Artificial Intelligence (AI) and Machine Learning (ML) have gained significant traction in recent times, with software solutions integrating these technologies to offer advanced capabilities. For example, AI-powered cybersecurity tools provide vital network protection, while e-commerce platforms leverage AI for personalized customer experiences and predictive analytics. In summary, application software solutions continue to shape the business world by offering functionalities that cater to evolving industry needs. Data protection and network security are key areas of focus, with AI and ML integration adding advanced capabilities to software applications.

Get a glance at the market share of various segments Request Free Sample

The application software segment accounted for USD 343.00 billion in 2018 and showed a gradual increase during the forecast period.

Regional Insights

North America is estimated to contribute 48% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and
f
Future of Science: Detecting Cancer
stemfellowship.figshare.com
png
Updated Feb 5, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vic Li; Cecilia Shi; Michael Yang; Eva Zhang; Suzy Zhang; George Li (2017). Future of Science: Detecting Cancer [Dataset]. http://doi.org/10.6084/m9.figshare.4621006.v1
Explore at:
pngAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.4621006.v1
Dataset updated
Feb 5, 2017
Dataset provided by
STEM Fellowship Big Data Challenge
Authors
Vic Li; Cecilia Shi; Michael Yang; Eva Zhang; Suzy Zhang; George Li
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Using Excel Data Analysis Tools and BigML Machine Learning platform, we tested correlation between biopsy data for breast cancer and created a model which helps to distinguish between benign and malignant tumors. Data set of oncology patients were used to analyze links between 10 indicators collected by biopsy non- cancerous and cancerous tumours. Created model can be used as a future medical science tool and can be available to specially trained histology nurses in rural areas. Developed model that can be used to detect cancer on early stages is especially important in the view of the fact that detecting cancer at stage IV give patients of about 22% of survival rate 1.
G
Graph Data Integration Platform Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 22, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Growth Market Reports (2025). Graph Data Integration Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/graph-data-integration-platform-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Aug 22, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Graph Data Integration Platform Market Outlook

According to our latest research, the global graph data integration platform market size reached USD 2.1 billion in 2024, reflecting robust adoption across industries. The market is projected to grow at a CAGR of 18.4% from 2025 to 2033, reaching approximately USD 10.7 billion by 2033. This significant growth is fueled by the increasing need for advanced data management and analytics solutions that can handle complex, interconnected data across diverse organizational ecosystems. The rapid digital transformation and the proliferation of big data have further accelerated the demand for graph-based data integration platforms.

The primary growth factor driving the graph data integration platform market is the exponential increase in data complexity and volume within enterprises. As organizations collect vast amounts of structured and unstructured data from multiple sources, traditional relational databases often struggle to efficiently process and analyze these data sets. Graph data integration platforms, with their ability to map, connect, and analyze relationships between data points, offer a more intuitive and scalable solution. This capability is particularly valuable in sectors such as BFSI, healthcare, and telecommunications, where real-time data insights and dynamic relationship mapping are crucial for decision-making and operational efficiency.

Another significant driver is the growing emphasis on advanced analytics and artificial intelligence. Modern enterprises are increasingly leveraging AI and machine learning to extract actionable insights from their data. Graph data integration platforms enable the creation of knowledge graphs and support complex analytics, such as fraud detection, recommendation engines, and risk assessment. These platforms facilitate seamless integration of disparate data sources, enabling organizations to gain a holistic view of their operations and customers. As a result, investment in graph data integration solutions is rising, particularly among large enterprises seeking to enhance their analytics capabilities and maintain a competitive edge.

The surge in regulatory requirements and compliance mandates across various industries also contributes to the expansion of the graph data integration platform market. Organizations are under increasing pressure to ensure data accuracy, lineage, and transparency, especially in highly regulated sectors like finance and healthcare. Graph-based platforms excel in tracking data provenance and relationships, making it easier for companies to comply with regulations such as GDPR, HIPAA, and others. Additionally, the shift towards hybrid and multi-cloud environments further underscores the need for robust data integration tools capable of operating seamlessly across different infrastructures, further boosting market growth.

From a regional perspective, North America currently dominates the graph data integration platform market, accounting for the largest share due to early adoption of advanced data technologies, a strong presence of key market players, and significant investments in digital transformation initiatives. However, Asia Pacific is expected to witness the fastest growth over the forecast period, driven by rapid industrialization, expanding IT infrastructure, and increasing adoption of cloud-based solutions among enterprises in countries like China, India, and Japan. Europe also remains a significant contributor, supported by stringent data privacy regulations and a mature digital economy.

Component Analysis

The component segment of the graph data integration platform market is bifurcated into software and services. The software segment currently commands the largest market share, reflecting the critical role of robust graph database engines, visualization tools, and integration frameworks in managing and analyzing complex data relationships. These software solutions are designed to deliver high scalability, flexibility, and real-time proces
D
Columnar Database Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). Columnar Database Market Research Report 2033 [Dataset]. https://dataintelo.com/report/columnar-database-market
Explore at:
csv, pptx, pdfAvailable download formats
Dataset updated
Sep 30, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Columnar Database Market Outlook

According to our latest research, the global Columnar Database market size reached USD 3.2 billion in 2024, reflecting a robust demand for high-performance data management solutions across various industries. The market is expected to grow at a CAGR of 13.1% from 2025 to 2033, reaching a forecasted value of USD 8.6 billion by 2033. This remarkable growth trajectory is primarily driven by the exponential increase in data volume, the surge in business intelligence and analytics applications, and the rapid digital transformation initiatives being adopted by enterprises worldwide.

A significant growth factor for the columnar database market is the escalating need for real-time analytics and high-speed data processing. Organizations are increasingly leveraging big data and complex analytics to gain actionable insights and maintain a competitive edge. Traditional row-based databases often struggle with performance bottlenecks when handling large-scale analytical queries. In contrast, columnar databases excel in such environments by enabling faster data retrieval and optimized storage, making them a preferred choice for enterprises seeking to enhance their decision-making processes. The adoption of advanced analytics, artificial intelligence, and machine learning is further fueling the demand for columnar database solutions, as these technologies require rapid access to vast datasets and efficient query performance.

Another critical driver is the widespread adoption of cloud computing and hybrid IT infrastructures. As businesses migrate their workloads to cloud environments, the flexibility, scalability, and cost-effectiveness of columnar databases become increasingly attractive. Cloud-based columnar database solutions offer seamless integration, real-time scalability, and robust disaster recovery capabilities, which are essential for modern enterprises operating in dynamic markets. Additionally, the proliferation of Software-as-a-Service (SaaS) applications and the growing reliance on data-driven business models are pushing organizations to invest in advanced database architectures that can handle the complexities of multi-tenant environments and massive concurrent queries, further accelerating market expansion.

The surge in regulatory compliance requirements and data governance standards is also shaping the growth of the columnar database market. Industries such as BFSI, healthcare, and government are under increasing pressure to manage, store, and analyze sensitive data securely and efficiently. Columnar databases offer enhanced data compression, encryption, and auditing capabilities, making them ideal for organizations that must adhere to stringent regulatory frameworks like GDPR, HIPAA, and PCI DSS. As data privacy concerns and compliance mandates intensify globally, organizations are prioritizing investments in database technologies that not only deliver high performance but also ensure robust data security and governance, thereby fueling market growth.

From a regional perspective, North America continues to lead the columnar database market, driven by the presence of major technology vendors, early adoption of innovative IT solutions, and the high concentration of data-centric industries. Europe follows closely, with significant investments in digital transformation and regulatory compliance initiatives. The Asia Pacific region is emerging as a high-growth market, propelled by rapid industrialization, expanding digital infrastructure, and increasing adoption of cloud-based services across sectors such as retail, BFSI, and healthcare. Latin America and the Middle East & Africa are also witnessing steady growth, albeit at a relatively slower pace, as enterprises in these regions gradually embrace digital transformation and data-driven business strategies.

Component Analysis

The columnar database market is segmented by component into software and services, each playing a pivotal role in the overall ecosystem. The software segment dominates the market, accounting for the largest revenue share in 2024. This dominance is attributed to the continuous advancements in database technologies, increasing demand for high-performance data processing, and the proliferation of data-intensive applications. Modern columnar database software solutions are designed to deliver exceptional query performance, scalability, and flexibility, enabling organizations to efficiently manage and analyze vast volumes of
Massive Bank dataset ( 1 Million+ rows)
kaggle.com
zip
Updated Feb 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
K S ABISHEK (2023). Massive Bank dataset ( 1 Million+ rows) [Dataset]. https://www.kaggle.com/datasets/ksabishek/massive-bank-dataset-1-million-rows
Explore at:
zip(32471013 bytes)Available download formats
Dataset updated
Feb 21, 2023
Authors
K S ABISHEK
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Greetings , fellow analysts !

(NOTE : This is a random dataset generated using python. It bears no resemblance to any real entity in the corporate world. Any resemblance is a matter of coincidence.)

REC-SSEC Bank is a govt-aided bank operating in the Indian Peninsula. They have regional branches in over 40+ regions of the country. You have been provided with a massive excel sheet containing the transaction details, the total transaction amount and their location and total transaction count.

The dataset is described as follows :

Date - The date on which the transaction took place. 2.Domain - Where or which type of Business entity made the transaction. 3.Location - Where the data is collected from 4.Value - Total value of transaction

Count of transaction .

For example , in the very first row , the data can be read as : " On the first of January, 2022 , 1932 transactions of summing upto INR 365554 from Bhuj were reported " NOTE : There are about 2750 transactions every single day. All of this has been given to you.

The bank wants you to answer the following questions :

What is the average transaction value everyday for each domain over the year.

What is the average transaction value for every city/location over the year

The bank CEO , Mr: Hariharan , wants to promote the ease of transaction for the highest active domain. If the domains could be sorted into a priority, what would be the priority list ?

What's the average transaction count for each city ?

Facebook

Twitter

Click to copy link

Link copied

Cite

Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Explore at:

Dataset updated

Apr 21, 2025

Dataset provided by

Agricultural Research Servicehttps://www.ars.usda.gov/

Description

The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

Clear search

Close search

Google apps

Main menu

Data from: Current and projected research data storage needs of Agricultural...

Dataset of book subjects that contain Beginning big data with Power BI and...

18 excel spreadsheets by species and year giving reproduction and growth...

Correlation Between Cancer Research Trends and Real World Data: An Analysis...

Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

About this dataset

Content

Inspiration

Recommended for further research

Enterprise Survey 2009-2019, Panel Data - Slovenia

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Response rate

Data Cleaning Sample

Additional file 1 of msBiodat analysis tool, big data analysis for...

Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A...

Data from: New Data Reduction Tools and their Application to The Geysers...

Comprehensive Median Household Income and Distribution Dataset for Excel,...

About this dataset

Content

Inspiration

Interested in deeper insights and visual analysis?

Data from: Improper data practices erode the quality of global ecological...

Global Business Intelligence and Analytics Software Market Size, Share,...

Integrated Cryptocurrency Historical Data for a Predictive Data-Driven...

S&T Project 19155 Final Model Excel Tool: Econometric Analysis and Cost...

IT Software Market Analysis Growth, Trends and Regional Forecast 2024-2028

Snapshot img

Future of Science: Detecting Cancer

Graph Data Integration Platform Market Research Report 2033

Graph Data Integration Platform Market Outlook

Component Analysis

Columnar Database Market Research Report 2033

Columnar Database Market Outlook

Component Analysis

Massive Bank dataset ( 1 Million+ rows)

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016See More Versions

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016