51 datasets found
  1. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    • Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
    • Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
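
    The per person calculation described above (high end of the reported range divided by 1 for an individual response, or by G for a group response) can be sketched as follows. This is a minimal illustration, not the working group's actual procedure: the range labels and upper bounds in RANGE_HIGH_END_TB are hypothetical stand-ins for the survey's real answer options, and dividing a Big Data user's actual value by G is also an assumption.

```python
# Hypothetical upper bounds (in TB) for reported storage ranges.
# The survey's real range labels and bounds are in Appendix A and may differ.
RANGE_HIGH_END_TB = {
    "under 1 TB": 1.0,
    "1 to 10 TB": 10.0,
    "10 to 100 TB": 100.0,
}


def per_person_storage_tb(range_label, group_size=1, actual_tb=None):
    """Estimate per person storage for one survey response.

    range_label: the storage range the respondent selected.
    group_size:  G, the number of people covered (1 for an individual).
    actual_tb:   a reported or estimated value for a Big Data user; when
                 given, it replaces the high end of the range (assumption).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    total_tb = actual_tb if actual_tb is not None else RANGE_HIGH_END_TB[range_label]
    return total_tb / group_size


# An individual response uses the high end of the range as-is.
individual = per_person_storage_tb("1 to 10 TB")
# A group response of four people splits the high end four ways.
group = per_person_storage_tb("10 to 100 TB", group_size=4)
```

    Using the high end of each range gives a deliberately generous estimate, which is the reason the description singles out the largest ranges for follow-up.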

  2. Dataset of book subjects that contain Beginning big data with Power BI and...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Beginning big data with Power BI and Excel 2013 : big data processing and analysis using Power BI in Excel 2013 [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Beginning+big+data+with+Power+BI+and+Excel+2013+:+big+data+processing+and+analysis+using+Power+BI+in+Excel+2013&j=1&j0=books
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 3 rows and is filtered where the book is Beginning big data with Power BI and Excel 2013 : big data processing and analysis using Power BI in Excel 2013. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  3. 18 excel spreadsheets by species and year giving reproduction and growth...

    • catalog.data.gov
    • data.wu.ac.at
    Updated Aug 17, 2024
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the columns in the data set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk, D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee, and M. Plocher. Plant reproduction is altered by simulated herbicide drift to constructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).

  4. Correlation Between Cancer Research Trends and Real World Data: An Analysis...

    • stemfellowship.figshare.com
    png
    Updated Jan 29, 2017
    Cite
    Peter Chou; Kevin Hong; Chandler Lei; Haolin Zhang (2017). Correlation Between Cancer Research Trends and Real World Data: An Analysis on Altmetric Data [Dataset]. http://doi.org/10.6084/m9.figshare.4595449.v1
    Available download formats: png
    Dataset updated
    Jan 29, 2017
    Dataset provided by
    STEM Fellowship Big Data Challenge
    Authors
    Peter Chou; Kevin Hong; Chandler Lei; Haolin Zhang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The field of cancer research is overall ambiguous to the general population and apart from medical news, not a lot is known of the proceedings. This study aims to provide some clarity towards cancer research, especially towards the correlations between research of different types of cancer. The amount of research papers pertaining to different types of cancers is compared against mortality and diagnosis rates to determine the amount of research attention towards a type of cancer in relation to its overall importance or danger level to the general population. This is achieved through the use of many computational tools such as Python, R, and Microsoft Excel. Python is used to parse through the JSON files and extract the abstract and Altmetric score onto a single CSV file. R is used to iterate through the rows of the CSV files and count the appearance of each type of cancer in the abstract. As well as this, R creates the histograms describing Altmetric scores and file frequency. Microsoft Excel is used to provide further data analysis and find correlations between Altmetrics data and Canadian Cancer Society data. The analysis from these tools revealed that breast cancer was the most researched cancer by a large margin with nearly 1,700 papers. Although there were a large number of cancer research papers, the Altmetric scores revealed that most of these papers did not gain significant attention. By comparing these results to Canadian Cancer Society data, it was uncovered that Breast Cancer was receiving research attention that was not merited. There were four times more breast cancer research papers than the second most researched cancer, prostate cancer. This was despite the fact that breast cancer was fourth in mortality and third in new cases among all cancers. Inversely, lung cancer was underrepresented with only 401 research papers in spite of being the deadliest cancer in Canada.
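
    The extract-and-count pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: the JSON field names ("abstract", "altmetric_score") and the short CANCER_TYPES list are hypothetical, and the counting step, which the authors did in R, is done here in Python for the sake of a single self-contained example.

```python
import csv
import io
import json
from collections import Counter

# Illustrative subset of cancer types; the study's full list is not given.
CANCER_TYPES = ["breast", "prostate", "lung"]


def records_to_csv(json_texts):
    """Extract the abstract and Altmetric score of each JSON record into one CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["abstract", "altmetric_score"])
    for text in json_texts:
        record = json.loads(text)
        # Field names are assumptions about the record layout.
        writer.writerow([record.get("abstract", ""), record.get("altmetric_score")])
    return buffer.getvalue()


def count_cancer_mentions(csv_text):
    """Count how many abstracts mention each cancer type at least once."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        abstract = row["abstract"].lower()
        for cancer in CANCER_TYPES:
            if f"{cancer} cancer" in abstract:
                counts[cancer] += 1
    return counts


sample_records = [
    '{"abstract": "A study of breast cancer and lung cancer.", "altmetric_score": 12}',
    '{"abstract": "Breast cancer screening outcomes.", "altmetric_score": 3}',
]
mention_counts = count_cancer_mentions(records_to_csv(sample_records))
```

    Counting abstracts rather than raw word occurrences keeps one long paper from inflating a cancer type's total; whether the original analysis made the same choice is not stated.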

  5. Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    Cite
    Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Excel, Alabama
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

    Key observations

    The largest age group in Excel, AL was the 45 to 49 years group, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over group, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: This column shows the population of the specific age group in Excel.
    • % of Total Population: This column displays the population of each age group as a proportion of Excel's total population. Please note that the sum of all percentages may not equal one due to rounding of values.
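
    As a rough sketch of how the "% of Total Population" column relates to the "Population" column, the snippet below derives each group's share from raw counts and picks the largest group. The counts are made-up illustrative values, not figures from this dataset, and the function names are hypothetical.

```python
def population_shares(population_by_group):
    """Return each age group's share of the total population, in percent, rounded to 2 places."""
    total = sum(population_by_group.values())
    if total == 0:
        raise ValueError("total population is zero")
    return {
        group: round(100 * count / total, 2)
        for group, count in population_by_group.items()
    }


def largest_group(population_by_group):
    """Return the label of the age group with the highest population."""
    return max(population_by_group, key=population_by_group.get)


# Illustrative counts only; these are NOT values from the dataset.
example_counts = {"Under 5 years": 10, "5 to 9 years": 30, "10 to 14 years": 60}
example_shares = population_shares(example_counts)

# Three equal groups show why rounded shares may not add up exactly.
thirds = population_shares({"a": 1, "b": 1, "c": 1})
```

    With three equal groups each share rounds to 33.33, so the rounded values sum to 99.99 rather than 100, which is the rounding effect the column note warns about.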

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Excel Population by Age.

  6. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 6, 2020
    Cite
    World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
    Dataset updated
    Aug 6, 2020
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    European Investment Bank (http://eib.org/)
    European Bank for Reconstruction and Development (http://ebrd.com/)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.

    For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
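
    The size strata above map directly onto a small classification rule. The sketch below is illustrative only: the function name is hypothetical, and returning None for establishments with fewer than 5 permanent full-time workers is an assumption about how out-of-scope units would be handled.

```python
def size_stratum(permanent_full_time_workers):
    """Classify an establishment by the ES size strata quoted above.

    small: 5 to 19 employees, medium: 20 to 99, large: 100 or more.
    Establishments below 5 employees fall outside the stated strata,
    so None is returned for them (an assumption of this sketch).
    """
    if permanent_full_time_workers < 5:
        return None
    if permanent_full_time_workers <= 19:
        return "small"
    if permanent_full_time_workers <= 99:
        return "medium"
    return "large"


# Boundary values sit on the edges of each stratum.
boundary_examples = {n: size_stratum(n) for n in (4, 5, 19, 20, 99, 100)}
```

    Note that "more than 99 employees" and "100 or more employees" describe the same cutoff for whole-number headcounts, so one rule covers all three survey waves.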

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail-specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
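
    Because refusals are recorded as (-8), that code has to be treated as missing before any averages are computed. The following is a minimal sketch of that step; the function names and the sample answers are hypothetical, and only the -8 refusal code comes from the description above.

```python
REFUSAL_CODE = -8  # refusal-to-respond code described above


def recode_refusals(values):
    """Replace the refusal code with None so it is not treated as a real answer."""
    return [None if value == REFUSAL_CODE else value for value in values]


def item_refusal_rate(values):
    """Share of responses to one question that were refusals."""
    if not values:
        raise ValueError("no responses given")
    return sum(1 for value in values if value == REFUSAL_CODE) / len(values)


def mean_of_answered(values):
    """Average over answered items only, skipping refusals."""
    answered = [value for value in recode_refusals(values) if value is not None]
    return sum(answered) / len(answered) if answered else None


# Hypothetical answers to one sensitive question; -8 marks a refusal.
answers = [3, -8, 0, 5, -8]
```

    Leaving -8 in place would pull a mean sharply downward, which is why the recode comes first.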

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

    Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
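
    The three waves report yield in two different forms: contacts per realized interview for 2009, and interviews per contact for 2013 and 2019. The two are reciprocals, so they can be put on a common footing with the small sketch below (function names are hypothetical). Under that reading, 6.18 contacts per interview corresponds to about 16.2% interviews per contact, 25% corresponds to 4 contacts per interview, and 9.7% corresponds to roughly 10.3 contacts per interview.

```python
def interviews_per_contact(contacts_per_interview):
    """Convert 'contacted establishments per realized interview' to a yield fraction."""
    if contacts_per_interview <= 0:
        raise ValueError("contacts_per_interview must be positive")
    return 1.0 / contacts_per_interview


def contacts_per_interview(interview_yield):
    """Convert a yield fraction (interviews per contact) to contacts per interview."""
    if interview_yield <= 0:
        raise ValueError("interview_yield must be positive")
    return 1.0 / interview_yield


# Figures quoted in the response-rate paragraphs above.
yield_2009 = interviews_per_contact(6.18)     # 2009: 6.18 contacts per interview
contacts_2013 = contacts_per_interview(0.25)  # 2013: 25% interviews per contact
contacts_2019 = contacts_per_interview(0.097) # 2019: 9.7% interviews per contact
```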

  7. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  8. Additional file 1 of msBiodat analysis tool, big data analysis for...

    • springernature.figshare.com
    bin
    Updated Jun 4, 2023
    Cite
    Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek (2023). Additional file 1 of msBiodat analysis tool, big data analysis for high-throughput experiments [Dataset]. http://doi.org/10.6084/m9.figshare.c.3645041_D1.v1
    Available download formats: bin
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel spreadsheets. XLSX file containing the data from Sousa Abreu et al. which is used in the example of the article. (XLSX 611 kb)

  9. Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A...

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Heather Cribbs; Gabriel Gardner (2023). Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A Multi-Campus Big Data Study" [Dataset]. http://doi.org/10.6084/m9.figshare.19071578.v1
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Heather Cribbs; Gabriel Gardner
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Five files, one of which is a ZIP archive, containing data that support the findings of this study. PDF file "IA screenshots CSU Libraries search config" contains screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for years 2017 - 2019. Excel file "CCIHE2018-PublicDataFile" contains Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses from 2018. CSV file "2017-2019_RAW" contains the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017 - 2019. CSV file "clean_data" contains the cleaned data from Primo Analytics which was used for all subsequent analysis such as charting and import into SPSS for statistical testing. ZIP archive file "NonparametricStatisticalTestsFromSPSS" contains 23 SPSS files [.spv format] reporting the results of testing conducted in SPSS. This archive includes things such as normality check, descriptives, and Kruskal-Wallis H-test results.

  10. Data from: New Data Reduction Tools and their Application to The Geysers...

    • data.wu.ac.at
    pdf
    Updated Dec 5, 2017
    Cite
    (2017). New Data Reduction Tools and their Application to The Geysers Geothermal Field [Dataset]. https://data.wu.ac.at/schema/geothermaldata_org/NjNiMTc2MzQtOWQ5Mi00MjE5LWEwOWQtZDFjMmE5YjcwZWM0
    Available download formats: pdf
    Dataset updated
    Dec 5, 2017
    Area covered
    The Geysers
    Description

    Microsoft Excel based (using Visual Basic for Applications) data-reduction and visualization tools have been developed that allow users to numerically reduce large sets of geothermal data to any size. The data can be quickly sifted through and graphed to allow their study. The ability to analyze large data sets can yield responses to field management procedures that would otherwise be undetectable. Field-wide trends such as decline rates, response to injection, evolution of superheat, recording instrumentation problems and data inconsistencies can be quickly queried and graphed. The application of these newly developed tools to data from The Geysers geothermal field is illustrated. A copy of these tools may be requested by contacting the authors.
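
    The description does not say how the tools reduce a data set "to any size." One common approach is block averaging, sketched below in Python purely as an illustration; it is not the authors' VBA implementation, and the function name and sample readings are hypothetical.

```python
def reduce_series(values, target_size):
    """Reduce a series to target_size points by averaging contiguous blocks.

    This is a generic block-averaging sketch, swapped in for illustration;
    the reduction method used by the original Excel tools is not documented here.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    n = len(values)
    if target_size >= n:
        # Nothing to reduce: return a copy of the input.
        return list(values)
    reduced = []
    for i in range(target_size):
        # Integer arithmetic spreads any remainder evenly across blocks.
        start = i * n // target_size
        end = (i + 1) * n // target_size
        block = values[start:end]
        reduced.append(sum(block) / len(block))
    return reduced


# Hypothetical readings reduced from six points to three.
readings = [1, 2, 3, 4, 5, 6]
reduced_readings = reduce_series(readings, 3)
```

    Averaging smooths short-lived spikes, so a tool aimed at spotting instrumentation problems might instead keep block minima and maxima; that trade-off is a design choice, not something the source specifies.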

  11. Comprehensive Median Household Income and Distribution Dataset for Excel,...

    • neilsberg.com
    Updated Jan 11, 2024
    Cite
    Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Excel, AL: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd99ed47-b041-11ee-aaca-3860777c1fe6/
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Alabama, Excel
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the median household income in Excel. It can be utilized to understand the trend in median household income and to analyze the income distribution in Excel by household type, size, and across various income brackets.

    Content

    The dataset will have the following datasets when applicable

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • Excel, AL Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)
    • Median Household Income Variation by Family Size in Excel, AL: Comparative analysis across 7 household sizes
    • Income Distribution by Quintile: Mean Household Income in Excel, AL
    • Excel, AL households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Excel median household income. You can refer to the same here.

  12. Data from: Improper data practices erode the quality of global ecological...

    • dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 25, 2025
    + more versions
    Cite
    Steven Augustine; Isaac Bailey-Marren; Katherine Charton; Nathan Kiel; Michael Peyton (2025). Improper data practices erode the quality of global ecological databases and impede the progress of ecological research [Dataset]. http://doi.org/10.5061/dryad.wdbrv15w1
    Explore at:
    Dataset updated
    Jul 25, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Steven Augustine; Isaac Bailey-Marren; Katherine Charton; Nathan Kiel; Michael Peyton
    Time period covered
    Jan 1, 2023
    Description

    The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY lacked either applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up onl...

    SLA data were downloaded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae), Plantago, Poa, and Quercus species. The data have not been processed in any way, but additional columns have been added to the dataset that provide the viewer with information about where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data.

    There are two additional documents associated with this publication. One is a Word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an Excel document that contains the SLA data that were downloaded from TRY and all associated metadata.

    Missing data codes: NA and N/A

  13. Global Business Intelligence and Analytics Software Market Size, Share,...

    • skyquestt.com
    Updated Apr 18, 2024
    Cite
    SkyQuest Technology (2024). Global Business Intelligence and Analytics Software Market Size, Share, Growth Analysis, By Type(Big Data Analytics, Business Analytics), By Deployment(On-premises, and Clouds), By Enterprise Size(Large Enterprises, and Small & Medium Enterprises (SMEs)), By End-Use(BFSI, Government) - Industry Forecast 2023-2030 [Dataset]. https://www.skyquestt.com/report/business-intelligence-and-analytics-software-market
    Explore at:
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    SkyQuest Technology
    License

    https://www.skyquestt.com/privacy/

    Time period covered
    2023 - 2030
    Area covered
    Global
    Description

    Global Business Intelligence and Analytics Software Market size was valued at USD 21.56 Billion in 2022 and is poised to grow from USD 23.38 Billion in 2023 to USD 44.78 Billion by 2031, growing at a CAGR of 8.46% in the forecast period (2024-2031).
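The growth rate quoted above can be checked against the stated start and end values. The following is a minimal sketch, assuming the standard compound annual growth rate formula and an 8-year span from the 2023 figure to the 2031 figure; the function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures taken from the description: USD 23.38 billion (2023) to USD 44.78 billion (2031).
# The 8-year span is an assumption about how the report counts the period.
rate = cagr(23.38, 44.78, 2031 - 2023)
print(f"Implied CAGR: {rate:.2%}")
```

Under these assumptions the implied rate lands close to the 8.46% stated in the description; a different choice of base year would give a different figure.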

  14. Integrated Cryptocurrency Historical Data for a Predictive Data-Driven...

    • data.mendeley.com
    Updated Oct 29, 2021
    + more versions
    Cite
    Abtin Ijadi Maghsoodi (2021). Integrated Cryptocurrency Historical Data for a Predictive Data-Driven Decision-Making Algorithm [Dataset]. http://doi.org/10.17632/37nb83jwtd.1
    Explore at:
    Dataset updated
    Oct 29, 2021
    Authors
    Abtin Ijadi Maghsoodi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cryptocurrency historical datasets from January 2012 (if available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs) including Yahoo Finance, Cryptodownload, CoinMarketCap, various Kaggle datasets, and multiple APIs. While these datasets used various time formats (e.g., minutes, hours, days), a daily format was used in this research study in order to integrate the datasets. The integrated cryptocurrency historical datasets for 80 cryptocurrencies including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more cryptocurrencies were uploaded in this online Mendeley data repository. Although the primary criterion for including the mentioned cryptocurrencies was market capitalization, a subject matter expert (i.e., a professional trader) also guided the initial selection of the cryptocurrencies by analyzing various indicators such as Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, Stochastic Oscillator and Ichimoku Cloud. The primary features of this dataset that were used as the decision-making criteria of the CLUS-MCDA II approach are Timestamps, Open, High, Low, Closed, Volume (Currency), % Change (7 days and 24 hours), Market Cap and Weighted Price values.

    The available Excel and CSV files in this data set are just part of the integrated data; the other databases, datasets, and API references that were used in this study are as follows: [1] https://finance.yahoo.com/ [2] https://coinmarketcap.com/historical/ [3] https://cryptodatadownload.com/ [4] https://kaggle.com/philmohun/cryptocurrency-financial-data [5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data [6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory [7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD [8] https://min-api.cryptocompare.com/ [9] https://p.nomics.com/cryptocurrency-bitcoin-api [10] https://www.coinapi.io/ [11] https://www.coingecko.com/en/api [12] https://cryptowat.ch/ [13] https://www.alphavantage.co/

    This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDAII Project: https://aimaghsoodi.github.io/CLUSMCDA-R-Package/ https://github.com/Aimaghsoodi/CLUS-MCDA-II https://github.com/azadkavian/CLUS-MCDA

  15. S&T Project 19155 Final Model Excel Tool: Econometric Analysis and Cost...

    • data.usbr.gov
    Updated Sep 30, 2021
    Cite
    United States Bureau of Reclamation (2021). S&T Project 19155 Final Model Excel Tool: Econometric Analysis and Cost Forecasting for Relining Large Pipes [Dataset]. https://data.usbr.gov/catalog/4614/item/11466
    Explore at:
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    United States Bureau of Reclamation http://www.usbr.gov/
    Description

    Excel spreadsheet tool that can be used to produce predicted costs for large pipe relining jobs, based on the project's final regression model.

  16. IT Software Market Analysis Growth, Trends and Regional Forecast 2024-2028

    • technavio.com
    pdf
    Updated Aug 19, 2024
    Cite
    Technavio (2024). IT Software Market Analysis Growth, Trends and Regional Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/it-software-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Aug 19, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Description


    IT Software Market Size 2024-2028

    The IT software market size is forecast to increase by USD 320.5 billion at a CAGR of 7.28% between 2023 and 2028. The market is experiencing significant growth, driven by the expansion of IT infrastructure and the increasing focus of companies on developing innovative software solutions. However, this growth comes with challenges, particularly in the areas of data security and endpoint attacks. As digital assets become more valuable, protecting them from cyber threats is a top priority. Strategic alliances and collaborations are also essential for software companies to stay competitive in the market. Additionally, the market is witnessing a shift towards cloud-based solutions and artificial intelligence integration, further shaping the competitive landscape. The software supply chain is another critical area of concern, as vulnerabilities in this area can lead to serious security breaches. In summary, the market is characterized by the need for advanced software solutions, a heightened focus on data security, and the importance of strategic partnerships.

    What will be the Size of the Market During the Forecast Period?


    The IT software market is evolving with a focus on security standards and malware protection, ensuring businesses safeguard sensitive data from cyber threats. Solutions like PowerStore offer efficient storage for small and medium enterprises (SMEs), enabling seamless integration with IoT (Internet of Things) devices to enhance operational efficiency. Stacklock technology further strengthens cybersecurity by providing advanced protection across software deployments. Development and deployment software solutions streamline the process of building and scaling applications, while on-premise installations ensure data security within enterprise environments. Additionally, managing the raw material supply chain becomes easier with these innovative software tools, optimizing logistics and reducing costs. Together, these technologies empower SMEs to adopt cutting-edge IT solutions while maintaining strong security and operational control.

    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Type
      Application software
      Systems software
    End-user
      BFSI
      Telecommunication
      Retail
      Healthcare
      Others
    Geography
      North America
        Canada
        US
      Europe
        Germany
        UK
        France
      APAC
        China
        India
        Japan
        South Korea
      Middle East and Africa
      South America

    By Type Insights

    The application software segment is estimated to witness significant growth during the forecast period. In the contemporary business landscape, application software plays a pivotal role in driving efficiency and productivity across various industries. These software solutions cater to diverse functionalities, encompassing productivity, business management, entertainment, and communication. Notably, data protection and network security have emerged as critical areas of focus, given the increasing prevalence of e-commerce and the Internet of Things (IoT). Software applications are extensively employed in sectors such as finance, healthcare, education, retail, and others, to manage and manipulate data effectively. For instance, enterprise resource planning (ERP) and customer relationship management (CRM) systems enable businesses to manage employee and customer databases, ensuring data accuracy and security.

    Moreover, individual users can leverage application software like Microsoft Excel to manage and analyze large data volumes, thereby streamlining operations and enhancing decision-making capabilities. Artificial Intelligence (AI) and Machine Learning (ML) have gained significant traction in recent times, with software solutions integrating these technologies to offer advanced capabilities. For example, AI-powered cybersecurity tools provide vital network protection, while e-commerce platforms leverage AI for personalized customer experiences and predictive analytics. In summary, application software solutions continue to shape the business world by offering functionalities that cater to evolving industry needs. Data protection and network security are key areas of focus, with AI and ML integration adding advanced capabilities to software applications.


    The application software segment accounted for USD 343.00 billion in 2018 and showed a gradual increase during the forecast period.

    Regional Insights

    North America is estimated to contribute 48% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and

  17. Future of Science: Detecting Cancer

    • stemfellowship.figshare.com
    png
    Updated Feb 5, 2017
    Cite
    Vic Li; Cecilia Shi; Michael Yang; Eva Zhang; Suzy Zhang; George Li (2017). Future of Science: Detecting Cancer [Dataset]. http://doi.org/10.6084/m9.figshare.4621006.v1
    Explore at:
    Available download formats: png
    Dataset updated
    Feb 5, 2017
    Dataset provided by
    STEM Fellowship Big Data Challenge
    Authors
    Vic Li; Cecilia Shi; Michael Yang; Eva Zhang; Suzy Zhang; George Li
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using Excel Data Analysis Tools and the BigML machine learning platform, we tested the correlation between biopsy data for breast cancer and created a model that helps distinguish between benign and malignant tumors. A data set of oncology patients was used to analyze links between 10 indicators collected by biopsy of non-cancerous and cancerous tumours. The resulting model could serve as a future medical science tool and could be made available to specially trained histology nurses in rural areas. A model that can detect cancer at early stages is especially important given that patients whose cancer is detected at stage IV have a survival rate of about 22% [1].

  18. Graph Data Integration Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    + more versions
    Cite
    Growth Market Reports (2025). Graph Data Integration Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/graph-data-integration-platform-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Graph Data Integration Platform Market Outlook

    According to our latest research, the global graph data integration platform market size reached USD 2.1 billion in 2024, reflecting robust adoption across industries. The market is projected to grow at a CAGR of 18.4% from 2025 to 2033, reaching approximately USD 10.7 billion by 2033. This significant growth is fueled by the increasing need for advanced data management and analytics solutions that can handle complex, interconnected data across diverse organizational ecosystems. The rapid digital transformation and the proliferation of big data have further accelerated the demand for graph-based data integration platforms.




    The primary growth factor driving the graph data integration platform market is the exponential increase in data complexity and volume within enterprises. As organizations collect vast amounts of structured and unstructured data from multiple sources, traditional relational databases often struggle to efficiently process and analyze these data sets. Graph data integration platforms, with their ability to map, connect, and analyze relationships between data points, offer a more intuitive and scalable solution. This capability is particularly valuable in sectors such as BFSI, healthcare, and telecommunications, where real-time data insights and dynamic relationship mapping are crucial for decision-making and operational efficiency.




    Another significant driver is the growing emphasis on advanced analytics and artificial intelligence. Modern enterprises are increasingly leveraging AI and machine learning to extract actionable insights from their data. Graph data integration platforms enable the creation of knowledge graphs and support complex analytics, such as fraud detection, recommendation engines, and risk assessment. These platforms facilitate seamless integration of disparate data sources, enabling organizations to gain a holistic view of their operations and customers. As a result, investment in graph data integration solutions is rising, particularly among large enterprises seeking to enhance their analytics capabilities and maintain a competitive edge.




    The surge in regulatory requirements and compliance mandates across various industries also contributes to the expansion of the graph data integration platform market. Organizations are under increasing pressure to ensure data accuracy, lineage, and transparency, especially in highly regulated sectors like finance and healthcare. Graph-based platforms excel in tracking data provenance and relationships, making it easier for companies to comply with regulations such as GDPR, HIPAA, and others. Additionally, the shift towards hybrid and multi-cloud environments further underscores the need for robust data integration tools capable of operating seamlessly across different infrastructures, further boosting market growth.




    From a regional perspective, North America currently dominates the graph data integration platform market, accounting for the largest share due to early adoption of advanced data technologies, a strong presence of key market players, and significant investments in digital transformation initiatives. However, Asia Pacific is expected to witness the fastest growth over the forecast period, driven by rapid industrialization, expanding IT infrastructure, and increasing adoption of cloud-based solutions among enterprises in countries like China, India, and Japan. Europe also remains a significant contributor, supported by stringent data privacy regulations and a mature digital economy.





    Component Analysis

    The component segment of the graph data integration platform market is bifurcated into software and services. The software segment currently commands the largest market share, reflecting the critical role of robust graph database engines, visualization tools, and integration frameworks in managing and analyzing complex data relationships. These software solutions are designed to deliver high scalability, flexibility, and real-time proces

  19. Columnar Database Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Columnar Database Market Research Report 2033 [Dataset]. https://dataintelo.com/report/columnar-database-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Columnar Database Market Outlook

    According to our latest research, the global Columnar Database market size reached USD 3.2 billion in 2024, reflecting a robust demand for high-performance data management solutions across various industries. The market is expected to grow at a CAGR of 13.1% from 2025 to 2033, reaching a forecasted value of USD 8.6 billion by 2033. This remarkable growth trajectory is primarily driven by the exponential increase in data volume, the surge in business intelligence and analytics applications, and the rapid digital transformation initiatives being adopted by enterprises worldwide.




    A significant growth factor for the columnar database market is the escalating need for real-time analytics and high-speed data processing. Organizations are increasingly leveraging big data and complex analytics to gain actionable insights and maintain a competitive edge. Traditional row-based databases often struggle with performance bottlenecks when handling large-scale analytical queries. In contrast, columnar databases excel in such environments by enabling faster data retrieval and optimized storage, making them a preferred choice for enterprises seeking to enhance their decision-making processes. The adoption of advanced analytics, artificial intelligence, and machine learning is further fueling the demand for columnar database solutions, as these technologies require rapid access to vast datasets and efficient query performance.




    Another critical driver is the widespread adoption of cloud computing and hybrid IT infrastructures. As businesses migrate their workloads to cloud environments, the flexibility, scalability, and cost-effectiveness of columnar databases become increasingly attractive. Cloud-based columnar database solutions offer seamless integration, real-time scalability, and robust disaster recovery capabilities, which are essential for modern enterprises operating in dynamic markets. Additionally, the proliferation of Software-as-a-Service (SaaS) applications and the growing reliance on data-driven business models are pushing organizations to invest in advanced database architectures that can handle the complexities of multi-tenant environments and massive concurrent queries, further accelerating market expansion.




    The surge in regulatory compliance requirements and data governance standards is also shaping the growth of the columnar database market. Industries such as BFSI, healthcare, and government are under increasing pressure to manage, store, and analyze sensitive data securely and efficiently. Columnar databases offer enhanced data compression, encryption, and auditing capabilities, making them ideal for organizations that must adhere to stringent regulatory frameworks like GDPR, HIPAA, and PCI DSS. As data privacy concerns and compliance mandates intensify globally, organizations are prioritizing investments in database technologies that not only deliver high performance but also ensure robust data security and governance, thereby fueling market growth.




    From a regional perspective, North America continues to lead the columnar database market, driven by the presence of major technology vendors, early adoption of innovative IT solutions, and the high concentration of data-centric industries. Europe follows closely, with significant investments in digital transformation and regulatory compliance initiatives. The Asia Pacific region is emerging as a high-growth market, propelled by rapid industrialization, expanding digital infrastructure, and increasing adoption of cloud-based services across sectors such as retail, BFSI, and healthcare. Latin America and the Middle East & Africa are also witnessing steady growth, albeit at a relatively slower pace, as enterprises in these regions gradually embrace digital transformation and data-driven business strategies.



    Component Analysis

    The columnar database market is segmented by component into software and services, each playing a pivotal role in the overall ecosystem. The software segment dominates the market, accounting for the largest revenue share in 2024. This dominance is attributed to the continuous advancements in database technologies, increasing demand for high-performance data processing, and the proliferation of data-intensive applications. Modern columnar database software solutions are designed to deliver exceptional query performance, scalability, and flexibility, enabling organizations to efficiently manage and analyze vast volumes of

  20. Massive Bank dataset ( 1 Million+ rows)

    • kaggle.com
    zip
    Updated Feb 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    K S ABISHEK (2023). Massive Bank dataset ( 1 Million+ rows) [Dataset]. https://www.kaggle.com/datasets/ksabishek/massive-bank-dataset-1-million-rows
    Explore at:
    Available download formats: zip (32471013 bytes)
    Dataset updated
    Feb 21, 2023
    Authors
    K S ABISHEK
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Greetings, fellow analysts!

    (NOTE: This is a random dataset generated using Python. It bears no resemblance to any real entity in the corporate world. Any resemblance is a matter of coincidence.)

    REC-SSEC Bank is a govt-aided bank operating in the Indian Peninsula. They have regional branches in over 40 regions of the country. You have been provided with a massive Excel sheet containing the transaction details, the total transaction amount, their location, and the total transaction count.

    The dataset is described as follows:

    1. Date - The date on which the transaction took place.
    2. Domain - Where or which type of business entity made the transaction.
    3. Location - Where the data is collected from.
    4. Value - Total value of transactions.
    5. Count of transactions.

    For example, the very first row can be read as: "On the first of January, 2022, 1932 transactions summing up to INR 365554 from Bhuj were reported." NOTE: There are about 2750 transactions every single day. All of this has been given to you.

    The bank wants you to answer the following questions :

    1. What is the average transaction value every day for each domain over the year?
    2. What is the average transaction value for every city/location over the year?
    3. The bank CEO, Mr. Hariharan, wants to promote the ease of transaction for the most active domain. If the domains could be sorted into a priority, what would be the priority list?
    4. What is the average transaction count for each city?
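The questions above reduce to grouped averages and one ranking. The sketch below uses only the Python standard library and a few made-up rows; apart from the Bhuj figures quoted in the description, every value, domain name, and function name here is a hypothetical placeholder, and ranking domains by total transaction count is one assumed reading of "most active".

```python
from collections import defaultdict

# Hypothetical rows: (date, domain, location, value, transaction_count).
# Only the Bhuj value and count in the first row come from the description.
rows = [
    ("2022-01-01", "RETAIL", "Bhuj", 365554, 1932),
    ("2022-01-01", "EDUCATION", "Pune", 200000, 1000),
    ("2022-01-02", "RETAIL", "Pune", 134446, 1068),
    ("2022-01-02", "EDUCATION", "Bhuj", 400000, 1500),
]


def average_by(records, key_index, value_index):
    """Mean of the value column, grouped by the key column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[value_index]
        counts[record[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


def domain_priority(records):
    """Domains ordered by total transaction count, most active first."""
    totals = defaultdict(int)
    for record in records:
        totals[record[1]] += record[4]
    return sorted(totals, key=totals.get, reverse=True)


avg_value_by_domain = average_by(rows, 1, 3)    # question 1
avg_value_by_location = average_by(rows, 2, 3)  # question 2
priority = domain_priority(rows)                # question 3
avg_count_by_location = average_by(rows, 2, 4)  # question 4
print(avg_value_by_domain, avg_value_by_location, priority, avg_count_by_location)
```

For the real file, the same grouping would run over rows read from the spreadsheet; the column order used here is assumed from the field list in the description.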
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Related Article
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description

The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
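The per-person storage calculation described in this entry (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched as follows. The function name and the use of terabytes as the unit are assumptions made for illustration; they are not taken from the survey files.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """High end of the reported storage range divided by the number of respondents.

    group_size is 1 for an individual response, or G for a group response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# An individual reporting the "10 to 100 TB" range is counted at the high end, 100 TB.
individual = per_person_storage_tb(100)
# A hypothetical group of 4 reporting the same range is counted at 100 / 4 TB each.
group_member = per_person_storage_tb(100, group_size=4)
print(individual, group_member)
```

Using the high end of each range gives an upper-bound estimate per person, which is why the description treats Big Data users separately with reported or estimated actual values.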
