98 datasets found
  1. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Servicehttps://www.ars.usda.gov/
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset: Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided). Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

  2. Dirty Excel Data

    • kaggle.com
    zip
    Updated Feb 23, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shiva Vashishtha (2022). Dirty Excel Data [Dataset]. https://www.kaggle.com/datasets/shivavashishtha/dirty-excel-data
    Explore at:
    zip(13123 bytes)Available download formats
    Dataset updated
    Feb 23, 2022
    Authors
    Shiva Vashishtha
    Description

    Dataset

    This dataset was created by Shiva Vashishtha

    Contents

  3. Sales data analysis using MS Excel

    • kaggle.com
    zip
    Updated May 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yerzat Tursunkulov (2024). Sales data analysis using MS Excel [Dataset]. https://www.kaggle.com/datasets/yerzattursunkulov/sales-data-analysis-using-ms-excel
    Explore at:
    zip(31983063 bytes)Available download formats
    Dataset updated
    May 8, 2024
    Authors
    Yerzat Tursunkulov
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The Orders database contains information on the following variables. • Continuous variables: Row ID, Order ID, Order Date, Ship Date, Customer ID, Product ID, Sales, Quantity, Discount, Profit, Shipping Cost
    • Categorical variables: Ship Mode, Customer Name, Segment, Postal Code, City, State, Country, Region, Market, Category, Subcategory, Product Name, Order Priority

    The purpose of this project: 1. To use descriptive statistics methods to assess the sales performance across various segments, markets, product categories and subcategories; 2. To use diagnostic analytics methods to understand the statistical significance of the factors that influence sales; 3. Use predictive analytics (regression) to understand the strengths of the relationship between sales and sales drivers and generate a regression formula to predict sales 4. develop a sales forecasting model based on the insights.

    Descriptive analytics Descriptive statistics for sales https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F848f47b38b7f2360163bb2221703c658%2FPicture2.png?generation=1715109635788424&alt=media" alt="">

    Frequency distribution for sales Around 44,500 transactions of value >=USD 500. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F39cfd8ffd8fdf296300bb9f1fa5243e2%2FPicture3.png?generation=1715109667755923&alt=media" alt="">

    Sales values across markets https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F3385959d11b6daafae24c848b4b00f13%2FPicture4.png?generation=1715109744629587&alt=media" alt="">

    We see an increase in sales across all markets and throughout 2012-2015. We have high sales volumes in the USCA and LATAM markets:
    • USCA: USD 757,108 in 2015; • LATAM: USD 706,632 in 2015.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4aa59b5a5b980aad6873c8a4af4cd223%2FPicture1.png?generation=1715109770510368&alt=media" alt="">

    Sales across product categories https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F867cbe622bf94d25a25a1c4b9281656d%2FPicture5.png?generation=1715109794950614&alt=media" alt="">

    Office supplies were the largely sold product category in 2012-2015. Technology was the least sold product category by quantity. However, the Technology category yields high sales. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F5c74664f77cce2bc2f7c77c7b01e9890%2FPicture6.png?generation=1715109834309500&alt=media" alt="">

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd3bb766183e9f58fbf009a998c01adf6%2FPicture7.png?generation=1715109872961254&alt=media" alt="">

    Further analysis of profitable products reveals that phones and copiers demonstrate high sales. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F109c4c3eab81fa581c19a5c09beff839%2FPicture9.png?generation=1715109914590660&alt=media" alt="">

    Sales across segments The data reveals that there are high sales in the Consumer segment across all product categories. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F65075cc20028a37a1aff6932fa89d3d5%2FPicture10.png?generation=1715109992655572&alt=media" alt="">

    Diagnostic analytics

    Two sample T-test Using a t-test, we can evaluate how sales differ across different segments, regions, and product types. T-test allows us to evaluate the statistical significance of sales samples. The two-sample t-test of sales numbers across markets resulted in the statistical significance of sales in USCA and LATAM markets with p-values >0.05. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F7b7264d5f44a9a79b352028b28d1c618%2FPicture11.png?generation=1715110082746375&alt=media" alt=""> https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4061ef38ea83d7e3bbd252a802863e8f%2FPicture12.png?generation=1715110097203251&alt=media" alt="">

    The two-sample t-test of sales numbers across product categories resulted in the statistical significance of sales in Office supplies and Technology categories with p-values >0.05. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd9994377d605222d77ef67af3e273771%2FPicture13.png?generation=1715110126112322&alt=media" alt=""> https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F669779e9aad19d51a28fb44e7c484bc7%2FPicture14.png?generation=1715110140543290&alt=media" alt="">

    Pearson correlation The correlation of continuous values in the dataset allows us to see the relationship between sales, quantity sold, shipping costs and profit. ![](https://www.googleapis.com/download/sto...

  4. B

    Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  5. Power BI Sample Data

    • kaggle.com
    zip
    Updated Oct 20, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shwetank Chaudhary (2022). Power BI Sample Data [Dataset]. https://www.kaggle.com/datasets/shwetankchaudhary/power-bi-sample-data
    Explore at:
    zip(73587 bytes)Available download formats
    Dataset updated
    Oct 20, 2022
    Authors
    Shwetank Chaudhary
    Description

    This a dataset of finances which are also available in Power BI for practice. Use this dataset to practice Power BI.

  6. Global Population Data by Country in a Spreadsheet

    • rowzero.com
    Updated Mar 2, 2026
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Row Zero (2026). Global Population Data by Country in a Spreadsheet [Dataset]. https://rowzero.com/datasets/global-population-by-country
    Explore at:
    Dataset updated
    Mar 2, 2026
    Dataset provided by
    Row Zero, Inc.
    Description

    Spreadsheet of world population by country and global demographic data from 1950 to 2100. Lookup global population statistics for free in a big spreadsheet.

  7. N

    Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Excel
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

    Key observations

    The largest age group in Excel, AL was for the group of age 45 to 49 years years with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over years with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: The population for the specific age group in the Excel is shown in this column.
    • % of Total Population: This column displays the population of each age group as a proportion of Excel total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Excel Population by Age. You can refer the same here

  8. Healthcare Dataset

    • kaggle.com
    zip
    Updated May 8, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Prasad Patil (2024). Healthcare Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/healthcare-dataset
    Explore at:
    zip(3054550 bytes)Available download formats
    Dataset updated
    May 8, 2024
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context:

    This synthetic healthcare dataset has been created to serve as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practice, develop, and showcase their data manipulation and analysis skills in the context of the healthcare industry.

    Inspiration:

    The inspiration behind this dataset is rooted in the need for practical and diverse healthcare data for educational and research purposes. Healthcare data is often sensitive and subject to privacy regulations, making it challenging to access for learning and experimentation. To address this gap, I have leveraged Python's Faker library to generate a dataset that mirrors the structure and attributes commonly found in healthcare records. By providing this synthetic data, I hope to foster innovation, learning, and knowledge sharing in the healthcare analytics domain.

    Dataset Information:

    Each column provides specific information about the patient, their admission, and the healthcare services provided, making this dataset suitable for various data analysis and modeling tasks in the healthcare domain. Here's a brief explanation of each column in the dataset - - Name: This column represents the name of the patient associated with the healthcare record. - Age: The age of the patient at the time of admission, expressed in years. - Gender: Indicates the gender of the patient, either "Male" or "Female." - Blood Type: The patient's blood type, which can be one of the common blood types (e.g., "A+", "O-", etc.). - Medical Condition: This column specifies the primary medical condition or diagnosis associated with the patient, such as "Diabetes," "Hypertension," "Asthma," and more. - Date of Admission: The date on which the patient was admitted to the healthcare facility. - Doctor: The name of the doctor responsible for the patient's care during their admission. - Hospital: Identifies the healthcare facility or hospital where the patient was admitted. - Insurance Provider: This column indicates the patient's insurance provider, which can be one of several options, including "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," and "Medicare." - Billing Amount: The amount of money billed for the patient's healthcare services during their admission. This is expressed as a floating-point number. - Room Number: The room number where the patient was accommodated during their admission. - Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent," reflecting the circumstances of the admission. - Discharge Date: The date on which the patient was discharged from the healthcare facility, based on the admission date and a random number of days within a realistic range. - Medication: Identifies a medication prescribed or administered to the patient during their admission. Examples include "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor." - Test Results: Describes the results of a medical test conducted during the patient's admission. Possible values include "Normal," "Abnormal," or "Inconclusive," indicating the outcome of the test.

    Usage Scenarios:

    This dataset can be utilized for a wide range of purposes, including: - Developing and testing healthcare predictive models. - Practicing data cleaning, transformation, and analysis techniques. - Creating data visualizations to gain insights into healthcare trends. - Learning and teaching data science and machine learning concepts in a healthcare context. - You can treat it as a Multi-Class Classification Problem and solve it for Test Results which contains 3 categories(Normal, Abnormal, and Inconclusive).

    Acknowledgments:

    • I acknowledge the importance of healthcare data privacy and security and emphasize that this dataset is entirely synthetic. It does not contain any real patient information or violate any privacy regulations.
    • I hope that this dataset contributes to the advancement of data science and healthcare analytics and inspires new ideas. Feel free to explore, analyze, and share your findings with the Kaggle community.

    Image Credit:

    Image by BC Y from Pixabay

  9. Large Truck Crash Causation Study (LTCCS) - File 1 (Excel)

    • data.virginia.gov
    • data.ur.virginia.gov
    • +12more
    xls
    Updated May 24, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S Department of Transportation (2024). Large Truck Crash Causation Study (LTCCS) - File 1 (Excel) [Dataset]. https://data.virginia.gov/dataset/large-truck-crash-causation-study-ltccs-file-1-excel
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 24, 2024
    Dataset provided by
    Federal Motor Carrier Safety Administrationhttps://www.fmcsa.dot.gov/
    Authors
    U.S Department of Transportation
    Description

    The Large Truck* Crash Causation Study (LTCCS) is based on a three-year data collection project conducted by the Federal Motor Carrier Safety Administration (FMCSA) and the National Highway Traffic Safety Administration (NHTSA) of the U.S. Department of Transportation (DOT). LTCCS is the first-ever national study to attempt to determine the critical events and associated factors that contribute to serious large truck crashes allowing DOT and others to implement effective countermeasures to reduce the occurrence and severity of these crashes.

  10. Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A...

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Heather Cribbs; Gabriel Gardner (2023). Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A Multi-Campus Big Data Study" [Dataset]. http://doi.org/10.6084/m9.figshare.19071578.v1
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Heather Cribbs; Gabriel Gardner
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Five files, one of which is a ZIP archive, containing data that support the findings of this study. PDF file "IA screenshots CSU Libraries search config" contains screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for years 2017 - 2019. Excel file "CCIHE2018-PublicDataFile" contains Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses from 2018. CSV file "2017-2019_RAW" contains the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017 - 2019. CSV file "clean_data" contains the cleaned data from Primo Analytics which was used for all subsequent analysis such as charting and import into SPSS for statistical testing. ZIP archive file "NonparametricStatisticalTestsFromSPSS" contains 23 SPSS files [.spv format] reporting the results of testing conducted in SPSS. This archive includes things such as normality check, descriptives, and Kruskal-Wallis H-test results.

  11. Hospital Excel Dataset

    • kaggle.com
    zip
    Updated Apr 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Omolola Labiyi (2025). Hospital Excel Dataset [Dataset]. https://www.kaggle.com/datasets/t0ut0u/hospital-excel-dataset
    Explore at:
    zip(18209846 bytes)Available download formats
    Dataset updated
    Apr 17, 2025
    Authors
    Omolola Labiyi
    License

    https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/

    Description

    Healthcare administrators constantly face difficult questions about costs, patient volume, and resource allocation. Understanding how long patients stay, when admissions spike, and how treatment costs fluctuate can help hospitals plan staffing, negotiate insurance contracts, and manage operational efficiency. In this project, I analyzed hospital admissions and treatment cost data to identify patterns in patient stays, insurance coverage, and seasonal trends. The goal was to explore how hospitals could use data to better understand operational demand and financial performance. Using Microsoft Excel, I conducted exploratory analysis on hospital admission records that included patient demographics, hospital locations, insurance providers, treatment costs, and length of stay. Through PivotTables, formulas, and visualizations, I transformed raw data into insights that reveal how patient volume, insurance distribution, and treatment costs vary across hospitals and over time.

    Dataset Overview The dataset includes information on: • Patient admissions across multiple hospitals • Insurance providers and coverage distribution • Hospital stay durations • Treatment cost per day • Monthly admission trends These variables allowed for analysis of both operational hospital metrics and financial performance indicators.

    Analysis Approach To explore the dataset, I used Excel tools to summarize large volumes of hospital data and identify patterns. Techniques used included: • PivotTables to aggregate hospital admissions and insurance provider distribution • Conditional formatting to highlight cost changes across time periods • Bar, pie, and line charts to visualize operational trends • Calculations to measure average length of stay and daily treatment costs These tools allowed me to quickly transform transactional hospital data into meaningful visual insights.

    Key Findings Cost per Day Trends The average daily treatment cost across hospitals was $3,386.40. Costs fluctuated throughout the year, with the highest treatment costs occurring in September. This spike may reflect seasonal demand for medical procedures, insurance billing cycles, or higher treatment complexity. Following this peak, treatment costs declined during October and November, suggesting a potential normalization in hospital utilization or procedure volume.

    Average Length of Stay Across all hospitals, the average patient stay was approximately 16 days. Monthly variations were relatively small but still informative: • April recorded the longest stays, averaging about 15.75 days. • September recorded the shortest stays, averaging about 15.22 days. This distribution suggests that patient stay durations remain relatively stable across the year, though certain months may involve more complex cases or slower discharge cycles.

    Insurance Provider Distribution Insurance coverage varied significantly across the patient population. • Cigna covered the largest share of patients, accounting for approximately 20.27% of hospital admissions. • Aetna had the lowest patient share, suggesting smaller network presence or limited coverage in the hospitals represented in the dataset. Additionally, patient length of stay varied slightly by insurance provider. Patients covered by Medicare stayed an average of 15.63 days, while Aetna patients averaged 15.45 days. These differences may reflect variations in patient demographics, treatment complexity, or insurance policy structures.

    Seasonal Admission Patterns Admissions also followed a seasonal pattern. • August recorded the highest number of hospital admissions, potentially reflecting increased elective procedures or seasonal health conditions. • February had the lowest admission levels, suggesting lower procedural demand or fewer emergency cases. Understanding these seasonal trends could help hospitals better plan staffing levels and manage resource allocation during high-demand periods.

    Business Insights The analysis highlights several opportunities for hospital administrators and healthcare planners: • Rising treatment costs may require closer monitoring of hospital billing practices and insurance reimbursement structures. • Seasonal admission trends can help hospitals anticipate demand and allocate staff more effectively. • Insurance provider distribution may influence strategic partnerships between hospitals and insurers. • Monitoring length-of-stay trends can help identify operational inefficiencies or opportunities to improve discharge planning. By leveraging simple analytical tools in Excel, hospital operations teams can uncover valuable insights that support more informed planning and decision-making.

    Skills Demonstrated • Data exploration and cleaning in Excel • PivotTable-based analysis • Trend analysis and statistical summaries • Data visualization using charts and dashboards • Healthcare operational data analysis • Translating raw data into actionable insights

  12. D

    Reconsidering ceramics and trade using big data: the significance of...

    • archaeology.datastations.nl
    ods, pdf, xml
    Updated Nov 15, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    RMR van Oosten; RMR van Oosten (2017). Reconsidering ceramics and trade using big data: the significance of stoneware distribution in the Low Countries, 1200-1600 [Dataset]. http://doi.org/10.17026/DANS-ZRZ-VQ88
    Explore at:
    xml(2389), pdf(1282323), ods(128813), xml(4691), ods(20343), pdf(1330057), pdf(1329926), xml(5342), xml(4234), pdf(85631), pdf(1375165), ods(38162), pdf(1242198)Available download formats
    Dataset updated
    Nov 15, 2017
    Dataset provided by
    DANS Data Station Archaeology
    Authors
    RMR van Oosten; RMR van Oosten
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Supplementary material, R.M.R. van Oosten (accepted 17-8-2019), 'Reconsidering ceramics and trade using big data: the significance of stoneware distribution in the Low Countries, 1200?1600' Medieval Ceramics . The Journal of the Medieval Pottery Research Group (MPRG)1) A bibliography of the publications used, ordered alphabetically per town (an 11-page PDF file); 2) The frequency of stoneware in seven archaeological years, 1250, 1300 in the 38 towns in the Low Countries and 3 towns outside the Low Countries (Excel file with one sheet) 3) The data itself, i.e., the various fabrics per assemblages ordered per town (Excel file with 42 sheets). Fabric codes from the Deventer systeem, Dutch classification system: P. Bitter, S. Ostkamp en N.L. Jaspers, Classificatiesysteem voor (post-)middeleeuws aardewerk en glas= Het Deventer Systeem (sinds 1989), April 2012, 700 pages, p. 4 & 8.

  13. S

    Annual Retail Store Data, 2000 [Canada] [Excel]

    • dataverse.scholarsportal.info
    • borealisdata.ca
    pdf, xls
    Updated Nov 17, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Scholars Portal Dataverse (2021). Annual Retail Store Data, 2000 [Canada] [Excel] [Dataset]. https://dataverse.scholarsportal.info/dataset.xhtml;jsessionid=1283d69ee2dd528c9011fe4a2fe3?persistentId=hdl%3A10864%2F11351&version=&q=&fileTypeGroupFacet=&fileAccess=&fileTag=%22Tables%22&fileSortField=&fileSortOrder=
    Explore at:
    xls(2165760), xls(29696), xls(2920448), pdf(76787), pdf(158404), xls(34816), xls(2754048), pdf(81084), pdf(71183), xls(34304), xls(625664), xls(2707968), xls(695808), pdf(70673), pdf(72585), xls(576512), xls(609792), xls(28672), pdf(60236), pdf(30338), pdf(87181), pdf(84140), pdf(92012), xls(610304), pdf(74439), xls(2471424), pdf(73788), xls(30208), pdf(74478), pdf(53645)Available download formats
    Dataset updated
    Nov 17, 2021
    Dataset provided by
    Scholars Portal Dataverse
    Area covered
    Canada, Canada
    Description

    The annual Retail store data CD-ROM is an easy-to-use tool for quickly discovering retail trade patterns and trends. The current product presents results from the 1999 and 2000 Annual Retail Store and Annual Retail Chain surveys. This product contains numerous cross-classified data tables using the North American Industry Classification System (NAICS). The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square footage (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.

  14. m

    Data for: Development and Evaluation of Road Vehicle Emission Inventory for...

    • data.mendeley.com
    • narcis.nl
    Updated Dec 27, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mona Lisa Oliveira (2017). Data for: Development and Evaluation of Road Vehicle Emission Inventory for Fortaleza Metropolitan Area, Brazil [Dataset]. http://doi.org/10.17632/x8rcd7cpny.1
    Explore at:
    Dataset updated
    Dec 27, 2017
    Authors
    Mona Lisa Oliveira
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Fortaleza Metropolitan Area, Brazil
    Description

    Appendix B. Supplementary data

    All FE data is available in an excel file.

  15. n

    Data from: Designing data science workshops for data-intensive environmental...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1more
    zip
    Updated Dec 8, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Allison Theobold; Stacey Hancock; Sara Mannheimer (2020). Designing data science workshops for data-intensive environmental science research [Dataset]. http://doi.org/10.5061/dryad.7wm37pvp7
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    California State Polytechnic University
    Montana State University
    Authors
    Allison Theobold; Stacey Hancock; Sara Mannheimer
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum,'' the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.

    Methods Surveys from Carpentries style workshops the results of which are presented in the accompanying manuscript.

    Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.

    The surveys administered for the fall 2018, spring 2019 academic year are included as pre_workshop_survey and post_workshop_assessment PDF files. 
    The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw.
    
      The data files whose name includes survey contain raw data from pre-workshop surveys and the data files whose name includes assessment contain raw data from the post-workshop assessment survey.
    
    
    The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively. 
    The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean. 
    The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
    
  16. t

    Data from: Data set for the population survey “attitudes towards big data...

    • service.tib.eu
    • radar-service.eu
    • +1more
    Updated Nov 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Data set for the population survey “attitudes towards big data practices and the institutional framework of privacy and data protection” [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1151
    Explore at:
    Dataset updated
    Nov 28, 2024
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Abstract: The aim of this study is to gain insights into the attitudes of the population towards big data practices and the factors influencing them. To this end, a nationwide survey (N = 1,331), representative of the population of Germany, addressed the attitudes about selected big data practices exemplified by four scenarios, which may have a direct impact on the personal lifestyle. The scenarios contained price discrimination in retail, credit scoring, differentiations in health insurance, and differentiations in employment. The attitudes about the scenarios were set into relation to demographic characteristics, personal value orientations, knowledge about computers and the internet, and general attitudes about privacy and data protection. Another focus of the study is on the institutional framework of privacy and data protection, because the realization of benefits or risks of big data practices for the population also depends on the knowledge about the rights the institutional framework provided to the population and the actual use of those rights. As results, several challenges for the framework by big data practices were confirmed, in particular for the elements of informed consent with privacy policies, purpose limitation, and the individuals’ rights to request information about the processing of personal data and to have these data corrected or erased. TechnicalRemarks: TYPE OF SURVEY AND METHODS The data set includes responses to a survey conducted by professionally trained interviewers of a social and market research company in the form of computer-aided telephone interviews (CATI) from 2017-02 to 2017-04. The target population was inhabitants of Germany aged 18 years and more, who were randomly selected by using the sampling approaches ADM eASYSAMPLe (based on the Gabler-Häder method) for landline connections and eASYMOBILe for mobile connections. The 1,331 completed questionnaires comprise 44.2 percent mobile and 55.8 percent landline phone respondents. Most questions had options to answer with a 5-point rating scale (Likert-like) anchored with ‘Fully agree’ to ‘Do not agree at all’, or ‘Very uncomfortable’ to ‘Very comfortable’, for instance. Responses by the interviewees were weighted to obtain a representation of the entire German population (variable ‘gewicht’ in the data sets). To this end, standard weighting procedures were applied to reduce differences between the sample and the entire population with regard to known rates of response and non-response depending on household size, age, gender, educational level, and place of residence. RELATED PUBLICATION AND FURTHER DETAILS The questionnaire, analysis and results will be published in the corresponding report (main text in English language, questionnaire in Appendix B in German language of the interviews and English translation). The report will be available as open access publication at KIT Scientific Publishing (https://www.ksp.kit.edu/). Reference: Orwat, Carsten; Schankin, Andrea (2018): Attitudes towards big data practices and the institutional framework of privacy and data protection - A population survey, KIT Scientific Report 7753, Karlsruhe: KIT Scientific Publishing. FILE FORMATS The data set of responses is saved for the repository KITopen at 2018-11 in the following file formats: comma-separated values (.csv), tapulator-separated values (.dat), Excel (.xlx), Excel 2007 or newer (.xlxs), and SPSS Statistics (.sav). The questionnaire is saved in the following file formats: comma-separated values (.csv), Excel (.xlx), Excel 2007 or newer (.xlxs), and Portable Document Format (.pdf). PROJECT AND FUNDING The survey is part of the project Assessing Big Data (ABIDA) (from 2015-03 to 2019-02), which receives funding from the Federal Ministry of Education and Research (BMBF), Germany (grant no. 01IS15016A-F). http://www.abida.de

  17. Data and CSV files

    • figshare.com
    txt
    Updated Jul 18, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Emma Knol (2018). Data and CSV files [Dataset]. http://doi.org/10.6084/m9.figshare.6834788.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 18, 2018
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Emma Knol
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In here all csv files and excel sheets used for statistical analyses are stored.The Excel file 'Raw data' contains a description of each varable in the tab 'metadata'.The second tab contains all the obtained raw data.

  18. m

    Data for: A Prioritization-based Analysis of Open Data Portals: The Case...

    • data.mendeley.com
    Updated Oct 16, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Di Wang (2018). Data for: A Prioritization-based Analysis of Open Data Portals: The Case study of Chinese Local Governments [Dataset]. http://doi.org/10.17632/ykdbpdmspy.1
    Explore at:
    Dataset updated
    Oct 16, 2018
    Authors
    Di Wang
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Area covered
    China
    Description

    We have used Analytic Hierarchy Process (AHP) to derive the priorities of all the factors in the evaluation framework for open government data (OGD) portals. The results of AHP process were shown in the uploaded pdf file. We have collected 2635 open government datasets of 15 different subject categories (local statistics, health, education, cultural activity, transportation, map, public safety, policies and legislation, weather, environment quality, registration, credit records, international trade, budget and spend, and government bid) from 9 OGD portals in China (Beijing, Zhejiang, Shanghai, Guangdong, Guizhou, Sichuan, XInjiang, Hong Kong and Taiwan). These datasets were used for the evaluation of these portals in our study. The records of the quality and open access of these datasets could be found in the uploaded Excel file.

  19. o

    Evolution of IT system and TEM camera performance

    • explore.openaire.eu
    • zenodo.org
    • +1more
    Updated Dec 20, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dieter Weber (2018). Evolution of IT system and TEM camera performance [Dataset]. http://doi.org/10.5281/zenodo.3474955
    Explore at:
    Dataset updated
    Dec 20, 2018
    Authors
    Dieter Weber
    Description

    This Excel file collates throughput specifications of transmission electron microscopy (TEM) cameras, mass storage, network and memory between 1996 and 2018. The absolute and relative development of the throughput is analyzed in charts.

  20. m

    Power System Load Research Tool with Energy Dashboard

    • data.mendeley.com
    Updated Nov 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KS Sastry Musti (2022). Power System Load Research Tool with Energy Dashboard [Dataset]. http://doi.org/10.17632/k5wm3sw5k5.1
    Explore at:
    Dataset updated
    Nov 10, 2022
    Authors
    KS Sastry Musti
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The main aim of load research is to study the load characteristics through a versatile dashboard that provides overall visualizations for large data sets collected from different regions, different types of consumers. Several tasks related in energy supply industry (ESI) such as generation planning, power dispatch, system operation and control, load shedding etc., require load information and visualizations for both historical and real-time data sets. To support users in carrying out such studies, a versatile load research tool with energy dashboard is developed, using Microsoft Excel platform. To stimulate insights into big data aspects, historical data over a few years is considered. Wide ranging features and possibilities of simulating various operating conditions and customizations have been incorporated into this tool. The tool is made available in the Mendeley data repository, which is a public domain for wider use. We do hope that users and ESI stakeholders find the tool useful. We welcome feedback and comments.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Organization logo

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Related Article
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description

The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset: Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided). Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

Search
Clear search
Close search
Google apps
Main menu