Facebook
TwitterThis dataset was created by Pinky Verma
Facebook
TwitterThe USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
Facebook
TwitterDistribution of doses of a volatile organic compound from inhalation of one consumer product, other near -field sources, far-field sources, and aggregate (total) exposure. In this instance, far-field scenarios account for several orders of magnitude of less of the predicted dose compared to near-field scenarios. This dataset is associated with the following publication: Vallero, D. Air Pollution Monitoring Changes to Accompany the Transition from a Control to a Systems Focus. Sustainability. MDPI AG, Basel, SWITZERLAND, 8(12): 1216, (2016).
Facebook
TwitterThis dataset was created by Aziza Afrin
Facebook
TwitterThis dataset is a cleaned and preprocessed version of the original Netflix Movies and TV Shows dataset available on Kaggle. All cleaning was done using Microsoft Excel — no programming involved.
🎯 What’s Included: - Cleaned Excel file (standardized columns, proper date format, removed duplicates/missing values) - A separate "formulas_used.txt" file listing all Excel formulas used during cleaning (e.g., TRIM, CLEAN, DATE, SUBSTITUTE, TEXTJOIN, etc.) - Columns like 'date_added' have been properly formatted into DMY structure - Multi-valued columns like 'listed_in' are split for better analysis - Null values replaced with “Unknown” for clarity - Duration field broken into numeric + unit components
🔍 Dataset Purpose: Ideal for beginners and analysts who want to: - Practice data cleaning in Excel - Explore Netflix content trends - Analyze content by type, country, genre, or date added
📁 Original Dataset Credit: The base version was originally published by Shivam Bansal on Kaggle: https://www.kaggle.com/shivamb/netflix-shows
📌 Bonus: You can find a step-by-step cleaning guide and the same dataset on GitHub as well — along with screenshots and formulas documentation.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data organization for the figures in the document: Figure 3A LineOutWithSun_SSAzi_135to225_green_Correct_ROI5_INFO.xls Figure 3b LineOutWithSun_SSAzi_m45to45_green_Correct_ROI5_INFO.xls Figure 4 fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Sim_Correct_ROI5_INFO.xls Figure 5a LineOut_Camera_Elevation_SqAzi_m180to0_green_Sim_Correct_ROI5_INFO.xls LineOut_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls Figure 5b LineOut_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls LineOut_Camera_Elevation_SqAzi_0to180_green_Sim_Correct_ROI5_INFO.xls Figure 6a LineOutColor_SqAzi_m180to0_CP_20to50_Correct_ROI5_INFO.xls Figure 6b LineOutROI_SqAzi_m180to0_CP_20to50_green_Correct_INFO.xls Figure 7 fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls LineOut_MeshAoPDif_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls LineOut_MeshAoPDif_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second full of pandemics), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. Dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Due to the fact that the data in the dataset are not ready-made numbers, but formulas, when adding and / or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
Facebook
TwitterExcel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was for the group of age 45 to 49 years years with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over years with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel Population by Age. You can refer the same here
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains one Excel sheet and five Word documents. In this dataset, Simulation.xlsx describes the parameter values used for the numerical analysis based on empirical data. In this Excel sheet, we calculated the values of each capped call-option model parameter. Computation of Table 2.docx and other documents show the results of the comparative statistics.
Facebook
TwitterThe Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note : title changed 9/4/2020 to reflect study name] Resources in this dataset:Resource Title: Dataset One RALA PPA Data Dictionary. File Name: RALA PPA Data Dictionary.csvResource Description: Data dictionary for dataset one collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA Data Dictionary. File Name: RALA TWA Data Dictionary.csvResource Description: Data dictionary for dataset two collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA Data Dictionary. File Name: RALA SSA Data Dictionary.csvResource Description: Data dictionary for dataset three collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT Data Dictionary. File Name: CPAT Data Dictionary.csvResource Description: Data dictionary for dataset four collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset One RALA PPA. File Name: RALA PPA Data.csvResource Description: Data collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA. File Name: RALA TWA Data.csvResource Description: Data collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA. File Name: RALA SSA Data.csvResource Description: Data collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT. File Name: CPAT Data.csvResource Description: Data collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Data Dictionary. File Name: DataDictionary_RALA_PPA_SSA_TWA_CPAT.csvResource Description: This is a combined data dictionary from each of the 4 dataset files in this set.
Facebook
Twitterhttps://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
The Superstore Sales Data dataset, available in an Excel format as "Superstore.xlsx," is a comprehensive collection of sales and customer-related information from a retail superstore. This dataset comprises* three distinct tables*, each providing specific insights into the store's operations and customer interactions.
Facebook
TwitterThis dataset was created by Mohammed Azarudheen
Released under Other (specified in description)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
Facebook
TwitterThe annual Retail store data CD-ROM is an easy-to-use tool for quickly discovering retail trade patterns and trends. The current product presents results from the 1999 and 2000 Annual Retail Store and Annual Retail Chain surveys. This product contains numerous cross-classified data tables using the North American Industry Classification System (NAICS). The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square footage (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.
Facebook
TwitterDCR 2023 Supplemental Excel Files
Facebook
TwitterThis page provides guidance on linking open data to a spreadsheet.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Excel sheet with data of the original research 'Evaluation of simple and cost-effective hematological inflammatory biomarkers in type 2 diabetes and their correlation with glycemic control'
Facebook
TwitterCanaan Valley NWR forest inventory factory database used to generate statistics and summary Excel-based reports
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Facebook
TwitterThis dataset was created by Pinky Verma