91 datasets found

18 excel spreadsheets by species and year giving reproduction and growth...
catalog.data.gov
data.wu.ac.at
Updated Aug 17, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
Explore at:
Dataset updated
Aug 17, 2024
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
T
Excel files containing data for Figures
dataverse.tdl.org
xls
Updated Aug 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Parrish Brady; Parrish Brady (2020). Excel files containing data for Figures [Dataset]. http://doi.org/10.18738/T8/EGV2TV
Explore at:
xls(22016), xls(71680), xls(9728), xls(13824), xls(529920), xls(339968), xls(26112), xls(17920), xls(67584)Available download formats
Unique identifier
https://doi.org/10.18738/T8/EGV2TV
Dataset updated
Aug 24, 2020
Dataset provided by
Texas Data Repository
Authors
Parrish Brady; Parrish Brady
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data organization for the figures in the document: Figure 3A LineOutWithSun_SSAzi_135to225_green_Correct_ROI5_INFO.xls Figure 3b LineOutWithSun_SSAzi_m45to45_green_Correct_ROI5_INFO.xls Figure 4 fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Sim_Correct_ROI5_INFO.xls Figure 5a LineOut_Camera_Elevation_SqAzi_m180to0_green_Sim_Correct_ROI5_INFO.xls LineOut_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls Figure 5b LineOut_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls LineOut_Camera_Elevation_SqAzi_0to180_green_Sim_Correct_ROI5_INFO.xls Figure 6a LineOutColor_SqAzi_m180to0_CP_20to50_Correct_ROI5_INFO.xls Figure 6b LineOutROI_SqAzi_m180to0_CP_20to50_green_Correct_INFO.xls Figure 7 fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls LineOut_MeshAoPDif_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls LineOut_MeshAoPDif_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls
B
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
g
Employee Travel 2021 (Excel)
opendata.greatersudbury.ca
arc-gis-hub-home-arcgishub.hub.arcgis.com
Updated Sep 1, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Greater Sudbury (2021). Employee Travel 2021 (Excel) [Dataset]. https://opendata.greatersudbury.ca/documents/7d73d365118b46e4828f52fea7c8ce3a
Explore at:
Dataset updated
Sep 1, 2021
Dataset authored and provided by
City of Greater Sudbury
Description
Download Employee Travel Excel SheetThis dataset contains information about the employee travel expenses for the year 2021. Details are provided on the employee (name, title, department), the travel (dates, location, purpose) and the cost (expenses, recoveries). Expenses are broken down in separate tabs by Quarter (Q1, Q2, Q3 and Q4). Updated quarterly when expenses are prepared. Expenses for other years are available in separate datasets.
Sample Student Data
figshare.com
xls
Updated Aug 2, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carrie Ellis (2022). Sample Student Data [Dataset]. http://doi.org/10.6084/m9.figshare.20419434.v1
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20419434.v1
Dataset updated
Aug 2, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Carrie Ellis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In "Sample Student Data", there are 6 sheets. There are three sheets with sample datasets, one for each of the three different exercise protocols described (CrP Sample Dataset, Glycolytic Dataset, Oxidative Dataset). Additionally, there are three sheets with sample graphs created using one of the three datasets (CrP Sample Graph, Glycolytic Graph, Oxidative Graph). Each dataset and graph pairs are from different subjects. · CrP Sample Dataset and CrP Sample Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the creatine phosphate system. Here, the subject was a track and field athlete who threw the shot put for the DeSales University track team. The NIRS monitor was placed on the right triceps muscle, and the student threw the shot put six times with a minute rest in between throws. Data was collected telemetrically by the NIRS device and then downloaded after the student had completed the protocol. · Glycolytic Dataset and Glycolytic Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the glycolytic energy system. In this example, the subject performed continuous squat jumps for 30 seconds, followed by a 90 second rest period, for a total of three exercise bouts. The NIRS monitor was place on the left gastrocnemius muscle. Here again, data was collected telemetrically by the NIRS device and then downloaded after he had completed the protocol. · Oxidative Dataset and Oxidative Graph: In this example, the dataset and graph are from an exercise protocol designed to stress the oxidative system. Here, the student held a sustained, light-intensity, isometric biceps contraction (pushing against a table). The NIRS monitor was attached to the left biceps muscle belly. Here, data was collected by a student observing the SmO2 values displayed on a secondary device; specifically, a smartphone with the IPSensorMan APP displaying data. The recorder student observed and recorded the data on an Excel Spreadsheet, and marked the times that exercise began and ended on the Spreadsheet.
o
Messy data for data cleaning exercise - Dataset - openAFRICA
open.africa
Updated Oct 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). Messy data for data cleaning exercise - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/messy-data-for-data-cleaning-exercise
Explore at:
Dataset updated
Oct 6, 2021
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A messy data for demonstrating "how to clean data using spreadsheet". This dataset was intentionally formatted to be messy, for the purpose of demonstration. It was collated from here - https://openafrica.net/dataset/historic-and-projected-rainfall-and-runoff-for-4-lake-victoria-sub-regions
N
Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...
neilsberg.com
csv, json
Updated Jul 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Jul 24, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Excel
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

Key observations

The largest age group in Excel, AL was for the group of age 45 to 49 years years with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over years with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the Excel is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of Excel total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Excel Population by Age. You can refer the same here
e
Companies House - Free Company Data Product
data.europa.eu
html
Updated Sep 24, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
London Borough of Barnet (2021). Companies House - Free Company Data Product [Dataset]. https://data.europa.eu/data/datasets/companies-house-free-company-data-product
Explore at:
htmlAvailable download formats
Dataset updated
Sep 24, 2021
Dataset authored and provided by
London Borough of Barnet
Description

Provided by Companies House - London and Barnet data can be extracted

What is it?

The Free Company Data Product is a downloadable data snapshot containing basic company data of live companies on the register. This snapshot is provided as ZIP files containing data in CSV format and is split into multiple files for ease of downloading.

This snapshot is provided free of charge and will not be supported.

When will it be updated?

The latest snapshot will be updated within 5 working days of the previous month end.

Additional Information

The contents of the snapshot have been compiled up to the end of the previous month.

A list of the data fields contained in the snapshot can be found here PDF.

Up-to-date company information can be obtained by following the URI links in the data. More details on URIs

If files are viewed with Microsoft Excel, it is recommended that you use version 2007 or later.

Company Data Product FAQs
d
Financial Statement Data Sets
catalog.data.gov
s.cnmilf.com
Updated Jul 9, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Economic and Risk Analysis (2025). Financial Statement Data Sets [Dataset]. https://catalog.data.gov/dataset/financial-statement-data-sets
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
Economic and Risk Analysis
Description
The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
Sales and workload in retail industry
kaggle.com
Updated Dec 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dennis Gluesenkamp (2019). Sales and workload in retail industry [Dataset]. https://www.kaggle.com/dgluesen/sales-and-workload-data-from-retail-industry/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 12, 2019
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Dennis Gluesenkamp
Description
Context

Raw data of real analytical use cases in a number of industries and companies is frequently provided in an Excel-based form. These files usually cannot be processed directly in machine learning models, but must first be cleaned and preprocessed. In this procedure, many different types of pitfalls may occur. This makes data preprocessing an essential time factor in the daily work of a data scientist.

Here, an Excel spreadsheet will be presented which in this form is closely oriented to a real case but contains only simulated figures for reasons of data and business results protection. The form and structure of the file correspond to a real case and could be encountered by a data scientist in a company in this way. Such a file can be the result of a download from a financial controlling system, e.g. SAP.

Content

The data includes information about sold goods resp. product units, the associated turnover and hours worked. This information is grouped by month, store and department of the retailer. Moreover, information about the sales area in a specific department as well as about the opening hours of the store is provided.

Possible objectives

The following goals of data cleansing might be addressed:

Import the Excel-file

Inspect the dataset

Check data types and do meaningful modifications

Handle missings/data gaps

Find and solve data inconsistencies

Rename columns for improved usage

Join tables to a single one

Furthermore, the data can be investigated with regard to correlations between different features and/or a regression model.

License

GNU General Public License v3.0 - https://www.gnu.org/licenses/gpl-3.0.en.html
d
Warehouse and Retail Sales
catalog.data.gov
data.montgomerycountymd.gov
+2more
Updated Sep 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
Explore at:
Dataset updated
Sep 7, 2025
Dataset provided by
data.montgomerycountymd.gov
Description
This dataset contains a list of sales and movement data by item and department appended monthly. Update Frequency : Monthly
Enterprise Survey 2009-2019, Panel Data - Slovenia
microdata.worldbank.org
catalog.ihsn.org
Updated Aug 6, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
Explore at:
Dataset updated
Aug 6, 2020
Dataset provided by
World Bank Grouphttp://www.worldbank.org/
European Bank for Reconstruction and Developmenthttp://ebrd.com/
European Investment Bankhttp://eib.org/
Time period covered
2008 - 2019
Area covered
Slovenia
Description
Abstract

The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

Geographic coverage

National

Analysis unit

The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

Universe

As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.

For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

Response rate

Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.

For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
Dataset for numerical analysis
figshare.com
data.mendeley.com
zip
Updated Nov 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shi Chen; Dong Chen; Jyh-Horng Lin (2023). Dataset for numerical analysis [Dataset]. http://doi.org/10.6084/m9.figshare.24648945.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24648945.v1
Dataset updated
Nov 28, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Shi Chen; Dong Chen; Jyh-Horng Lin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains one Excel sheet and five Word documents. In this dataset, Simulation.xlsx describes the parameter values used for the numerical analysis based on empirical data. In this Excel sheet, we calculated the values of each capped call-option model parameter. Computation of Table 2.docx and other documents show the results of the comparative statistics.
f
Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
figshare
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
d
An example data set for exploration of Multiple Linear Regression
catalog.data.gov
data.usgs.gov
+1more
Updated Jul 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. https://catalog.data.gov/dataset/an-example-data-set-for-exploration-of-multiple-linear-regression
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.
E
Data from: Facebook Data for Sentiment Analysis
live.european-language-grid.eu
binary format
Updated Jul 16, 2013
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2013). Facebook Data for Sentiment Analysis [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1057
Explore at:
binary formatAvailable download formats
Dataset updated
Jul 16, 2013
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Corpus consisting of 10,000 Facebook posts manually annotated on sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gols-labels.txt) on corresponding lines.
d
GP Practice Prescribing Presentation-level Data - July 2014
digital.nhs.uk
csv, zip
Updated Oct 31, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2014). GP Practice Prescribing Presentation-level Data - July 2014 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/practice-level-prescribing-data
Explore at:
csv(1.4 GB), zip(257.7 MB), csv(1.7 MB), csv(275.8 kB)Available download formats
Dataset updated
Oct 31, 2014
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditionshttps://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Jul 1, 2014 - Jul 31, 2014
Area covered
United Kingdom
Description
Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or equivalent on Mac OSX). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, to handle larger data sets, can be used. The Microsoft PowerPivot add-on for Excel is available from Microsoft http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx Once PowerPivot has been installed, to load the large files, please follow the instructions below. Note that it may take at least 20 to 30 minutes to load one monthly file. 1. Start Excel as normal 2. Click on the PowerPivot tab 3. Click on the PowerPivot Window icon (top left) 4. In the PowerPivot Window, click on the "From Other Sources" icon 5. In the Table Import Wizard e.g. scroll to the bottom and select Text File 6. Browse to the file you want to open and choose the file extension you require e.g. CSV Once the data has been imported you can view it in a spreadsheet. What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record will only be produced when this has occurred and there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance, (by presentation name): - the total number of items prescribed and dispensed - the total net ingredient cost - the total actual cost - the total quantity The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.
m
Diabetes Dataset
data.mendeley.com
Updated Jul 18, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ahlam Rashid (2020). Diabetes Dataset [Dataset]. http://doi.org/10.17632/wj9rwkp9c2.1
Explore at:
Unique identifier
https://doi.org/10.17632/wj9rwkp9c2.1
Dataset updated
Jul 18, 2020
Authors
Ahlam Rashid
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The construction of diabetes dataset was explained. The data were collected from the Iraqi society, as they data were acquired from the laboratory of Medical City Hospital and (the Specializes Center for Endocrinology and Diabetes-Al-Kindy Teaching Hospital). Patients' files were taken and data extracted from them and entered in to the database to construct the diabetes dataset. The data consist of medical information, laboratory analysis. The data attribute are: The data consist of medical information, laboratory analysis… etc. The data that have been entered initially into the system are: No. of Patient, Sugar Level Blood, Age, Gender, Creatinine ratio(Cr), Body Mass Index (BMI), Urea, Cholesterol (Chol), Fasting lipid profile, including total, LDL, VLDL, Triglycerides(TG) and HDL Cholesterol , HBA1C, Class (the patient's diabetes disease class may be Diabetic, Non-Diabetic, or Predict-Diabetic).
l
Datasets related to cross-sectional survey
opal.latrobe.edu.au
researchdata.edu.au
bin
Updated May 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ruth Hardman; Evelien Spelten; Stephen Begg (2024). Datasets related to cross-sectional survey [Dataset]. http://doi.org/10.26181/615cdea12551c
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.26181/615cdea12551c
Dataset updated
May 13, 2024
Dataset provided by
La Trobe
Authors
Ruth Hardman; Evelien Spelten; Stephen Begg
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets generated from a survey undertaken in 2020. Participants were adults with one or more chronic health conditions. The survey consisted of a series of already validated self-report questionnaires assessing different aspects of patient capacity, and treatment burden related to the management of chronic health conditions.
Walmart Dataset
kaggle.com
Updated Dec 26, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
M Yasser H (2021). Walmart Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/walmart-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 26, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
M Yasser H
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
https://raw.githubusercontent.com/Masterx-AI/Project_Retail_Analysis_with_Walmart/main/Wallmart1.jpg" alt="">

Description:

One of the leading retail stores in the US, Walmart, would like to predict the sales and demand accurately. There are certain events and holidays which impact sales on each day. There are sales data available for 45 stores of Walmart. The business is facing a challenge due to unforeseen demands and runs out of stock some times, due to the inappropriate machine learning algorithm. An ideal ML algorithm will predict demand accurately and ingest factors like economic conditions including CPI, Unemployment Index, etc.

Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of all, which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.

Acknowledgements

The dataset is taken from Kaggle.

Objective:

Understand the Dataset & cleanup (if required).

Build Regression models to predict the sales w.r.t single & multiple features.

Also evaluate the models & compare their respective scores like R2, RMSE, etc.

Facebook

Twitter

Click to copy link

Link copied

Cite

U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp

18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry.

Explore at:

Dataset updated

Aug 17, 2024

Dataset provided by

United States Environmental Protection Agencyhttp://www.epa.gov/

Description

Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).

Clear search

Close search

Google apps

Main menu

18 excel spreadsheets by species and year giving reproduction and growth...

Excel files containing data for Figures

Data Cleaning Sample

Employee Travel 2021 (Excel)

Sample Student Data

Messy data for data cleaning exercise - Dataset - openAFRICA

Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

About this dataset

Content

Inspiration

Recommended for further research

Companies House - Free Company Data Product

Provided by Companies House - London and Barnet data can be extracted

What is it?

When will it be updated?

Additional Information

Financial Statement Data Sets

Sales and workload in retail industry

Context

Content

Possible objectives

License

Warehouse and Retail Sales

Enterprise Survey 2009-2019, Panel Data - Slovenia

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Response rate

Dataset for numerical analysis

Orange dataset table

An example data set for exploration of Multiple Linear Regression

Data from: Facebook Data for Sentiment Analysis

GP Practice Prescribing Presentation-level Data - July 2014

Diabetes Dataset

Datasets related to cross-sectional survey

Walmart Dataset

Description:

Acknowledgements

Objective:

18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry.See More Versions

18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry.