63 datasets found

Random Test Data
figshare.com
bin
Updated Jan 19, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stefan Proell (2016). Random Test Data [Dataset]. http://doi.org/10.6084/m9.figshare.1096255.v2
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.1096255.v2
Dataset updated
Jan 19, 2016
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Stefan Proell
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a test
Microsoft excel database containing all the simulated (10 sets) and...
plos.figshare.com
figshare.com
xlsx
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hamed Ahmadi (2023). Microsoft excel database containing all the simulated (10 sets) and experimental data used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0187292.s001
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0187292.s001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Hamed Ahmadi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Excel sheets in order: The sheet entitled “Hens Original Data” contains the results of an experiment conducted to study the response of laying hens during initial phase of egg production subjected to different intakes of dietary threonine. The sheet entitled “Simulated data & fitting values” contains the 10 simulated data sets that were generated using a standard procedure of random number generator. The predicted values obtained by the new three-parameter and conventional four-parameter logistic models were also appeared in this sheet. (XLSX)
Enterprise Survey 2009-2019, Panel Data - Slovenia
microdata.worldbank.org
catalog.ihsn.org
Updated Aug 6, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
Explore at:
Dataset updated
Aug 6, 2020
Dataset provided by
World Bank Grouphttp://www.worldbank.org/
European Investment Bankhttp://eib.org/
European Bank for Reconstruction and Developmenthttp://ebrd.com/
Time period covered
2008 - 2019
Area covered
Slovenia
Description
Abstract

The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

Geographic coverage

National

Analysis unit

The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

Universe

As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.

For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

Response rate

Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.

For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
excel sample data
kaggle.com
zip
Updated Dec 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aziza Afrin (2022). excel sample data [Dataset]. https://www.kaggle.com/datasets/azizaafrin/excel-sample-data
Explore at:
zip(5046 bytes)Available download formats
Dataset updated
Dec 14, 2022
Authors
Aziza Afrin
Description
Dataset

This dataset was created by Aziza Afrin

Contents
New 1000 Sales Records Data 2
kaggle.com
zip
Updated Jan 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
Explore at:
zip(49305 bytes)Available download formats
Dataset updated
Jan 12, 2023
Authors
Calvin Oko Mensah
Description
This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
d
Population education level statistics for people aged 15 and over
data.gov.tw
xml
Updated Jul 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dept. of Statistics (2024). Population education level statistics for people aged 15 and over [Dataset]. https://data.gov.tw/en/datasets/18546
Explore at:
xmlAvailable download formats
Dataset updated
Jul 11, 2024
Dataset authored and provided by
Dept. of Statistics
License
https://data.gov.tw/licensehttps://data.gov.tw/license
Description
Statistical Area Level of Education of Population Aged 15 and Over_ Secondary Release Area, Statistical Area Level of Education of Population Aged 15 and Over_ Primary Release Area, Statistical Area Level of Education of Population Aged 15 and Over_ Minimum Statistical AreaThe Ministry of the Interior's Statistics Department provides the latest annual statistical data for various counties and cities on the Government Open Data Platform in XML format. When viewed in a browser, it appears as a series of characters and numbers. Typically, this format is suitable for programmers to develop applications using the data, rather than being random characters. If you wish to download the data in CSV format (which can be viewed in Excel), please refer to the Social Economic Data Service Platform on the Land Information System website (segis.moi.gov.tw) for downloading.
Data from: Excel Templates: A Helpful Tool for Teaching Statistics
tandf.figshare.com
zip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alejandro Quintela-del-Río; Mario Francisco-Fernández (2023). Excel Templates: A Helpful Tool for Teaching Statistics [Dataset]. http://doi.org/10.6084/m9.figshare.3408052.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3408052.v2
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francishttps://taylorandfrancis.com/
Authors
Alejandro Quintela-del-Río; Mario Francisco-Fernández
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This article describes a free, open-source collection of templates for the popular Excel (2013, and later versions) spreadsheet program. These templates are spreadsheet files that allow easy and intuitive learning and the implementation of practical examples concerning descriptive statistics, random variables, confidence intervals, and hypothesis testing. Although they are designed to be used with Excel, they can also be employed with other free spreadsheet programs (changing some particular formulas). Moreover, we exploit some possibilities of the ActiveX controls of the Excel Developer Menu to perform interactive Gaussian density charts. Finally, it is important to note that they can be often embedded in a web page, so it is not necessary to employ Excel software for their use. These templates have been designed as a useful tool to teach basic statistics and to carry out data analysis even when the students are not familiar with Excel. Additionally, they can be used as a complement to other analytical software packages. They aim to assist students in learning statistics, within an intuitive working environment. Supplementary materials with the Excel templates are available online.
Random Data Analysis and Linear Regression
kaggle.com
zip
Updated Nov 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tommaso Ruzza (2025). Random Data Analysis and Linear Regression [Dataset]. https://www.kaggle.com/datasets/tommasoruzza/random-data-analysis-and-linear-regression
Explore at:
zip(82141 bytes)Available download formats
Dataset updated
Nov 5, 2025
Authors
Tommaso Ruzza
Description
Created a multi-tab Excel statistical project where I generated synthetic normally-distributed data, built random sample extraction logic, calculated descriptive and inferential statistics, analysed variable correlations and performed linear regression with visualisation.
d
City of Tempe 2022 Community Survey Data
catalog.data.gov
performance.tempe.gov
+10more
Updated Sep 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Tempe (2024). City of Tempe 2022 Community Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2022-community-survey-data
Explore at:
Dataset updated
Sep 20, 2024
Dataset provided by
City of Tempe
Area covered
Tempe
Description
Description and PurposeThese data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2022):1. Safe and Secure Communities1.04 Fire Services Satisfaction1.06 Crime Reporting1.07 Police Services Satisfaction1.09 Victim of Crime1.10 Worry About Being a Victim1.11 Feeling Safe in City Facilities1.23 Feeling of Safety in Parks2. Strong Community Connections2.02 Customer Service Satisfaction2.04 City Website Satisfaction2.05 Online Services Satisfaction Rate2.15 Feeling Invited to Participate in City Decisions2.21 Satisfaction with Availability of City Information3. Quality of Life3.16 City Recreation, Arts, and Cultural Centers3.17 Community Services Programs3.19 Value of Special Events3.23 Right of Way Landscape Maintenance3.36 Quality of City Services4. Sustainable Growth & DevelopmentNo Performance Measures in this category presently relate directly to the Community Survey5. Financial Stability & VitalityNo Performance Measures in this category presently relate directly to the Community SurveyMethodsThe survey is mailed to a random sample of households in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of city population. Processing and LimitationsThe location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. This data is the weighted data provided by the ETC Institute, which is used in the final published PDF report.The 2022 Annual Community Survey report is available on data.tempe.gov. The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.Additional InformationSource: Community Attitude SurveyContact (author): Wydale HolmesContact E-Mail (author): wydale_holmes@tempe.govContact (maintainer): Wydale HolmesContact E-Mail (maintainer): wydale_holmes@tempe.govData Source Type: Excel tablePreparation Method: Data received from vendor after report is completedPublish Frequency: AnnualPublish Method: ManualData Dictionary
s
Data from: Fostering cultures of open qualitative research: Dataset 1 –...
orda.shef.ac.uk
docx
Updated Oct 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 1 – Survey Responses [Dataset]. http://doi.org/10.15131/shef.data.23567250.v1
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.15131/shef.data.23567250.v1
Dataset updated
Oct 8, 2025
Dataset provided by
The University of Sheffield
Authors
Matthew Hanchard; Itzel San Roman Pineda
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.

The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

The project was funded with £13,913.85 Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021.This includes due concern for participant anonymity and data management.

ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made form reuse. It has been deposited under a CC-BY-NC license.

This dataset comprises one spreadsheet with N=91 anonymised survey responses .xslx format. It includes all responses to the project survey which used Google Forms between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheet, or open-source equivalents.

The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.

The recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive (targetted Russell Group) Universities. This involved speculative emails and a call for participant on the University of Sheffield ‘Qualitative Open Research Network’ mailing list. As result, the responses include a snowball sample of scholars from elsewhere.

The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.

The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.

A set of survey questions gathering eligibility criteria detail and consent are not listed with in this dataset, as below. All responses provide in the dataset gained a ‘Yes’ response to all the below questions (with the exception of one question, marked with an asterisk (*) below):

· I am aged 18 or over · I have read the information and consent statement and above. · I understand how to ask questions and/or raise a query or concern about the survey. · I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named. · I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield · I understand that I can withdraw from the research at any time. · I assign the copyright I hold in materials generated as part of this project to The University of Sheffield. · * I am happy to be contacted after the survey to take part in an interview.

The project was undertaken by two staff: Co-investigator: Dr. Itzel San Roman Pineda ORCiD ID: 0000-0002-3785-8057 i.sanromanpineda@sheffield.ac.uk

Postdoctoral Research Assistant Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard ORCiD ID: 0000-0003-2460-8638 m.s.hanchard@sheffield.ac.uk Research Associate iHuman Institute, Social Research Institutes, Faculty of Social Science
d
City of Tempe 2023 Business Survey Data
catalog.data.gov
s.cnmilf.com
+12more
Updated Sep 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Tempe (2024). City of Tempe 2023 Business Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2023-business-survey-data
Explore at:
Dataset updated
Sep 20, 2024
Dataset provided by
City of Tempe
Area covered
Tempe
Description
These data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):1. Financial Stability and Vitality5.01 Quality of Business ServicesThe location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.Additional InformationSource: Business SurveyContact (author): Adam SamuelsContact E-Mail (author): Adam_Samuels@tempe.govContact (maintainer): Contact E-Mail (maintainer): Data Source Type: Excel tablePreparation Method: Data received from vendor after report is completedPublish Frequency: AnnualPublish Method: ManualData DictionaryMethods:The survey is mailed to a random sample of businesses in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used.To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.Processing and Limitations:The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.The data are used by the ETC Institute in the final published PDF report.
2 Branch Sales Data Set
kaggle.com
zip
Updated Jan 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michael Brueckmann (2022). 2 Branch Sales Data Set [Dataset]. https://www.kaggle.com/datasets/michaelbrueckmann/2-branch-sales-data-set
Explore at:
zip(11461615 bytes)Available download formats
Dataset updated
Jan 31, 2022
Authors
Michael Brueckmann
Description
Context

This data is random generated in Excel to practice forecasting and visualizations.

Content

The two branches utilize data of thousands of generated product data with nearly 200 different employees. Product ID numbers are randomly generated for each file

Acknowledgements

This project was for my practice

Inspiration
Data from: Fecal source identification using random forest
figshare.com
application/gzip
Updated Nov 7, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Adelaide Roguet; A. Murat Eren; Ryan J Newton; Sandra L McLellan (2018). Fecal source identification using random forest [Dataset]. http://doi.org/10.6084/m9.figshare.6344888.v2
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.6344888.v2
Dataset updated
Nov 7, 2018
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Adelaide Roguet; A. Murat Eren; Ryan J Newton; Sandra L McLellan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Script_RF_V6.R: Code to perform the source identification using random forest classification (creation/training of the classifiers + prediction of new samples).RF_SourceIdentification_v6.RData: R-objects that contains the training/testing V6-databases used in the publication. ClostridialesBacteroiales_v6.fa: File containing the sequences of the V6- Clostridiales/Bacteroidales amplicon sequence variants.RF_training_outputs.zip: Folder containing the outputs of the random forest classifications performed to train the V6-classifiers.Confusion_Matrix_V6.xlsx: Excel spreadsheet containing the confusion matrix for each V6-classifiers.Comparison_V4V5_V6.zip (not peer-reviewed): Folder containing the files used to train/test/compare the classifiers.
m
Sediment sample physical properties and biofacies abundance data (Excel...
marine-geo.org
Updated Feb 4, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Sediment sample physical properties and biofacies abundance data (Excel format) for the CORE-VTRCC cruise (2021) [Dataset]. https://www.marine-geo.org/tools/files/31482
Explore at:
Dataset updated
Feb 4, 2024
Description
This data set presents sediment sample data consisting of biofacies abundance and sediment grain size and properties for samples collected in 2021 over the Vitoria Trindade Ridge during research cruise CORE-VTRCC on naval vessel Nho Cruzeiro do Sul. During the sampling program, nine seafloor sediment samples were collected within the Vitória-Trindade Ridge with a Van-Veen grab sampler (3600 cm²) on the top of the volcanic seamounts. Sample V7 collected only rhodoliths, without additional sediments. Grain sizes larger than 40 mm diameter (pebble size) were separated for rhodolith measurements. Sediment samples were washed to dissolve the salt concentration for 48 hours, then oven-dried at 45 °C for 72 hours. The rhodolith samples were dried at 35 °C for 48 hours. The morphometry of the rhodoliths was classified in spheirodal, discoidal and ellipsoidal based on the measurement of the long (L), intermediate (I) and short (S) axis with a Vernier Caliper. Samples were weighed and sieved in phi fractions (-1.5, -1.0, -0.5, 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0 and > 4.0) for 10 minutes. Samples sieved were described in phi fractions and were grouped in the following orders: granules (-2 to -1), very coarse sand (-1 to 0), coarse sand (0 to 1), medium sand (1 to 2), fine sand (2 to 3), very fine sand (3 to 4) and silt (> 4). The mean grain size and sorting were analyzed on Gradistat v9.1 software. Sediment samples were also placed on a Petri dish for microscope analysis and photography using Fiji software. Grains were then classified (Carbonate debris, foraminifers, bryozoans, sponge spicules, bivalves, gastropods, crustaceans, echinoderms, and annelida) and abundance was quantified based on 300 random point counts per sample. The data file is in Excel spreadsheet format. In the file names, SS = "Seafloor Samples". For the analysis, each seafloor sample was subdivided into ten subsamples (Q number). Codes: V# - Number of Sample (_# goes when there is multiple images for the same sample); Q# - Number of the quartile sample (only for the biofacies). Funding for this work was provided through FAPESP awards 2016/24946-9 and 2020/08847-6, and through the Brazilian Navy program PROAMAZONIA AZUL.
Data from: The Benefits of Body-Worn Cameras: New Findings from a Randomized...
catalog.data.gov
s.cnmilf.com
+2more
Updated Nov 14, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Justice (2025). The Benefits of Body-Worn Cameras: New Findings from a Randomized Controlled Trial at the Las Vegas Metropolitan Police Department, Nevada, 2014-2015 [Dataset]. https://catalog.data.gov/dataset/the-benefits-of-body-worn-cameras-new-findings-from-a-randomized-controlled-trial-at-2014--23e12
Explore at:
Dataset updated
Nov 14, 2025
Dataset provided by
National Institute of Justicehttp://nij.ojp.gov/
Area covered
Nevada
Description
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study reports the findings of a randomized controlled trial (RCT) involving more than 400 police officers and the use of body-worn cameras (BWC) in the Las Vegas Metropolitan Police Department (LVMPD). Officers were surveyed before and after the trial, and a random sample was interviewed to assess their level of comfort with technology, perceptions of self, civilians, other officers, and the use of BWCs. Information was gathered during ride-alongs with BWC officers and from a review of BWC videos. The collection includes 2 SPSS data files, 4 Excel data files, and 2 files containing aggregated treatment groups and rank-and-treatment groups, in Stata, Excel, and CSV format: SPSS: officer-survey---pretest.sav (n=422; 30 variables) SPSS: officer-survey---posttest2.sav (n=95; 33 variables) Excel: officer-interviews---form-a.xlsx (n=23; 52 variables) Excel: officer-interviews---form-b.xlsx (n=27; 52 variables) Excel: ride-along-observations.xlsx (n=72; 20 variables) Excel: video-review-data.xlsx (n=53; 21 variables) Stata: hours-and-compensation-rollup-to-treatment-group.dta (n=4; 42 variables) Excel: hours-and-compensation-rollup-to-treatment-group.xls (n=4; 42 variables) CSV: hours-and-compensation-rollup-to-treatment-group.csv (n=4; 42 variables) Stata: hours-and-compensation-rollup-to-rank-and-treatment-group.dta (n=12; 43 variables) Excel: hours-and-compensation-rollup-to-rank-and-treatment-group.xls (n=12; 43 variables) CSV: hours-and-compensation-rollup-to-rank-and-treatment-group.csv (n=12; 43 variables)
H
Relaxed Naïve Bayes Data
dataverse.harvard.edu
Updated Aug 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Relaxed Naïve Bayes Team (2023). Relaxed Naïve Bayes Data [Dataset]. http://doi.org/10.7910/DVN/7KNKLL
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/7KNKLL
Dataset updated
Aug 7, 2023
Dataset provided by
Harvard Dataverse
Authors
Relaxed Naïve Bayes Team
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
NaiveBayes_R.xlsx: This Excel file includes information as to how probabilities of observed features are calculated given recidivism (P(x_ij│R)) in the training data. Each cell is embedded with an Excel function to render appropriate figures. P(Xi|R): This tab contains probabilities of feature attributes among recidivated offenders. NIJ_Recoded: This tab contains re-coded NIJ recidivism challenge data following our coding schema described in Table 1. Recidivated_Train: This tab contains re-coded features of recidivated offenders. Tabs from [Gender] through [Condition_Other]: Each tab contains probabilities of feature attributes given recidivism. We use these conditional probabilities to replace the raw values of each feature in P(Xi|R) tab. NaiveBayes_NR.xlsx: This Excel file includes information as to how probabilities of observed features are calculated given non-recidivism (P(x_ij│N)) in the training data. Each cell is embedded with an Excel function to render appropriate figures. P(Xi|N): This tab contains probabilities of feature attributes among non-recidivated offenders. NIJ_Recoded: This tab contains re-coded NIJ recidivism challenge data following our coding schema described in Table 1. NonRecidivated_Train: This tab contains re-coded features of non-recidivated offenders. Tabs from [Gender] through [Condition_Other]: Each tab contains probabilities of feature attributes given non-recidivism. We use these conditional probabilities to replace the raw values of each feature in P(Xi|N) tab. Training_LnTransformed.xlsx: Figures in each cell are log-transformed ratios of probabilities in NaiveBayes_R.xlsx (P(Xi|R)) to the probabilities in NaiveBayes_NR.xlsx (P(Xi|N)). TestData.xlsx: This Excel file includes the following tabs based on the test data: P(Xi|R), P(Xi|N), NIJ_Recoded, and Test_LnTransformed (log-transformed P(Xi|R)/ P(Xi|N)). Training_LnTransformed.dta: We transform Training_LnTransformed.xlsx to Stata data set. We use Stat/Transfer 13 software package to transfer the file format. StataLog.smcl: This file includes the results of the logistic regression analysis. Both estimated intercept and coefficient estimates in this Stata log correspond to the raw weights and standardized weights in Figure 1. Brier Score_Re-Check.xlsx: This Excel file recalculates Brier scores of Relaxed Naïve Bayes Classifier in Table 3, showing evidence that results displayed in Table 3 are correct. *****Full List***** NaiveBayes_R.xlsx NaiveBayes_NR.xlsx Training_LnTransformed.xlsx TestData.xlsx Training_LnTransformed.dta StataLog.smcl Brier Score_Re-Check.xlsx Data for Weka (Training Set): Bayes_2022_NoID Data for Weka (Test Set): BayesTest_2022_NoID Weka output for machine learning models (Conventional naïve Bayes, AdaBoost, Multilayer Perceptron, Logistic Regression, and Random Forest)
Invoices Dataset
kaggle.com
zip
Updated Jan 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cankat Saraç (2022). Invoices Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/invoices
Explore at:
zip(574249 bytes)Available download formats
Dataset updated
Jan 18, 2022
Authors
Cankat Saraç
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
The invoice dataset provided is a mock dataset generated using the Python Faker library. It has been designed to mimic the format of data collected from an online store. The dataset contains various fields, including first name, last name, email, product ID, quantity, amount, invoice date, address, city, and stock code. All of the data in the dataset is randomly generated and does not represent actual individuals or products. The dataset can be used for various purposes, including testing algorithms or models related to invoice management, e-commerce, or customer behavior analysis. The data in this dataset can be used to identify trends, patterns, or anomalies in online shopping behavior, which can help businesses to optimize their online sales strategies.
d
Data from: Random interbreeding between cryptic lineages of the Common...
datadryad.org
search.dataone.org
+1more
zip
Updated Mar 9, 2011
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
William C Webb; John M Marzluff; Kevin E Omland (2011). Random interbreeding between cryptic lineages of the Common Raven: evidence for speciation in reverse [Dataset]. http://doi.org/10.5061/dryad.8774
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.8774
Dataset updated
Mar 9, 2011
Dataset provided by
Dryad
Authors
William C Webb; John M Marzluff; Kevin E Omland
Time period covered
Mar 9, 2011
Area covered
Olympic Peninsula, Washington State
Description
Dryad_MatedPair_DataThis spreadsheet (MS Excel format) contains data related to raven mate pairing behavior with respect to their mtDNA haplotypes. See the associated ReadMe file for addition details.
Massive Bank dataset ( 1 Million+ rows)
kaggle.com
zip
Updated Feb 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
K S ABISHEK (2023). Massive Bank dataset ( 1 Million+ rows) [Dataset]. https://www.kaggle.com/datasets/ksabishek/massive-bank-dataset-1-million-rows
Explore at:
zip(32471013 bytes)Available download formats
Dataset updated
Feb 21, 2023
Authors
K S ABISHEK
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Greetings , fellow analysts !

(NOTE : This is a random dataset generated using python. It bears no resemblance to any real entity in the corporate world. Any resemblance is a matter of coincidence.)

REC-SSEC Bank is a govt-aided bank operating in the Indian Peninsula. They have regional branches in over 40+ regions of the country. You have been provided with a massive excel sheet containing the transaction details, the total transaction amount and their location and total transaction count.

The dataset is described as follows :

Date - The date on which the transaction took place. 2.Domain - Where or which type of Business entity made the transaction. 3.Location - Where the data is collected from 4.Value - Total value of transaction

Count of transaction .

For example , in the very first row , the data can be read as : " On the first of January, 2022 , 1932 transactions of summing upto INR 365554 from Bhuj were reported " NOTE : There are about 2750 transactions every single day. All of this has been given to you.

The bank wants you to answer the following questions :

What is the average transaction value everyday for each domain over the year.

What is the average transaction value for every city/location over the year

The bank CEO , Mr: Hariharan , wants to promote the ease of transaction for the highest active domain. If the domains could be sorted into a priority, what would be the priority list ?

What's the average transaction count for each city ?
m
Dataset of development of business during the COVID-19 crisis
data.mendeley.com
narcis.nl
Updated Nov 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
Explore at:
Unique identifier
https://doi.org/10.17632/9vvrd34f8t.1
Dataset updated
Nov 9, 2020
Authors
Tatiana N. Litvinova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second full of pandemics), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. Dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Due to the fact that the data in the dataset are not ready-made numbers, but formulas, when adding and / or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.