100+ datasets found

m
Raw data outputs 1-18
bridges.monash.edu
researchdata.edu.au
xlsx
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.26180/21259491.v1
Dataset updated
May 30, 2023
Dataset provided by
Monash University
Authors
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
HR Analytics Dataset
kaggle.com
zip
Updated Oct 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
anshika2301 (2023). HR Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/anshika2301/hr-analytics-dataset
Explore at:
zip(213690 bytes)Available download formats
Dataset updated
Oct 27, 2023
Authors
anshika2301
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
HR analytics, also referred to as people analytics, workforce analytics, or talent analytics, involves gathering together, analyzing, and reporting HR data. It is the collection and application of talent data to improve critical talent and business outcomes. It enables your organization to measure the impact of a range of HR metrics on overall business performance and make decisions based on data. They are primarily responsible for interpreting and analyzing vast datasets.

Download the data CSV files here ; https://drive.google.com/drive/folders/18mQalCEyZypeV8TJeP3SME_R6qsCS2Og
Data analysis project
kaggle.com
zip
Updated Aug 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SzymonBiaas (2024). Data analysis project [Dataset]. https://www.kaggle.com/datasets/szymonbiaas/data-analysis-project
Explore at:
zip(116043378 bytes)Available download formats
Dataset updated
Aug 15, 2024
Authors
SzymonBiaas
Description
This dashboard was created from data published by Olist Store (a Brazilian e-commerce public dataset). Raw data contains information about 100 000 orders from 2016 to 2018 placed in many regions of Brazil.

The raw datasets were imported into Excel using “Get data” option (formerly known as “Power Query”) and cleaned. An additional table with the names of Brazilian states was also imported from the Wikipedia page.

A Data Table about payment information was created based on imported statistics with the usage of nested formulas. Then, proper pivot charts were used to build an Olist Store Payment Dashboard which allows you to review the data using a connected timeline and slicers.
Raw data from datasets used in SIMON analysis
data.europa.eu
unknown
Updated Jan 27, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2022). Raw data from datasets used in SIMON analysis [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-2580414?locale=hr
Explore at:
unknown(312591)Available download formats
Dataset updated
Jan 27, 2022
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here you can find raw data and information about each of the 34 datasets generated by the mulset algorithm and used for further analysis in SIMON. Each dataset is stored in separate folder which contains 4 files: json_info: This file contains, number of features with their names and number of subjects that are available for the same dataset data_testing: data frame with data used to test trained model data_training: data frame with data used to train models results: direct unfiltered data from database Files are written in feather format. Here is an example of data structure for each file in repository. File was compressed using 7-Zip available at https://www.7-zip.org/.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Supply Chain DataSet
kaggle.com
zip
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amir Motefaker (2023). Supply Chain DataSet [Dataset]. https://www.kaggle.com/datasets/amirmotefaker/supply-chain-dataset
Explore at:
zip(9340 bytes)Available download formats
Dataset updated
Jun 1, 2023
Authors
Amir Motefaker
Description
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
i
Malware Analysis Datasets: Raw PE as Image
ieee-dataport.org
Updated Nov 7, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Angelo Oliveira (2019). Malware Analysis Datasets: Raw PE as Image [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-raw-pe-image
Explore at:
Dataset updated
Nov 7, 2019
Authors
Angelo Oliveira
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Raw PE byte stream rescaled to a 32 x 32 greyscale image using the Nearest Neighbor Interpolation algorithm and then flattened to a 1024 bytes vector. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
The raw datasets and statistics for each analysis result graph.
plos.figshare.com
zip
Updated Jun 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Barry Smyth (2023). The raw datasets and statistics for each analysis result graph. [Dataset]. http://doi.org/10.1371/journal.pone.0251513.s002
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0251513.s002
Dataset updated
Jun 11, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Barry Smyth
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Each individual result graph is associated with 4 different comma-separated files: (i) Raw—the (anonymised) raw data behind the means and standard deviations used for a particular result graph; (ii) Paired—the paired statistical significance results; (iii) Successive Male—the statistical significance results to compare successive groups (age and ability) for male runners; and (iv) Successive Female—the corresponding results for the statistical significance tests to compare successive groups (age and ability) of female runners. (ZIP)
c
Target products dataset
crawlfeeds.com
csv, zip
Updated Sep 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2024). Target products dataset [Dataset]. https://crawlfeeds.com/datasets/target-products-dataset
Explore at:
zip, csvAvailable download formats
Dataset updated
Sep 10, 2024
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
The Target Products Dataset is a robust collection in CSV format, featuring 1.3 million product records sourced from Target's online platform. This dataset contains rich details on a wide range of products, including product titles, URLs, pricing, availability, and more. It is an ideal resource for businesses, researchers, and data scientists interested in analyzing retail trends, product availability, and pricing strategies.

Key Data Fields:

Title: Name of the product.

URL: Direct link to the product page.

Brand: The brand associated with the product.

Main Image: URL of the main product image.

SKU: Unique Stock Keeping Unit identifier.

Description: A structured product description.

Raw Description: The original product description before any processing.

GTIN13: Global Trade Item Number (GTIN) in 13-digit format.

Currency: The currency in which the product is priced.

Price: Price of the product.

Availability: Availability status of the product (e.g., in stock, out of stock).

Available Delivery Method: Methods through which the product can be delivered.

Available Branch: Information on availability at specific store locations.

Primary Category: The main category to which the product belongs.

Sub Category 1, 2, 3: Further sub-categorization of the product.

Images: URLs to additional product images.

Raw Specifications: Unprocessed specifications of the product.

Specifications: Structured product specifications.

Highlights: Key highlights and features of the product.

Raw Highlights: Unstructured highlights before processing.

Uniq ID: A unique identifier for each product.

Scraped At: The timestamp indicating when the data was collected.

Use Cases:

Retail Analytics: Analyze pricing trends, brand popularity, and product availability across categories.

Product Categorization: Study the classification of products into primary and sub-categories.

E-commerce Analysis: Use this dataset for consumer behavior studies, inventory management, or competitive analysis.

Recommendation Systems: Build product recommendation engines using product features, pricing, and availability data.
Treevill: N.B.G. Unique & Rare Raw Dataset
kaggle.com
zip
Updated Jan 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shuvo Kumar Basak-4004 (2025). Treevill: N.B.G. Unique & Rare Raw Dataset [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/treevill-n-b-g-unique-and-rare-raw-dataset
Explore at:
zip(2423469043 bytes)Available download formats
Dataset updated
Jan 26, 2025
Authors
Shuvo Kumar Basak-4004
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Treevill: N.B.G. Unique & Rare Raw Dataset

This dataset, named Treevill: N.B.G. Unique & Rare Raw Dataset, is a collection of images sourced from the National Botanical Garden of Bangladesh (N.B.G.), showcasing a variety of unique and rare tree species. The dataset contains a total of 66 folders, each representing a specific tree species. For each species, approximately 2000 images are included, all resized to 256x256 pixels in JPEG format.

The dataset is intended for research, educational, and machine learning purposes, particularly in the fields of image classification, object recognition, and biodiversity studies. The high number of images per tree species ensures diversity in terms of tree angles, lighting, and conditions, which can be crucial for training machine learning models for species identification.

Procedure for Data Collection and Organization:

Data Collection: Images were collected from the National Botanical Garden of Bangladesh. Each tree species was carefully documented, ensuring that images captured a variety of perspectives and conditions for each species. Image Resizing: All images were resized to a standard resolution of 256x256 pixels for consistency in the dataset. Format Standardization: All images were converted into JPEG format, ensuring uniformity and ease of use in various applications. Folder Organization: Each species was assigned a unique folder in the dataset. These folders are named after the species they represent and contain approximately 2000 images each. Final Dataset: The final dataset consists of 66 folders, each dedicated to a specific tree species, making it easier to access and analyze the data for various tree-related research purposes.

List of Folders (Tree Species):

Akashmoni Aloe Wood Ashok Ashore Australian Pine Avocado Bahera Bamboo Banana Baro bottle brush Bazna Belati gab Bishop wood Blue Bellvine Buddha Coconut Camphor Tree Cannonball Tree Carambola Champaca Chaplash Civit Corkwood Crown Gardenia Debdaeu Devil Tree Dvils Cotton East Indian copaiba balsam Egyptian lotus Golden Shower Tree Guava Hairy Sterculia Haldu Haritaki Heaven Lotus Hijol Holudkrishnachura India Red Pear Jack Fruit Jiga Kamala Tree Kanjal Karanja Karen Wood Khejur Koinar Loha kat Mahogany Makri-shal Mango Marking Nut tree Mastwood Mexican lilac Mouskanda Nageshore Palm Piliostigma Prickly Tree Raktan Roskau Sada Golachi Shail Vadi Sisso Soap Nut Tree Teak The Poonspar Tree Udaya padda

Source: National Botanical Garden, Zoo Road, Dhaka, Bangladesh

Related links:

Shuvo, Shuvo Kumar Basak (2025), “Treevill: National Botanical Garden Unique & Rare Tree Argument Dataset ”, Mendeley Data, V1, doi: 10.17632/t7rwzgbfdd.1

https://doi.org/10.34740/KAGGLE/DSV/10582625

https://doi.org/10.34740/KAGGLE/DSV/10579609

https://doi.org/10.34740/KAGGLE/DSV/10579122

Treevill: N.B.G. Unique & Rare Raw Dataset - Access, Collaboration, and Paid Services Policy

I, Shuvo Kumar Basak, have created and curated the Treevill: N.B.G. Unique & Rare Raw Dataset, which consists of images of unique and rare tree species collected from the National Botanical Garden of Bangladesh. This dataset is freely available for research, educational, and non-commercial purposes.

Free Access to the Dataset: The Treevill: N.B.G. Unique & Rare Raw Dataset is available free of charge to all individuals and organizations for educational and research use. This is to support the advancement of knowledge and studies related to biodiversity, machine learning, and related fields.

Future Collaboration and Data Requests: While the dataset is provided free of charge, I encourage individuals and organizations to contact me directly if they need access to additional related data, further assistance, or if they plan on expanding their research in the future.

If you require any new data or specific related datasets, feel free to reach out to me, Shuvo Kumar Basak, for collaboration. I am happy to assist with additional data collection, cleaning, resizing, or other related services at a reasonable cost.

Paid Services - Hire for Data Collection: If you or your organization need custom data collection or wish to obtain related datasets beyond what is included in this collection, I offer a paid service to gather new data according to your specific requirements. This includes: Custom data collection for other tree species or related botanical data.

Data cleaning, resizing, and preprocessing to make the data ready for analysis.

Please contact me for a custom quote based on your specific needs. I will work with you to provide high-quality, tailored datasets to support your research, project, or business needs. Terms and Conditions: The dataset is intended for academic, research, and non-commercial purposes only. Redistribution or commercial use of the dataset without prior written co...

Retail Store Sales: Dirty for Data Cleaning

kaggle.com

zip

Updated Jan 18, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Ahmed Mohamed (2025). Retail Store Sales: Dirty for Data Cleaning [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/retail-store-sales-dirty-for-data-cleaning

Explore at:

zip(226740 bytes)Available download formats

Dataset updated

Jan 18, 2025

Authors

Ahmed Mohamed

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Dirty Retail Store Sales Dataset

Overview

The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.

File Information

File Name: retail_store_sales.csv
Number of Rows: 12,575
Number of Columns: 11

Columns Description

Column Name	Description	Example Values
`Transaction ID`	A unique identifier for each transaction. Always present and unique.	`TXN_1234567`
`Customer ID`	A unique identifier for each customer. 25 unique customers.	`CUST_01`
`Category`	The category of the purchased item.	`Food`, `Furniture`
`Item`	The name of the purchased item. May contain missing values or `None`.	`Item_1_FOOD`, `None`
`Price Per Unit`	The static price of a single unit of the item. May contain missing or `None` values.	`4.00`, `None`
`Quantity`	The quantity of the item purchased. May contain missing or `None` values.	`1`, `None`
`Total Spent`	The total amount spent on the transaction. Calculated as `Quantity * Price Per Unit`.	`8.00`, `None`
`Payment Method`	The method of payment used. May contain missing or invalid values.	`Cash`, `Credit Card`
`Location`	The location where the transaction occurred. May contain missing or invalid values.	`In-store`, `Online`
`Transaction Date`	The date of the transaction. Always present and valid.	`2023-01-15`
`Discount Applied`	Indicates if a discount was applied to the transaction. May contain missing values.	`True`, `False`, `None`

Categories and Items

The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:

Electric Household Essentials

Item Code	Item Name	Price
Item_1_EHE	Blender	5.0
Item_2_EHE	Microwave	6.5
Item_3_EHE	Toaster	8.0
Item_4_EHE	Vacuum Cleaner	9.5
Item_5_EHE	Air Purifier	11.0
Item_6_EHE	Electric Kettle	12.5
Item_7_EHE	Rice Cooker	14.0
Item_8_EHE	Iron	15.5
Item_9_EHE	Ceiling Fan	17.0
Item_10_EHE	Table Fan	18.5
Item_11_EHE	Hair Dryer	20.0
Item_12_EHE	Heater	21.5
Item_13_EHE	Humidifier	23.0
Item_14_EHE	Dehumidifier	24.5
Item_15_EHE	Coffee Maker	26.0
Item_16_EHE	Portable AC	27.5
Item_17_EHE	Electric Stove	29.0
Item_18_EHE	Pressure Cooker	30.5
Item_19_EHE	Induction Cooktop	32.0
Item_20_EHE	Water Dispenser	33.5
Item_21_EHE	Hand Blender	35.0
Item_22_EHE	Mixer Grinder	36.5
Item_23_EHE	Sandwich Maker	38.0
Item_24_EHE	Air Fryer	39.5
Item_25_EHE	Juicer	41.0

Furniture

Item Code	Item Name	Price
Item_1_FUR	Office Chair	5.0
Item_2_FUR	Sofa	6.5
Item_3_FUR	Coffee Table	8.0
Item_4_FUR	Dining Table	9.5
Item_5_FUR	Bookshelf	11.0
Item_6_FUR	Bed F...

Z
Quantitative raw data for "Large scale regional citizen surveys report"...
data.niaid.nih.gov
zenodo.org
+1more
Updated Feb 3, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Panori, Anastasia; Bakratsas, Thomas; Chapizanis, Dimitrios; Altsitsiadis, Efthymios; Hauschildt, Christian (2022). Quantitative raw data for "Large scale regional citizen surveys report" (D1.4) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5958017
Explore at:
Dataset updated
Feb 3, 2022
Dataset provided by
White Research SRL
Authors
Panori, Anastasia; Bakratsas, Thomas; Chapizanis, Dimitrios; Altsitsiadis, Efthymios; Hauschildt, Christian
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset presents the quantitative raw data that was collected under the H2020 RRI2SCALE project for the D1.4 - “Large scale regional citizen surveys report”. The dataset includes the answers that were provided by almost 8,000 participants from 4 pilot European regions (Kriti, Vestland, Galicia, and Overijssel) regarding the general public's views, concerns, and moral issues about the current and future trajectories of their RTD&I ecosystem. The original survey questionnaire was created by White Research SRL and disseminated to the regions through supporting pilot partners. Data collection took place from June 2020 to September 2020 through 4 different waves – one for each region. Based on the conclusion of a consortium vote during the kick-off meeting, it was decided that instead of resource-intensive methods that would render data collection unduly expensive, to fill in the quotas responses were collected through online panels by survey companies that were used for each region. For the statistical analysis of the data and the conclusions drawn from the analysis, you can access the "Large scale regional citizen surveys report" (D1.4).
h
dataset-tsql-data-analysis
huggingface.co
Updated Jan 1, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Meldrum (2020). dataset-tsql-data-analysis [Dataset]. https://huggingface.co/datasets/dmeldrum6/dataset-tsql-data-analysis
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 1, 2020
Authors
David Meldrum
Description
Dataset Card for dataset-tsql-data-analysis

This dataset has been created with distilabel.

Dataset Summary

This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dmeldrum6/dataset-tsql-data-analysis/raw/main/pipeline.yaml"

or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dmeldrum6/dataset-tsql-data-analysis.
f
UC_vs_US Statistic Analysis.xlsx
figshare.com
xlsx
Updated Jul 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.23644/uu.12631628.v1
Dataset updated
Jul 9, 2020
Dataset provided by
Utrecht University
Authors
F. (Fabiano) Dalpiaz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the used measures described in the paper. For each subject, it includes multiple columns: A. a sequential student ID B an ID that defines a random group label and the notation C. the used notation: user Story or use Cases D. the case they were assigned to: IFA, Sim, or Hos E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam F. a categorical representation of the grade L/M/H, where H is greater or equal to 80, M is between 65 included and 80 excluded, L otherwise G. the total number of classes in the student's conceptual model H. the total number of relationships in the student's conceptual model I. the total number of classes in the expert's conceptual model J. the total number of relationships in the expert's conceptual model K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below) P. the researchers' judgement on how well the derivation process explanation was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping ), or not present.

Tagging scheme: Aligned (AL) - A concept is represented as a class in both models, either

with the same name or using synonyms or clearly linkable names; Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., user'' instead ofurban planner''); System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate; Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud; Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

All the calculations and information provided in the following sheets

originate from that raw data.

Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,

including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

Sheet 3 (Size-Ratio):

The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade) . The primary focus in this study is on the number of classes. However, we also provided the size ratio for the number of relationships between student and expert model.

Sheet 4 (Overall):

Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.

For sheet 4 as well as for the following four sheets, diverging stacked bar

charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated witch solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

Sheet 5 (By-Notation):

Model correctness and model completeness is compared by notation - UC, US.

Sheet 6 (By-Case):

Model correctness and model completeness is compared by case - SIM, HOS, IFA.

Sheet 7 (By-Process):

Model correctness and model completeness is compared by how well the derivation process is explained - well explained, partially explained, not present.

Sheet 8 (By-Grade):

Model correctness and model completeness is compared by the exam grades, converted to categorical values High, Low , and Medium.
Data from: Does the Disclosure of Gun Ownership Affect Crime? Evidence from...
search.datacite.org
openicpsr.org
+1more
Updated 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel Tannenbaum (2018). Does the Disclosure of Gun Ownership Affect Crime? Evidence from New York [Dataset]. http://doi.org/10.3886/e109802v1
Explore at:
Unique identifier
https://doi.org/10.3886/e109802v1
Dataset updated
2018
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
DataCitehttps://www.datacite.org/
Authors
Daniel Tannenbaum
Description
This repository contains the data and code necessary to replicate all figures and tables in the working paper: "Does the disclosure of gun ownership affect crime? Evidence from New York" by Daniel Tannenbaum
There are four folders in this repository:(1) Build: contains all the .do files required to produce the analysis datasets, using the raw data (i.e. datasets in the RawData folder).(2) Analysis: contains all the .do files required to produce all the figures and tables in the paper, using the analysis datasets (i.e. datasets in the AnalysisData folder).(3) RawData: contains all the raw datasets used to produce the AnalysisData datasets. The only raw dataset used in the paper that is excluded from this folder is the proprietary housing assessor and sales transaction data from DataQuick, owned by Corelogic. If I receive approval to include this raw data in this repository I will do so in future versions of this repository.(4) AnalysisData: contains all the analysis datasets that are created using the Build and are used to produce the tables and figures in the paper.

Running the file Master_analysis.do in the Analysis folder will produce, in one script, all the tables and figures in the paper.
m
Global Burden of Disease analysis dataset of noncommunicable disease...
data.mendeley.com
Updated Apr 6, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Cundiff (2023). Global Burden of Disease analysis dataset of noncommunicable disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.10
Explore at:
Unique identifier
https://doi.org/10.17632/g6b39zxck4.10
Dataset updated
Apr 6, 2023
Authors
David Cundiff
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.

The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.

These Global Burden of disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis. The data include the following: 1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc). 2. A text file to import the analysis database into SAS 3. The SAS code to format the analysis database to be used for analytics 4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6 5. SAS code for deriving the multiple regression formula in Table 4. 6. SAS code for deriving the multiple regression formula in Table 5 7. SAS code for deriving the multiple regression formula in Supplementary Table 7
8. SAS code for deriving the multiple regression formula in Supplementary Table 8 9. The Excel files that accompanied the above SAS code to produce the tables

For questions, please email davidkcundiff@gmail.com. Thanks.
d
PyTorch geometric datasets for morphVQ models
datadryad.org
data.niaid.nih.gov
zip
Updated Sep 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson (2022). PyTorch geometric datasets for morphVQ models [Dataset]. http://doi.org/10.5061/dryad.bvq83bkcr
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.bvq83bkcr
Dataset updated
Sep 29, 2022
Dataset provided by
Dryad
Authors
Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson
Time period covered
Sep 2, 2022
Description
These datasets are customized Torch Geometric Datasets that contain raw .off polygon meshes as well as preprocessed .pt files needed for training morphVQ models. morphVQ can be found at https://github.com/oothomas/morphVQ.
f
Raw dataset for 90 tree individuals used for the statistical analysis.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Nov 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Donoso, David A.; Limberger, Oliver; Tiede, Yvonne; Bendix, Jörg; Schön, Jana E.; Homeier, Jürgen; Farwig, Nina; Becker, Marcel; Brandl, Roland (2023). Raw dataset for 90 tree individuals used for the statistical analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000947352
Explore at:
Dataset updated
Nov 7, 2023
Authors
Donoso, David A.; Limberger, Oliver; Tiede, Yvonne; Bendix, Jörg; Schön, Jana E.; Homeier, Jürgen; Farwig, Nina; Becker, Marcel; Brandl, Roland
Description
Raw dataset for 90 tree individuals used for the statistical analysis.
Analysis of ITT-LOCF, LOCF, and completers for handling missing data in the...
plos.figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mai A. Elobeid; Miguel A. Padilla; Theresa McVie; Olivia Thomas; David W. Brock; Bret Musser; Kaifeng Lu; Christopher S. Coffey; Renee A. Desmond; Marie-Pierre St-Onge; Kishore M. Gadde; Steven B. Heymsfield; David B. Allison (2023). Analysis of ITT-LOCF, LOCF, and completers for handling missing data in the 12 raw datasets using Multiple Imputation. [Dataset]. http://doi.org/10.1371/journal.pone.0006624.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0006624.t004
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Mai A. Elobeid; Miguel A. Padilla; Theresa McVie; Olivia Thomas; David W. Brock; Bret Musser; Kaifeng Lu; Christopher S. Coffey; Renee A. Desmond; Marie-Pierre St-Onge; Kishore M. Gadde; Steven B. Heymsfield; David B. Allison
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abbreviations: RCT, randomized controlled trial; MI, multiple imputation. Each permutation test is based on 10,000 permutations of each dataset.
f
Analysis of ITT-LOCF, LOCF, and completers for handling missing data in the...
plos.figshare.com
figshare.com
xls
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mai A. Elobeid; Miguel A. Padilla; Theresa McVie; Olivia Thomas; David W. Brock; Bret Musser; Kaifeng Lu; Christopher S. Coffey; Renee A. Desmond; Marie-Pierre St-Onge; Kishore M. Gadde; Steven B. Heymsfield; David B. Allison (2023). Analysis of ITT-LOCF, LOCF, and completers for handling missing data in the 12 raw datasets using ordinary least squares. [Dataset]. http://doi.org/10.1371/journal.pone.0006624.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0006624.t003
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Mai A. Elobeid; Miguel A. Padilla; Theresa McVie; Olivia Thomas; David W. Brock; Bret Musser; Kaifeng Lu; Christopher S. Coffey; Renee A. Desmond; Marie-Pierre St-Onge; Kishore M. Gadde; Steven B. Heymsfield; David B. Allison
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abbreviations: ITT-LOCF, intent-to-treat-last observation carried forward; RCT, randomized controlled trial.aIndicates missing data pattern is the same for ITT-LOCF and LOCF. Each permutation test is based on 10,000 permutations of each dataset.

Facebook

Twitter

Click to copy link

Link copied

Cite

Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1

Raw data outputs 1-18

Explore at:

xlsxAvailable download formats

Unique identifier

https://doi.org/10.26180/21259491.v1

Dataset updated

May 30, 2023

Dataset provided by

Monash University

Authors

Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.

Clear search

Close search

Google apps

Main menu

Raw data outputs 1-18

HR Analytics Dataset

Data analysis project

Raw data from datasets used in SIMON analysis

Datasets for Sentiment Analysis

Supply Chain DataSet

Malware Analysis Datasets: Raw PE as Image

The raw datasets and statistics for each analysis result graph.

Target products dataset

Use Cases:

Treevill: N.B.G. Unique & Rare Raw Dataset

Retail Store Sales: Dirty for Data Cleaning

Dirty Retail Store Sales Dataset

Overview

File Information

Columns Description

Categories and Items

Electric Household Essentials

Furniture

Quantitative raw data for "Large scale regional citizen surveys report"...

dataset-tsql-data-analysis

UC_vs_US Statistic Analysis.xlsx

Data from: Does the Disclosure of Gun Ownership Affect Crime? Evidence from...

Global Burden of Disease analysis dataset of noncommunicable disease...

PyTorch geometric datasets for morphVQ models

Raw dataset for 90 tree individuals used for the statistical analysis.

Analysis of ITT-LOCF, LOCF, and completers for handling missing data in the...

Analysis of ITT-LOCF, LOCF, and completers for handling missing data in the...

Raw data outputs 1-18