
Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity; Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds); Mitosox oxidation (3 h incubation with these compounds) and oxidation rate; DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate; and DQ BSA hydrolysis. The target of each instance corresponds to one of 9 classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples precluded a complete and robust statistical analysis of the results; nevertheless, it allowed relevant hidden patterns and trends to be identified.
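The description states that the 36 samples are balanced (9 classes × 4) and standardized across features. A minimal standard-library sketch of both checks follows, assuming per-feature z-scoring with the population standard deviation (Orange's exact normalization may differ); the toy values and labels are illustrative, not taken from the dataset:

```python
from collections import Counter
from statistics import mean, pstdev

def standardize_columns(rows):
    """Z-score each feature column: (x - column mean) / column population SD."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, stats)]
        for row in rows
    ]

def is_balanced(labels):
    """True when every class has the same number of samples."""
    counts = Counter(labels)
    return len(set(counts.values())) == 1

# Toy 4-sample, 2-feature example (made-up values).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
y = ["Control", "Control", "6-OHDA 6.25", "6-OHDA 6.25"]
Z = standardize_columns(X)
print([round(v, 3) for v in Z[0]])
print(is_balanced(y))
```

After standardization every column has mean 0 and standard deviation 1, so features measured on very different scales (fluorescence vs. hydrolysis) contribute comparably to Euclidean distances.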

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to the instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped in both rows and columns by two-way hierarchical clustering using Euclidean distance and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, the outcome did not change significantly. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when the majority reaches 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC).
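To make the two linkage rules concrete, here is a naive agglomerative clustering sketch with Euclidean distance. It assumes that Orange's "weighted" linkage corresponds to WPGMA (as SciPy's `linkage(method='weighted')` does) and that "average" means UPGMA; it illustrates the update rules and is not Orange's implementation:

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, linkage="weighted"):
    """Naive agglomerative clustering; returns merges as (members_a, members_b, distance).

    weighted (WPGMA): d(k, i+j) = (d(k, i) + d(k, j)) / 2
    average  (UPGMA): d(k, i+j) = (n_i * d(k, i) + n_j * d(k, j)) / (n_i + n_j)
    """
    if linkage not in ("weighted", "average"):
        raise ValueError(f"unsupported linkage: {linkage}")
    clusters = {i: (i,) for i in range(len(points))}
    dist = {frozenset((i, j)): euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of current clusters
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        members_i, members_j = clusters.pop(i), clusters.pop(j)
        n_i, n_j = len(members_i), len(members_j)
        for k in clusters:                      # distance from every other cluster to the new one
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            if linkage == "weighted":
                d_new = (d_ki + d_kj) / 2
            else:
                d_new = (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
            dist[frozenset((k, next_id))] = d_new
        clusters[next_id] = members_i + members_j
        merges.append((members_i, members_j, d_ij))
        next_id += 1
    return merges

pts = [[0.0], [1.0], [2.5], [10.0]]
for members_a, members_b, d in agglomerate(pts, "weighted"):
    print(members_a, members_b, round(d, 3))
```

The two rules only diverge once clusters of unequal size merge: on the 1-D points 0, 1, 2.5 and 10, the final merge distance is 8.5 under weighted linkage and about 8.83 under average linkage.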
[Data Analysis/2021] Mix Dataset
kaggle.com
zip
Updated Apr 16, 2024
Cite
Pham Le Tu Nhi (2024). [Data Analysis/2021] Mix Dataset [Dataset]. https://www.kaggle.com/datasets/phamletunhi/data-analysis2021-mix-dataset
Available download formats: zip (1086824007 bytes)
Dataset updated
Apr 16, 2024
Authors
Pham Le Tu Nhi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Pham Le Tu Nhi

Released under MIT

Walmart Products Dataset – Free Product Data CSV
crawlfeeds.com
csv, zip
Updated Dec 2, 2025
Cite
Crawl Feeds (2025). Walmart Products Dataset – Free Product Data CSV [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
Available download formats: zip, csv
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.

Key Features

Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.

CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.

Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.

Free & Easy Access: Priced at USD 0.00, making it a great starting point for developers, data analysts or students.
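A minimal sketch of loading such a CSV with Python's standard `csv` module and averaging prices per brand. The column names (`price`, `brand`, and so on) are hypothetical lowercase versions of the fields listed above, so check the real header before use; the sample rows are made up:

```python
import csv
import io
from collections import defaultdict

def mean_price_by_brand(csv_text):
    """Average 'price' per 'brand', skipping rows whose price is missing or non-numeric."""
    totals = defaultdict(lambda: [0.0, 0])   # brand -> [sum of prices, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            price = float(row["price"])
        except (KeyError, ValueError):
            continue
        acc = totals[row.get("brand", "")]
        acc[0] += price
        acc[1] += 1
    return {brand: s / n for brand, (s, n) in totals.items()}

# Illustrative rows only; real header names may differ.
sample = """url,title,brand,sku,price,currency,availability
https://example.com/a,Item A,BrandX,1,10.00,USD,InStock
https://example.com/b,Item B,BrandX,2,20.00,USD,OutOfStock
https://example.com/c,Item C,BrandY,3,,USD,InStock
"""
print(mean_price_by_brand(sample))
```

Skipping unparseable prices rather than treating them as zero keeps a few incomplete records from dragging brand averages down.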

Who Benefits?

Data analysts & researchers exploring e-commerce trends or product catalog data.

Developers & data scientists building price-comparison tools, recommendation engines or ML models.

E-commerce strategists/marketers needing product metadata for competitive analysis or market research.

Students/hobbyists needing a free dataset for learning or demo projects.

Why Use This Dataset Instead of Manual Scraping?

Time-saving: No need to write scrapers or deal with rate limits.

Clean, structured data: All records are verified and already formatted in CSV, saving hours of cleaning.

Risk-free: Avoid Terms-of-Service issues or IP blocks that come with manual scraping.

Instant access: Free and immediately downloadable.
Comprehensive Supply Chain Analysis
kaggle.com
Updated Sep 15, 2023
Cite
Dorothy Joel (2023). Comprehensive Supply Chain Analysis [Dataset]. https://www.kaggle.com/datasets/dorothyjoel/us-regional-sales
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 15, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dorothy Joel
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This supply chain analysis provides a comprehensive view of the company's order and distribution processes, allowing for in-depth analysis and optimization of various aspects of the supply chain, from procurement and inventory management to sales and customer satisfaction. It empowers the company to make data-driven decisions to improve efficiency, reduce costs, and enhance customer experiences. The provided supply chain analysis dataset contains various columns that capture important information related to the company's order and distribution processes:

• OrderNumber
• Sales Channel
• WarehouseCode
• ProcuredDate
• CurrencyCode
• OrderDate
• ShipDate
• DeliveryDate
• SalesTeamID
• CustomerID
• StoreID
• ProductID
• Order Quantity
• Discount Applied
• Unit Cost
• Unit Price
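The date and pricing columns above support simple fulfilment and margin metrics. A sketch follows, assuming the dates are already parsed and that `Discount Applied` is a fraction (e.g. 0.05 for 5%); the `OrderLine` class and its method names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class OrderLine:
    """One order row (hypothetical field names mapped from the columns above)."""
    order_date: date
    ship_date: date
    delivery_date: date
    quantity: int
    discount: float     # assumed fraction, e.g. 0.05 means 5% off
    unit_cost: float
    unit_price: float

    def days_to_ship(self) -> int:
        """OrderDate to ShipDate, in days."""
        return (self.ship_date - self.order_date).days

    def days_in_transit(self) -> int:
        """ShipDate to DeliveryDate, in days."""
        return (self.delivery_date - self.ship_date).days

    def net_revenue(self) -> float:
        """Quantity times unit price after the discount is applied."""
        return self.quantity * self.unit_price * (1 - self.discount)

    def profit(self) -> float:
        """Net revenue minus total unit cost."""
        return self.net_revenue() - self.quantity * self.unit_cost

line = OrderLine(date(2020, 1, 1), date(2020, 1, 15), date(2020, 1, 20),
                 quantity=5, discount=0.1, unit_cost=50.0, unit_price=100.0)
print(line.days_to_ship(), line.days_in_transit(), line.net_revenue(), line.profit())
```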
Google Analytics data of an E-commerce Company
kaggle.com
zip
Updated Oct 19, 2024
Cite
fehu.zone (2024). Google Analytics data of an E-commerce Company [Dataset]. https://www.kaggle.com/datasets/fehu94/google-analytics-data-of-an-e-commerce-company
Available download formats: zip (3156 bytes)
Dataset updated
Oct 19, 2024
Authors
fehu.zone
Description
📊 Dataset Title: Daily Active Users Dataset

📝 Description

This dataset provides detailed insights into daily active users (DAU) of a platform or service, captured over a defined period of time. The dataset includes information such as the number of active users per day, allowing data analysts and business intelligence teams to track usage trends, monitor platform engagement, and identify patterns in user activity over time.

The data is ideal for performing time series analysis, statistical analysis, and trend forecasting. You can utilize this dataset to measure the success of platform initiatives, evaluate user behavior, or predict future trends in engagement. It is also suitable for training machine learning models that focus on user activity prediction or anomaly detection.

📂 Dataset Structure

The dataset is structured in a simple and easy-to-use format, containing the following columns:

Date: The date on which the data was recorded, formatted as YYYYMMDD.

Number of Active Users: The number of users who were active on the platform on the corresponding date.

Each row in the dataset represents a unique date and its corresponding number of active users. This allows for time-based analysis, such as calculating the moving average of active users, detecting seasonality, or spotting sudden spikes or drops in engagement.

🧐 Key Use Cases

This dataset can be used for a wide range of purposes, including:

Time Series Analysis: Analyze trends and seasonality of user engagement.

Trend Detection: Discover peaks and valleys in user activity.

Anomaly Detection: Use statistical methods or machine learning algorithms to detect anomalies in user behavior.

Forecasting User Growth: Build forecasting models to predict future platform usage.

Seasonality Insights: Identify patterns like increased activity on weekends or holidays.

📈 Potential Analysis

Here are some specific analyses you can perform using this dataset:

Moving Average and Smoothing: Calculate the moving average over a 7-day or 30-day period.

Correlation with External Factors: Correlate daily active users with other datasets.

Statistical Hypothesis Testing: Perform t-tests or ANOVA to determine significant differences in user activity.

Machine Learning for Prediction: Train machine learning models to predict user engagement.
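A standard-library sketch of three of the analyses listed above: parsing the YYYYMMDD `Date` values, a trailing 7-day moving average, and a simple z-score rule for flagging anomalies. The threshold of 3 and the use of whole-series statistics are assumptions, not part of the dataset description:

```python
from datetime import datetime
from statistics import mean, pstdev

def parse_rows(rows):
    """Convert (YYYYMMDD string, count) pairs into (date, int) pairs sorted by date."""
    parsed = [(datetime.strptime(d, "%Y%m%d").date(), int(n)) for d, n in rows]
    return sorted(parsed)

def trailing_moving_average(values, window=7):
    """Mean of the last `window` values at each position; None until enough values exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window:i + 1]))
    return out

def zscore_anomalies(values, threshold=3.0):
    """Indices whose z-score against the whole series exceeds `threshold`."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sd > threshold]

daily = parse_rows([("20240103", "30"), ("20240101", "10"), ("20240102", "20")])
counts = [n for _, n in daily]
print(trailing_moving_average(counts, window=2))
```

With whole-series statistics a single point cannot exceed a z-score of sqrt(n - 1), so very short series need a lower threshold or a rolling baseline.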

🚀 Getting Started

To get started with this dataset, you can load it into your preferred analysis tool. Here's how to do it using Python's pandas library:

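import pandas as pd

# Load the dataset
data = pd.read_csv('path_to_dataset.csv')

# Parse the Date column (stored as YYYYMMDD) into datetimes
data['Date'] = pd.to_datetime(data['Date'].astype(str), format='%Y%m%d')

# Display the first few rows
print(data.head())

# Basic statistics
print(data.describe())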
Data from: Social Media Data Analysis
kaggle.com
zip
Updated Apr 16, 2021
Cite
Nafe Muhtasim (2021). Social Media Data Analysis [Dataset]. https://www.kaggle.com/datasets/nafemuhtasim/social-media-data-analysis
Available download formats: zip (29081 bytes)
Dataset updated
Apr 16, 2021
Authors
Nafe Muhtasim
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Nafe Muhtasim

Released under CC0: Public Domain

Student Skill Gap Analysis
data.mendeley.com
kaggle.com
Updated Apr 28, 2025
Cite
Bindu Garg (2025). Student Skill Gap Analysis [Dataset]. http://doi.org/10.17632/rv6scbpd7v.1
Unique identifier
https://doi.org/10.17632/rv6scbpd7v.1
Dataset updated
Apr 28, 2025
Authors
Bindu Garg
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is designed for skill gap analysis: evaluating the gap between students’ current skills and industry requirements. It provides insights into technical skills, soft skills, career interests, and challenges, helping to identify areas for improvement.

Using this dataset, educators, recruiters, and researchers can assess students’ job readiness and tailor training programs accordingly. It serves as a resource for identifying skill deficiencies, improving career guidance, and enhancing curriculum design.

The columns are described below:
Name - Student's full name.
email_id - Student's email address.
Year - The academic year the student is currently in (e.g., 1st Year, 2nd Year, etc.).
Current Course - The course the student is currently pursuing (e.g., B.Tech CSE, MBA, etc.).
Technical Skills - List of technical skills possessed by the student (e.g., Python, Data Analysis, Cloud Computing).
Programming Languages - Programming languages known by the student (e.g., Python, Java, C++).
Rating - Self-assessed rating of technical skills on a scale of 1 to 5.
Soft Skills - List of soft skills (e.g., Communication, Leadership, Teamwork).
Rating - Self-assessed rating of soft skills on a scale of 1 to 5.
Projects - Indicates whether the student has worked on any projects (Yes/No).
Career Interest - The student's preferred career path (e.g., Data Scientist, Software Engineer).
Challenges - Challenges faced while applying for jobs/internships (e.g., Lack of experience, Resume building issues).
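One way to compute a per-student gap from these columns, assuming the skills cells are comma-separated strings; the `required` list stands in for a hypothetical industry requirement and is not part of the dataset:

```python
def parse_skills(cell):
    """Split a comma-separated skills cell into a set of normalized (lowercase) names."""
    return {s.strip().lower() for s in cell.split(",") if s.strip()}

def skill_gap(student_cell, required):
    """Return (sorted missing skills, fraction of required skills already held)."""
    have = parse_skills(student_cell)
    need = {r.lower() for r in required}
    missing = need - have
    coverage = 1 - len(missing) / len(need) if need else 1.0
    return sorted(missing), coverage

# Hypothetical requirement list for a "Data Scientist" career interest.
required = ["Python", "SQL", "Data Analysis", "Cloud Computing"]
print(skill_gap("Python, Data Analysis", required))
```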
Descriptive statistics.
plos.figshare.com
xls
Updated Oct 31, 2023
Cite
Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha (2023). Descriptive statistics. [Dataset]. http://doi.org/10.1371/journal.pgph.0002475.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pgph.0002475.t003
Dataset updated
Oct 31, 2023
Dataset provided by
PLOS Global Public Health
Authors
Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Vitamin D insufficiency appears to be prevalent in patients with systemic lupus erythematosus (SLE). Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83, 3.52).
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. The study therefore underscores the need for further large-scale studies to corroborate this hypothesis.
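The RMSE and MAE figures above are reported with bootstrapped confidence intervals. A minimal sketch of the two metrics and a percentile bootstrap over (observed, predicted) pairs follows; the study does not state which bootstrap variant or how many resamples it used, so `n_boot`, `alpha` and the percentile method are assumptions:

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample (true, pred) pairs with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    return stats[lo_idx], stats[n_boot - 1 - lo_idx]

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), bootstrap_ci(y_true, y_pred, rmse))
```

Resampling pairs (rather than residuals) keeps each prediction tied to its observation, which suits small samples such as the 50 patients here; with so few points, the intervals will be wide.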
Insurance Dataset
gts.ai
json
Updated Oct 16, 2022
Cite
GTS (2022). Insurance Dataset [Dataset]. https://gts.ai/case-study/insurance-dataset-annotation-services-for-precision-data-analysis/
Available download formats: json
Dataset updated
Oct 16, 2022
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Insurance Dataset project is an extensive initiative focused on collecting and analyzing insurance-related data from various sources.
Python Codes for Data Analysis of The Impact of COVID-19 on Technical...
dataverse.harvard.edu
figshare.com
Updated Mar 21, 2022
Cite
Elizabeth Szkirpan (2022). Python Codes for Data Analysis of The Impact of COVID-19 on Technical Services Units Survey Results [Dataset]. http://doi.org/10.7910/DVN/SXMSDZ
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/SXMSDZ
Dataset updated
Mar 21, 2022
Dataset provided by
Harvard Dataverse
Authors
Elizabeth Szkirpan
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Copies of the Anaconda 3 Jupyter Notebooks and Python scripts for holistic and clustered analysis of the "The Impact of COVID-19 on Technical Services Units" survey results. The data were analyzed holistically, using the cleaned and standardized survey results, and by library-type clusters. To streamline the analysis in certain places, an offshoot CSV file was created so that data could be standardized without compromising the integrity of the parent clean file. Three Jupyter Notebooks/Python scripts are available for this project, including COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type; the clustered files are available as part of this project's Dataverse).
Dataset
data.mendeley.com
Updated Jun 10, 2025
Cite
Gunarto Wibisono (2025). Dataset [Dataset]. http://doi.org/10.17632/298hjrz9cm.1
Unique identifier
https://doi.org/10.17632/298hjrz9cm.1
Dataset updated
Jun 10, 2025
Authors
Gunarto Wibisono
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains supporting data collected during the research titled "Design of a Modular and Scalable IoT-Based System for Utility Management: A Case Study at 3 Towers Residences."

The data include:
- Field observation photos of utility control rooms (electrical only), showing how cramped the electrical panel room and the MCB meter box are.
- Screenshots of email correspondence from the finance team (sender masked), showing inconsistent delivery dates.
- Visual evidence of the Excel utility billing data, which is processed manually.
- Supplementary documentation used in developing the thesis.
Exploring E-commerce Trends⭐️⭐️⭐️
kaggle.com
zip
Updated Jul 8, 2024
Cite
Muhammad Roshan Riaz (2024). Exploring E-commerce Trends⭐️⭐️⭐️ [Dataset]. https://www.kaggle.com/datasets/muhammadroshaanriaz/e-commerce-trends-a-guide-to-leveraging-dataset
Available download formats: zip (51169 bytes)
Dataset updated
Jul 8, 2024
Authors
Muhammad Roshan Riaz
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Exploring E-commerce Trends: A Guide to Leveraging Dummy Dataset

Introduction: In the world of e-commerce, data is a powerful asset that can be leveraged to understand customer behavior, improve sales strategies, and enhance overall business performance. This guide explores how to effectively utilize a dummy dataset generated to simulate various aspects of an e-commerce platform. By analyzing this dataset, businesses can gain valuable insights into product trends, customer preferences, and market dynamics.

Dataset Overview: The dummy dataset contains information on 1000 products across different categories such as electronics, clothing, home & kitchen, books, toys & games, and more. Each product is associated with attributes such as price, rating, number of reviews, stock quantity, discounts, sales, and date added to inventory. This comprehensive dataset provides a rich source of information for analysis and exploration.

Data Analysis: Using tools like Pandas, NumPy, and visualization libraries like Matplotlib or Seaborn, businesses can perform in-depth analysis of the dataset. Key insights such as top-selling products, popular product categories, pricing trends, and seasonal variations can be extracted through exploratory data analysis (EDA). Visualization techniques can be employed to create intuitive graphs and charts for better understanding and communication of findings.
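As a small example of the EDA described above, the following ranks categories and products by sales in plain Python. The field names `name`, `category` and `sales` and the four records are illustrative assumptions, not the dataset's actual schema or values:

```python
from collections import defaultdict

# Illustrative records only; field names and values are assumptions.
products = [
    {"name": "Headphones", "category": "Electronics", "sales": 120},
    {"name": "T-shirt", "category": "Clothing", "sales": 300},
    {"name": "Charger", "category": "Electronics", "sales": 250},
    {"name": "Novel", "category": "Books", "sales": 80},
]

def sales_by_category(items):
    """Total sales per category, largest first."""
    totals = defaultdict(int)
    for p in items:
        totals[p["category"]] += p["sales"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def top_sellers(items, k=3):
    """Names of the k best-selling products."""
    ranked = sorted(items, key=lambda p: p["sales"], reverse=True)
    return [p["name"] for p in ranked[:k]]

print(sales_by_category(products))
print(top_sellers(products, k=2))
```

Note that the top category (Electronics) does not contain the single best-selling product (T-shirt), which is why category-level and product-level rankings are worth computing separately.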

Machine Learning Applications: The dataset can be used to train machine learning models for various e-commerce tasks such as product recommendation, sales prediction, customer segmentation, and sentiment analysis. By applying algorithms like linear regression, decision trees, or neural networks, businesses can develop predictive models to optimize inventory management, personalize customer experiences, and drive sales growth.

Testing and Prototyping: Businesses can utilize the dummy dataset to test new algorithms, prototype new features, or conduct A/B testing experiments without impacting real user data. This enables rapid iteration and experimentation to validate hypotheses and refine strategies before implementation in a live environment.

Educational Resources: The dummy dataset serves as an invaluable educational resource for students, researchers, and professionals interested in learning about e-commerce data analysis and machine learning. Tutorials, workshops, and online courses can be developed using the dataset to teach concepts such as data manipulation, statistical analysis, and model training in the context of e-commerce.

Decision Support and Strategy Development: Insights derived from the dataset can inform strategic decision-making processes and guide business strategy development. By understanding customer preferences, market trends, and competitor behavior, businesses can make informed decisions regarding product assortment, pricing strategies, marketing campaigns, and resource allocation.

Conclusion: The dummy dataset provides a versatile and valuable resource for exploring e-commerce trends, understanding customer behavior, and driving business growth. By leveraging this dataset effectively, businesses can unlock actionable insights, optimize operations, and stay ahead in today's competitive e-commerce landscape.
Data Analytics 2 Dataset
universe.roboflow.com
zip
Updated Jan 1, 2025
Cite
TRAFFIC SIGNS DATA ANALYTICS (2025). Data Analytics 2 Dataset [Dataset]. https://universe.roboflow.com/traffic-signs-data-analytics/data-analytics-2
Available download formats: zip
Dataset updated
Jan 1, 2025
Dataset authored and provided by
TRAFFIC SIGNS DATA ANALYTICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
TRAFFIC LIGHTS Gztl Bounding Boxes
Description
DATA ANALYTICS 2

## Overview

DATA ANALYTICS 2 is a dataset for object detection tasks; it contains TRAFFIC LIGHTS Gztl annotations for 8,579 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
New 1000 Sales Records Data 2
kaggle.com
zip
Updated Jan 12, 2023
Cite
Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
Available download formats: zip (49305 bytes)
Dataset updated
Jan 12, 2023
Authors
Calvin Oko Mensah
Description
This dataset was downloaded from excelbianalytics.com, where it was generated with random VBA logic. I recently performed an extensive exploratory data analysis on it and added new columns, namely Unit margin, Order year, Order month, Order weekday and Order_Ship_Days, which I think can help with analyzing the data. I shared it because I think it is a great dataset for newbies like myself to practice analytical processes on.
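The derived columns named above can be computed per record roughly as follows. This assumes source columns named `Order Date`, `Ship Date`, `Unit Price` and `Unit Cost`, with dates in M/D/YYYY form; adjust `DATE_FORMAT` if the file differs. The example record is made up:

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumption: dates look like 5/28/2010

def derive_columns(row):
    """Return a copy of one sales record (a dict of strings) with the derived columns added."""
    order = datetime.strptime(row["Order Date"], DATE_FORMAT)
    ship = datetime.strptime(row["Ship Date"], DATE_FORMAT)
    out = dict(row)
    out["Unit margin"] = float(row["Unit Price"]) - float(row["Unit Cost"])
    out["Order year"] = order.year
    out["Order month"] = order.month
    out["Order weekday"] = order.strftime("%A")  # weekday name, e.g. "Friday"
    out["Order_Ship_Days"] = (ship - order).days
    return out

record = {"Order Date": "5/28/2010", "Ship Date": "6/27/2010",
          "Unit Price": "255.28", "Unit Cost": "159.42"}
print(derive_columns(record))
```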
lipidomics dataset-Lipid droplet plasticity and dispersion dictates...
data.mendeley.com
Updated Jun 9, 2023
Cite
sahar motamedi (2023). lipidomics dataset-Lipid droplet plasticity and dispersion dictates differential sensitivity of therapy-resistant melanoma cells to PUFA and iron-induced ferroptosis [Dataset]. http://doi.org/10.17632/3c755wtx8w.1
Unique identifier
https://doi.org/10.17632/3c755wtx8w.1
Dataset updated
Jun 9, 2023
Authors
sahar motamedi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Lipidomics data generated for the manuscript (Lipid droplet plasticity and dispersion dictates differential sensitivity of therapy-resistant melanoma cells to PUFA and iron-induced ferroptosis)
Data from: A simple method for statistical analysis of intensity differences...
catalog.data.gov
healthdata.gov
Updated Sep 7, 2025
Cite
National Institutes of Health (2025). A simple method for statistical analysis of intensity differences in microarray-derived gene expression data [Dataset]. https://catalog.data.gov/dataset/a-simple-method-for-statistical-analysis-of-intensity-differences-in-microarray-derived-ge
Dataset updated
Sep 7, 2025
Dataset provided by
National Institutes of Health
Description
Background: Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements, either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of those changes. In the absence of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic, and without such estimates, overly conservative significance thresholds must be enforced.

Results: A simple statistical method for estimating variances from microarray control data, which does not require multiple replicates, is presented. Comparing datasets from two commercial entities with this difference-averaging method shows that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Applied to a dataset related to the β-catenin pathway, the method identifies a larger number of biologically reasonable genes with altered expression than the ratio method does.

Conclusions: The difference-averaging method enables variances to be determined as a function of signal intensity by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.
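The finding that the standard deviation scales between the intensity and its square root can be read as SD ∝ I^α with 0.5 < α < 1. The sketch below is a generic illustration, not the authors' difference-averaging method: it bins paired measurements by mean intensity, estimates each bin's SD as sqrt(mean(d²)/2) (valid when the two readings are independent and share the same true level), and fits α by least squares on log-log axes. The function names are hypothetical:

```python
import math
from statistics import mean

def binned_sd_from_differences(pairs, n_bins=4):
    """pairs: (a, b) readings assumed to share the same true level.
    Returns (mean intensity, SD estimate) per bin; the last bin may be smaller."""
    pairs = sorted(pairs, key=lambda p: (p[0] + p[1]) / 2)
    size = max(1, len(pairs) // n_bins)
    out = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        level = mean((a + b) / 2 for a, b in chunk)
        # Var(a - b) = 2 * sigma^2 for independent readings, hence the division by 2.
        sd = math.sqrt(mean((a - b) ** 2 for a, b in chunk) / 2)
        out.append((level, sd))
    return out

def scaling_exponent(levels_and_sds):
    """Least-squares slope of log(SD) on log(intensity), i.e. alpha in SD ~ I**alpha."""
    pts = [(i, s) for i, s in levels_and_sds if i > 0 and s > 0]  # logs need positive values
    xs = [math.log(i) for i, _ in pts]
    ys = [math.log(s) for _, s in pts]
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

levels = [10.0, 100.0, 1000.0, 10000.0]
print(scaling_exponent([(i, 0.5 * i ** 0.75) for i in levels]))  # exact power law, alpha = 0.75
```

An exponent of 1 corresponds to a constant coefficient of variation and 0.5 to Poisson-like noise, so a fitted value between them matches the "intermediate" scaling the abstract reports.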
UnrealGaussianStat: Synthetic dataset for statistical analysis on Novel View...
dataverse.no
dataverse.azure.uit.no
txt, zip
Updated Apr 10, 2025
Cite
Anurag Dalal (2025). UnrealGaussianStat: Synthetic dataset for statistical analysis on Novel View Synthesis [Dataset]. http://doi.org/10.18710/WSU7I6
Available download formats: txt (7447), zip (960339536)
Unique identifier
https://doi.org/10.18710/WSU7I6
Dataset updated
Apr 10, 2025
Dataset provided by
DataverseNO
Authors
Anurag Dalal
License
https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/WSU7I6
Description
The dataset comprises three dynamic scenes with both simple and complex lighting conditions. The number of cameras ranges from 4 to 512, including 4, 6, 8, 10, 12, 14, 16, 32, 64, 128, 256, and 512. The point clouds are randomly generated.
Allison, IA Annual Population and Growth Analysis Dataset: A Comprehensive...
neilsberg.com
csv, json
Updated Jul 30, 2024
Cite
Neilsberg Research (2024). Allison, IA Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Allison from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/allison-ia-population-by-year/
Available download formats: csv, json
Dataset updated
Jul 30, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Allison, Iowa
Variables measured
Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
Measurement technique
The data presented in this dataset are derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 to 2023. To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we analyzed and tabulated the data for each year from 2000 to 2023. For further information about these estimates, please email us at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Allison over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, both in absolute terms and as a percentage. The dataset can be used to understand how the population of Allison changed across the last two decades. For example, it shows whether the population is declining or increasing and, if it has changed, when it peaked or whether it is still growing and has not yet reached its peak. The trend can also be compared with the overall trend of the United States population over the same period.

Key observations

In 2023, the population of Allison was 942, a 1.15% year-on-year decrease from 2022. In 2022, the population was 953, a decline of 1.04% from 963 in 2021. Between 2000 and 2023, the population of Allison decreased by 58. In this period, the peak population was 1,026, in 2010. The numbers suggest that the population has already peaked and is trending downward. Source: U.S. Census Bureau Population Estimates Program (PEP).

Content

When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

Data Coverage:

From 2000 to 2023

Variables / Data Columns

Year: This column displays the data year (Measured annually and for years 2000 to 2023)

Population: This column shows the population of Allison for the specific year.

Year on Year Change: This column displays the change in Allison population for each year compared to the previous year.

Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
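The Year on Year Change and Change in Percent columns follow from the Population column by simple differencing. A minimal Python sketch, assuming the percentage is computed relative to the previous year's population and rounded to two decimals (the dataset's exact rounding rule is not stated); `year_on_year` is a hypothetical helper, and the input uses only the 2021 to 2023 figures quoted in the key observations:

```python
from __future__ import annotations


def year_on_year(populations: dict[int, int]) -> list[tuple[int, int, int, float]]:
    """Return (year, population, change, percent_change) rows.

    The earliest year has no previous value, so it produces no row.
    """
    years = sorted(populations)
    rows = []
    for prev, cur in zip(years, years[1:]):
        change = populations[cur] - populations[prev]        # absolute change
        pct = round(100 * change / populations[prev], 2)     # relative to previous year
        rows.append((cur, populations[cur], change, pct))
    return rows


allison = {2021: 963, 2022: 953, 2023: 942}
for row in year_on_year(allison):
    print(row)
# (2022, 953, -10, -1.04)
# (2023, 942, -11, -1.15)
```

The output reproduces the 1.04% and 1.15% declines stated in the key observations.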

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Allison Population by Year. You can refer to it here.
Dataset for Computational Fluid Dynamics Analysis of a Micro-scale Chamber...
catalog.data.gov
s.cnmilf.com
Updated Dec 15, 2023
Cite
U.S. EPA Office of Research and Development (ORD) (2023). Dataset for Computational Fluid Dynamics Analysis of a Micro-scale Chamber for Measuring Organic Chemical Emission Parameters [Dataset]. https://catalog.data.gov/dataset/dataset-for-computational-fluid-dynamics-analysis-of-a-micro-scale-chamber-for-measuring-o
Dataset updated
Dec 15, 2023
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
The data in this file are the product of a journal publication. The dataset contains the air velocity in the testing chambers, together with the convective mass transfer coefficient, diffusion coefficient, and material/air partition coefficient of formaldehyde for different building materials. Portions of this dataset are inaccessible: the CFD data were generated and stored by North Carolina State University because the files are too large and require specific software to open. To access them, please contact Jack Edwards at North Carolina State University. Format: CFD modeling data files. This dataset is associated with the following publication: Edwards, J., C. Huang, and X. Liu. Computational Fluid Dynamics Analysis of a Micro-scale Chamber for Measuring Organic Chemical Emission Parameters. JOURNAL OF HAZARDOUS MATERIALS. Elsevier Science Ltd, New York, NY, USA, 0, (2024).
Digital_Payments_2025_Dataset
figshare.com
csv
Updated Apr 25, 2025
Cite
shreyash tiwari (2025). Digital_Payments_2025_Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28873229.v1
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.28873229.v1
Dataset updated
Apr 25, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
shreyash tiwari
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The "Digital Payments 2025 Dataset" is a synthetic dataset representing digital payment transactions across various payment applications in India for the year 2025. It captures monthly transaction data for multiple payment apps, including banks, UPI platforms, and mobile payment services, reflecting the growing adoption of digital payments in India. The dataset was created as part of a college project to simulate realistic transaction patterns for research, education, and analysis in data science, economics, and fintech studies. It includes metrics such as customer transaction counts and values, total transaction counts and values, and temporal data (month and year). The data is synthetic, generated using Python libraries to mimic real-world digital payment trends, and is suitable for academic research, teaching, and exploratory data analysis.

Cite

Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1

Orange dataset table

3 scholarly articles cite this dataset.

Available download formats: xlsx

Unique identifier

https://doi.org/10.6084/m9.figshare.19146410.v1

Dataset updated

Mar 4, 2022

Dataset provided by

Figshare (http://figshare.com/)

Authors

Rui Simões

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with both compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples precluded a full, statistically robust analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
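The description states that the data were standardized across features but does not give the formula. A minimal NumPy sketch, assuming per-feature z-score standardization (subtract each column's mean, divide by its standard deviation), applied to a random placeholder matrix with the dataset's shape (36 × 11) and class layout (9 classes × 4 samples). The values are not the published measurements, and `standardize` is a hypothetical helper:

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Z-score each column: subtract the column mean, divide by the column std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population std (ddof=0), NumPy's default
    std[std == 0] = 1.0          # constant features become 0 instead of dividing by zero
    return (X - mean) / std


# Placeholder data with the dataset's shape; NOT the real measurements.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(36, 11))
classes = ["Control", "6-OHDA 6.25", "6-OHDA 12.5", "6-OHDA 25", "6-OHDA 50",
           "rotenone 0.03", "rotenone 0.06", "rotenone 0.125", "rotenone 0.25"]
y = np.repeat(classes, 4)        # 4 samples per class, i.e. a balanced target

X_std = standardize(X)
print(np.allclose(X_std.mean(axis=0), 0.0), np.allclose(X_std.std(axis=0), 1.0))  # True True

labels, counts = np.unique(y, return_counts=True)
print(len(labels), sorted(set(counts.tolist())))  # 9 [4]
```

Whether the population (`ddof=0`) or sample (`ddof=1`) standard deviation was used is not stated in the description; both are common conventions.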

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with the highest mutual information (rows) to the instances (columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. Both rows and columns are grouped by two-way hierarchical clustering using Euclidean distance and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, the outcome did not change significantly. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when the majority reaches 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure, and area under the ROC curve (AUC).
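The clustering and decision-tree settings above can be approximated outside Orange. A minimal sketch using SciPy and scikit-learn rather than Orange's own learners, on random placeholder data with the dataset's shape (36 × 11, 9 classes × 4). Several parts are assumptions: SciPy's `method="weighted"` (WPGMA) is taken to correspond to Orange's weighted linkage; scikit-learn offers no gain-ratio criterion and no "stop when majority reaches 95%" rule, so `entropy` (information gain) stands in and the majority threshold is omitted; the 4-fold split, the 9-cluster cut, and macro-averaging of precision, recall and F1 are not stated in the text. On random data the printed scores are meaningless; the sketch only shows the pipeline:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Placeholder data with the dataset's shape (36 x 11, 9 classes x 4); NOT the real measurements.
rng = np.random.default_rng(0)
classes = ["Control", "6-OHDA 6.25", "6-OHDA 12.5", "6-OHDA 25", "6-OHDA 50",
           "rotenone 0.03", "rotenone 0.06", "rotenone 0.125", "rotenone 0.25"]
y = np.repeat(classes, 4)
X = rng.standard_normal((36, 11))

# Hierarchical clustering of instances: Euclidean distance, weighted (WPGMA) linkage.
# method="average" (UPGMA) would match the linkage reported for the cluster maps.
Z = linkage(X, method="weighted", metric="euclidean")
clusters = fcluster(Z, t=9, criterion="maxclust")  # assumed cut into at most 9 clusters
print("clusters found:", len(np.unique(clusters)))

# Decision tree with the reported leaf/split limits; entropy stands in for gain ratio.
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2,
                              min_samples_split=5, random_state=0)

# Stratified 4-fold CV (assumed k): each test fold holds one sample of every class.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
y_pred = cross_val_predict(tree, X, y, cv=cv)
y_proba = cross_val_predict(tree, X, y, cv=cv, method="predict_proba")

scores = {
    "accuracy": accuracy_score(y, y_pred),
    "precision": precision_score(y, y_pred, average="macro", zero_division=0),
    "recall": recall_score(y, y_pred, average="macro", zero_division=0),
    "F1": f1_score(y, y_pred, average="macro", zero_division=0),
    "AUC": roc_auc_score(y, y_proba, multi_class="ovr"),  # one-vs-rest, macro-averaged
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With only 4 samples per class, 4 folds is the largest stratified split possible, which is one reason small-sample results like these should be read as trends rather than firm estimates.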

