This dataset was created by Pinky Verma
This dataset was downloaded from excelbianalytics.com and was generated with random VBA logic. I recently performed an extensive exploratory data analysis on it and added new columns, namely Unit margin, Order year, Order month, Order weekday and Order_Ship_Days, which I think can help with analyzing the data. I am sharing it because I think it is a great dataset for newbies like myself to practice analytical processes on.
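The five derived columns can be reproduced from the original order fields. The sketch below is not the author's code. It assumes the source CSV has `Order Date`, `Ship Date`, `Unit Price` and `Unit Cost` columns with dates written as M/D/YYYY, and it assumes Unit margin means unit price minus unit cost.

```python
from datetime import datetime

# ASSUMPTION: source column names and the M/D/YYYY date format; adjust to the actual file.
DATE_FORMAT = "%m/%d/%Y"


def add_derived_columns(row: dict) -> dict:
    """Return a copy of one sales record with the five derived columns added."""
    order_date = datetime.strptime(row["Order Date"], DATE_FORMAT)
    ship_date = datetime.strptime(row["Ship Date"], DATE_FORMAT)
    out = dict(row)
    # Assumed definition: margin earned on a single unit.
    out["Unit margin"] = round(float(row["Unit Price"]) - float(row["Unit Cost"]), 2)
    out["Order year"] = order_date.year
    out["Order month"] = order_date.month
    out["Order weekday"] = order_date.strftime("%A")  # weekday name, locale-dependent
    # Whole days elapsed between ordering and shipping.
    out["Order_Ship_Days"] = (ship_date - order_date).days
    return out
```

Applying the function to every row of a `csv.DictReader` produces the enriched table. In pandas, you would instead parse the two date columns and use the `.dt` accessor.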
https://crawlfeeds.com/privacy_policy
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions, making it suitable for data analysis, price comparison, market research, or building machine-learning models.
Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.
CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.
Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.
Free & Easy Access: Priced at USD 0.00, making it a great starting point for developers, data analysts, or students.
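For the price-comparison use case, a natural first step is to average prices per brand. The exact CSV header names are not given in this description. The sketch therefore assumes columns named `brand` and `price`, and it also assumes prices are plain numbers without a currency symbol.

```python
import csv
from collections import defaultdict


def mean_price_by_brand(lines) -> dict:
    """Average price per brand from CSV text lines (an open file object also works).

    ASSUMPTION: the header contains columns literally named 'brand' and 'price'.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        try:
            price = float(row["price"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows whose price is missing or not numeric
        brand = row.get("brand") or "unknown"
        totals[brand] += price
        counts[brand] += 1
    return {brand: totals[brand] / counts[brand] for brand in totals}
```

To use it on the downloaded file, open the file with `newline=""` and pass the file object to `mean_price_by_brand`.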
This dataset is a sample from the TalkingData AdTracking competition. I kept all the positive examples (where is_attributed == 1), while discarding 99% of the negative samples. The sample has roughly 20% positive examples.
In this competition, the objective was to predict whether a user would download an app after clicking a mobile app advertisement.
train_sample.csv - Sampled data
Each row of the training data contains a click record, with the following features.
ip: IP address of the click.
app: app ID for marketing.
device: device type ID of the user's mobile phone (e.g., iPhone 6 Plus, iPhone 7, Huawei Mate 7, etc.).
os: OS version ID of the user's mobile phone.
channel: channel ID of the mobile ad publisher.
click_time: timestamp of the click (UTC).
attributed_time: if the user downloaded the app after clicking an ad, the time of the app download.
is_attributed: the target to be predicted, indicating whether the app was downloaded.
Note that ip, app, device, os, and channel are encoded.
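The sampling scheme is simple: keep every row with is_attributed == 1 and drop about 99% of the rest. The author's exact procedure is not described. This sketch assumes independent per-row sampling with a fixed seed, and it assumes the rows have already been parsed into dicts with an integer `is_attributed` field.

```python
import random


def downsample_negatives(rows, keep_rate=0.01, seed=0):
    """Keep every positive click and roughly `keep_rate` of the negative ones.

    Each row is a dict with an integer 'is_attributed' field (1 = app downloaded).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [row for row in rows
            if row["is_attributed"] == 1 or rng.random() < keep_rate]
```

Negatives are kept at only about 1%. As a result, a model trained on this sample will tend to overestimate download probabilities on the full click stream unless its outputs are recalibrated.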
I'm also including Parquet files with various features for use within the course.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Part of the dataset supplied in: https://www.kaggle.com/datasets/wordsforthewise/lending-club
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight majority of female population, with 50.0% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
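The female share quoted here is the female count divided by the combined male and female count. This description does not show the actual column names or values, so the sketch takes the two counts as plain arguments.

```python
def female_share(male: int, female: int) -> float:
    """Percentage of the combined male + female population that is female."""
    total = male + female
    if total == 0:
        raise ValueError("population total is zero")
    return round(100 * female / total, 2)
```

For example, `female_share(478, 522)` returns 52.2 for hypothetical counts of 478 males and 522 females.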
Scope of gender:
Please note that the American Community Survey asks about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are asked to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
The data in this dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Gratis Population by Race & Ethnicity. You can refer to it here
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The description section is crucial for helping users understand the purpose, context, and potential applications of your dataset. It should include the following details:
This section provides details about the files included in your dataset, helping users navigate and use them efficiently. Key points to include:
mars_rover_dataset.csv: CSV file containing the metadata of the images.
mars_images.zip: compressed folder containing all images.
The img_src column in mars_rover_dataset.csv corresponds to the images stored in mars_images.zip. Users should extract the images before using the dataset for model training:
```bash
unzip mars_images.zip
```
This section explains the meaning of each column in the dataset, ensuring users can analyze and interpret the data correctly. A well-structured table format is often useful:
| Column Name | Description |
|---|---|
| id | Unique identifier for each image. |
| sol | Martian sol (day) when the image was captured. |
| camera_name | Abbreviated name of the rover's camera (e.g., "FHAZ" for Front Hazard Camera). |
| camera_full_name | Full descriptive name of the camera. |
| img_src | URL link to the image. Users can download images using this link. |
| earth_date | The Earth date corresponding to the Martian sol. |
| rover_name | Name of the rover that captured the image (e.g., "Curiosity"). |
| rover_status | Current operational status of the rover (e.g., "Active" or "Complete"). |
| landing_date | Date when the rover landed on Mars. |
| launch_date | Date when the rover was launched from Earth. |
earth_date is in YYYY-MM-DD format. This section helps users quickly understand the dataset's structure, making it easier for them to work with the data effectively.
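After the archive is extracted, each metadata row can be paired with its image file. This sketch relies on two unverified assumptions: every extracted file is named after the last path segment of its `img_src` URL, and the files sit in a folder called `mars_images`. Check the actual archive layout before relying on it.

```python
import csv
import os
from datetime import date
from urllib.parse import urlparse


def local_image_path(img_src: str, image_dir: str = "mars_images") -> str:
    """Map an img_src URL to a path inside the extracted image folder.

    ASSUMPTION: files are named after the last path segment of the URL.
    """
    return os.path.join(image_dir, os.path.basename(urlparse(img_src).path))


def read_records(lines, image_dir: str = "mars_images"):
    """Yield metadata rows with typed sol/earth_date and an added local_path field."""
    for row in csv.DictReader(lines):
        row["sol"] = int(row["sol"])
        row["earth_date"] = date.fromisoformat(row["earth_date"])  # YYYY-MM-DD
        row["local_path"] = local_image_path(row["img_src"], image_dir)
        yield row
```

To use it, open mars_rover_dataset.csv with `newline=""` inside a `with` block and iterate over `read_records(f)`. If the archive uses a different naming scheme, check `os.path.exists(rec["local_path"])` before loading each image.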
This dataset is a compilation of address point data for the City of Tempe. It contains a point location and the official address (as defined by the Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. Several additional attributes may be populated for an address, but not every attribute is populated for every address.
Contact: Lynn Flaaen-Hanna, Development Services Specialist
Contact E-mail
Link: Map that Lets You Explore and Export Address Data
Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information created and maintained by the Building Safety Division of Community Development.
Data Source Type: ESRI ArcGIS Enterprise Geodatabase
Preparation Method: N/A
Publish Frequency: Weekly
Publish Method: Automatic
Data Dictionary
This dataset supports the SWAMP Data Dashboard, a public-facing tool developed by the Surface Water Ambient Monitoring Program (SWAMP) to provide accessible, user-friendly access to water quality monitoring data across California. The dashboard and its associated datasets are designed to help the public, researchers, and decision-makers explore and download monitoring data collected from California’s surface waters.
This dataset includes five distinct resources:
These data are collected by SWAMP and its partners to support water quality assessments, identify trends, and inform water resource management. The SWAMP Data Dashboard provides interactive visualizations and filtering tools to explore this data by region, parameter, and more.
The SWAMP dataset is sourced from the California Environmental Data Exchange Network (CEDEN), which serves as the central repository for water quality data collected by various monitoring programs throughout the state. As such, there is some overlap between this dataset and the broader CEDEN datasets also published on the California Open Data Portal (see Related Resources). This SWAMP dataset represents a curated subset of CEDEN data, specifically tailored for use in the SWAMP Data Dashboard.
Access the SWAMP Data Dashboard: https://gispublic.waterboards.ca.gov/swamp-data/
*This dataset is provisional and subject to revision. It should not be used for regulatory purposes.
https://creativecommons.org/publicdomain/zero/1.0/
Sample Home Depot dataset with more than 3,500 records.
Total Fields: 13
Format: CSV
Fields: url, title, images, description, product_id, sku, gtin13, brand, price, currency, availability, uniq_id, scraped_at
The Crawl Feeds team extracted this data from Home Depot. The complete dataset, with more than 1 million products, can be downloaded in CSV format.
The Home Depot dataset is useful for research and analysis purposes.
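Before analysis, it can help to confirm that a downloaded file actually has the 13 listed fields. This is a minimal standard-library sketch. It assumes the header row uses exactly the lowercase field names given in the description.

```python
import csv

# The 13 fields listed for the sample file, in the order the description gives them.
EXPECTED_FIELDS = [
    "url", "title", "images", "description", "product_id", "sku", "gtin13",
    "brand", "price", "currency", "availability", "uniq_id", "scraped_at",
]


def check_header(lines):
    """Compare a CSV header row with EXPECTED_FIELDS; returns (missing, unexpected)."""
    header = [name.strip() for name in next(csv.reader(lines))]
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    unexpected = [name for name in header if name not in EXPECTED_FIELDS]
    return missing, unexpected
```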
This dataset contains a number of variables collected from children and their parents who took part in the SMILE trial, at assessment and at follow-up. It does not include data on age and gender, to ensure that no child or parent can be identified through the data. Researchers can apply to access a fuller dataset (https://data.bris.ac.uk/data/dataset/1myzti8qnv48g2sxtx6h5nice7) containing age and gender through the University of Bristol's Data Access Committee; please refer to the data access request form (http://bit.ly/data-bris-request) for details on how to apply.
Complete download (zip, 1.5 MiB)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Customized Dataset is a dataset for object detection tasks - it contains Objects annotations for 779 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Animal Faces Dataset is a collection of animal face images across multiple species, designed for AI, machine learning, and computer vision applications such as wildlife monitoring and image recognition.
Description of the INSPIRE Download Service (predefined Atom): Local municipality Bremm, development plan Auf der Buech (Klaerwerk). The link(s) for downloading the datasets are dynamically generated from GetMap calls to a WMS interface.
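The GetMap calls mentioned here follow the OGC Web Map Service protocol. This description does not give the service URL, layer name, or coordinate system for this development plan. Every value below is therefore a hypothetical placeholder, including the base URL, the layer list, and the default EPSG:25832 CRS. The sketch only shows how a WMS 1.3.0 GetMap URL is assembled.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, crs="EPSG:25832",
                   width=800, height=600, image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL.

    bbox holds four coordinates in the axis order defined by `crs`; in WMS 1.3.0,
    geographic CRSs such as EPSG:4326 use latitude,longitude order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests each layer's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)
```

A real client would first read the service's GetCapabilities response to find valid layer names and supported coordinate systems.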
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, MitoTracker Red CMXRos area and intensity (3 h and 24 h incubations with both compounds), MitoSOX oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM for 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full, statistically robust analysis of the results. Nevertheless, it allowed relevant hidden patterns and trends to be identified.
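The description does not say exactly how the features were standardized. A common reading, assumed here, is a per-feature z-score, z = (x - mean) / std, computed column by column. This is a minimal sketch of that transform, not the procedure Orange applied.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column of a list of equal-length numeric rows.

    Uses the population standard deviation; constant columns become 0.0.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(x - m) / s if s else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]
```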
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to the instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped in both rows and columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves, 2; minimum number of samples required to split an internal node, 5; stop splitting when the majority reaches 95%; criterion, gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure, and area under the ROC curve (AUC).
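The reported metrics were computed in Orange. The sketch below only spells out the standard definitions for a multi-class problem. It assumes macro averaging (an unweighted mean of per-class scores), which may differ from the averaging Orange applies. AUC is omitted because it needs class probabilities rather than hard predictions.

```python
from collections import Counter
from statistics import mean


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (unweighted mean over classes)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / actual[label] if actual[label] else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": sum(correct.values()) / len(y_true),
        "precision": mean(precisions),
        "recall": mean(recalls),
        "f1": mean(f1s),
    }
```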
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the metadata of the datasets published in 118 Dataverse installations, information about the metadata blocks of 118 installations, and the lists of predefined licenses or dataset terms that depositors can apply to datasets in the 100 installations that were running versions of the Dataverse software that include the "multiple-license" feature. The data is useful for improving understanding of how certain Dataverse features and metadata fields are used and for learning about the quality of dataset and file-level metadata within and across Dataverse installations.
How the metadata was downloaded
The dataset metadata and metadata block JSON files were downloaded from each installation between August 25 and September 2, 2025, using a Python script that uses the Dataverse API.
How the files are organized
├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation)_2025.08.25-2025.09.02.csv
│   ├── contributor(citation)_2025.08.25-2025.09.02.csv
│   ├── data_source(citation)_2025.08.25-2025.09.02.csv
│   ├── ...
│   └── topic_classification(citation)_2025.08.25-2025.09.02.csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2025.08.26_07.14.00.zip
│   │   ├── dataset_pids_Abacus_2025.08.26_07.14.00.csv
│   │   ├── Dataverse_JSON_metadata_2025.08.26_07.14.00
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
│   │   │   └── ...
│   │   └── metadatablocks_v5.9
│   │       ├── astrophysics_v5.9.json
│   │       ├── biomedical_v5.9.json
│   │       ├── citation_v5.9.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2025.08.25_15.45.25.zip
│   ├── ...
│   └── Yale_Dataverse_2025.08.25_11.51.29.zip
├── dataverse_installations_summary_2025.09.02.csv
├── dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv
├── license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv
└── metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv
This dataset contains two directories and four CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the "Citation" and "Geospatial" metadata blocks of datasets in the 118 Dataverse installations. For example, author(citation)_2025.08.25-2025.09.02.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 118 installations, with a column for each of the four child fields: author name, affiliation, identifier type, and identifier. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 118 zip files, one for each of the 118 Dataverse installations whose sites were functioning when I attempted to collect their metadata and that have at least one published dataset. Each zip file contains:
A CSV file listing information about the datasets published in the installation, including a column to indicate whether the Python script was able to download the Dataverse JSON metadata for each dataset.
A directory with JSON files that have information about the installation's metadata fields, such as the field names and how they're organized.
A directory of JSON files that contain the metadata of the installation's published, non-deaccessioned dataset versions in the Dataverse JSON metadata schema.
The dataverse_installations_summary_2025.09.02.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata included and not included in this dataset. The dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv file contains the dataset PIDs of published datasets in 118 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all "dataset_pids_....csv" files in each of the 118 zip files in the dataverse_json_metadata_from_each_known_dataverse_installation directory. The license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv file contains information about the licenses and data use agreements that some installations let depositors choose when creating datasets. When I collected this data, 100 of the available 118 installations were running versions of the Dataverse software that allow depositors to choose a "predefined license or data use agreement" from a dropdown menu in the dataset deposit form. For more information about this Dataverse feature, see https://guides.dataverse.org/en/6.7/user/dataset-management.html#choosing-a-license. The metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv file contains the metadata block names, field names, child field names (if the field is a compound field), display names, descriptions/tooltip text, watermarks, and controlled vocabulary values of fields in the 118 Dataverse installations' metadata blocks. This file is useful for learning...
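The metadata was collected with the author's own Python script, which is not included here. As a hypothetical illustration of the kind of Dataverse native API call such a script can make, the sketch below requests a dataset's Dataverse JSON by persistent identifier. The server URL and PID used in the test are placeholders, and rate limiting, retries and error handling are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def dataset_metadata_url(server_url: str, pid: str) -> str:
    """Native API URL returning the Dataverse JSON of a dataset, looked up by its PID."""
    query = urlencode({"persistentId": pid}, safe=":/")  # keep "doi:10.x/..." readable
    return f"{server_url.rstrip('/')}/api/datasets/:persistentId/?{query}"


def fetch_dataset_metadata(server_url: str, pid: str, timeout: float = 30) -> dict:
    """Download one published dataset's metadata (published data needs no API token)."""
    with urlopen(dataset_metadata_url(server_url, pid), timeout=timeout) as response:
        payload = json.load(response)
    # Native API responses wrap the result as {"status": "OK", "data": {...}}.
    return payload["data"]
```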
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population distribution of Excel across 18 age groups. It lists the population in each age group along with its percentage of the total population of Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was 45 to 49 years, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was 85 years and over, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
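Each percentage here is the group's count divided by the sum over all 18 groups. The largest and smallest groups are simply the extremes of the counts. This minimal sketch assumes the table has been read into a dict mapping age-group label to population; that structure is hypothetical, not the published column layout.

```python
def summarize_age_groups(counts: dict) -> dict:
    """Share of total (in %, two decimals) per age group, plus the largest and smallest group."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("population total is zero")
    shares = {group: round(100 * n / total, 2) for group, n in counts.items()}
    return {
        "total": total,
        "shares": shares,
        "largest": max(counts, key=counts.get),  # first group wins on ties
        "smallest": min(counts, key=counts.get),
    }
```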
Age groups:
Variables / Data Columns
Good to know
Margin of Error
The data in this dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Excel Population by Age. You can refer to it here
https://choosealicense.com/licenses/gpl/
Africa Rural Population Dataset
Dataset Summary
This dataset provides annual rural population counts for 54 African countries from 1960 to 2024. The data originates from the World Bank Development Indicators (indicator code SP.RUR.TOTL) and has been cleaned and re-formatted for machine-learning workflows.
Source & Collection
Original source: World Bank Open Data – Rural population (SP.RUR.TOTL)
Data accessed via Excel download and processed on 2025-08-07.… See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/Africa-Rural-Population-Dataset.
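Because the table holds one count per country per year, a typical first derived feature is the year-over-year change. This minimal sketch assumes one country's series has already been extracted into a `{year: population}` dict. That structure is hypothetical; the actual column layout is described on the dataset page.

```python
def yoy_growth(series: dict) -> dict:
    """Year-over-year % change, keyed by the later year.

    Only consecutive year pairs with a non-zero earlier value are included.
    """
    years = sorted(series)
    return {
        year: round(100 * (series[year] - series[prev]) / series[prev], 2)
        for prev, year in zip(years, years[1:])
        if year == prev + 1 and series[prev]
    }
```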
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
To use this dataset:
```python
import tensorflow_datasets as tfds

# Load the training split and print the first four examples.
ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
    print(ex)
```
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Wadsworth by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Wadsworth across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight majority of female population, with 52.19% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are asked to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
The data in this dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Wadsworth Population by Race & Ethnicity. You can refer to it here