https://research.csiro.au/dap/licences/csiro-data-licence/
A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.
Instructions on how to add a layer containing recent earthquakes, created from a CSV file downloaded from GNS Science's GeoNet website, to a Web Map. The CSV file must contain latitude and longitude fields for the earthquake location for it to be added to a Web Map as a point layer. This document is designed to support the Natural Hazards - Earthquakes story map.
Mapping incident locations from a CSV file in a web map (YouTube video).
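The sketch below is not part of the original instructions; it shows one way to check a downloaded GeoNet CSV before adding it to a Web Map. The filename and the column names "latitude" and "longitude" are assumptions, so adjust them to match your export.
```python
# A minimal sketch (assumptions noted): verify that an earthquake CSV has the
# latitude/longitude fields a web map needs to create a point layer.
import pandas as pd

quakes = pd.read_csv("earthquakes.csv")  # hypothetical filename

required = {"latitude", "longitude"}     # assumed column names
missing = required - set(quakes.columns)
if missing:
    raise ValueError(f"CSV is missing coordinate fields: {missing}")

# Drop rows without coordinates so every remaining record can be plotted as a point.
quakes = quakes.dropna(subset=["latitude", "longitude"])
print(f"{len(quakes)} earthquakes ready to map")
```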
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) with the mapping between the host's IP address, the csv/pcap filename and the activity label.
Activities:
Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in Google Docs or remote CLI sessions over SSH.
Bulk data transfer: applications that transfer large files over the network. Examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox or the university repository, among others.
Web browsing: all of the traffic generated while searching and consuming different web pages. Examples of those pages are several blogs, news sites and the university's Moodle.
Video playback: traffic from applications that consume video via streaming or pseudo-streaming. The best-known servers used are Twitch and YouTube, but the university's online classroom has also been used.
Idle behaviour: the background traffic generated by the user's computer while the user is idle. This traffic was captured with every application closed and with some pages open (such as Google Docs, YouTube and several web pages), but always without user interaction.
The capture is performed on a network probe, attached via a SPAN port to the router that forwards the user's network traffic. The traffic is stored in pcap format with the full packet payload. In the csv files, every non-TCP/UDP packet is filtered out, as is every packet with no payload. The fields in the csv files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP addresses, and source and destination UDP/TCP ports. The fields are also included as a header in every csv file.
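A minimal sketch for working with the per-packet csv files described above; the filename and the exact header spellings are assumptions based on the field list, so check the header line of your file first.
```python
# A minimal sketch: load one per-packet capture CSV and inspect it.
# "video_capture_01.csv" is a hypothetical filename; column names come from the
# header row of each csv file and may be spelled differently than assumed here.
import pandas as pd

packets = pd.read_csv("video_capture_01.csv")

print(packets.columns.tolist())              # timestamp, protocol, payload size, addresses, ports
print(packets["protocol"].value_counts())    # assumes a column literally named "protocol"
```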
The amount of data is as follows:
Bulk: 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
Video: 23 traces, 4496 s, 1405 MBytes
Web: 23 traces, 4203 s, 148 MBytes
Interactive: 42 traces, 8934 s, 30.5 MBytes
Idle: 52 traces, 6341 s, 0.69 MBytes
The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.
http://www.gnu.org/licenses/fdl-1.3.html
This dataset was created by amar jeet kushwaha
Released under GNU Free Documentation License 1.3
https://crawlfeeds.com/privacy_policy
The Waitrose Product Dataset offers a comprehensive and structured collection of grocery items listed on the Waitrose online platform. This dataset includes 25,000+ product records across multiple categories, curated specifically for use in retail analytics, pricing comparison, AI training, and eCommerce integration.
Each record contains detailed attributes such as:
Product title, brand, MPN, and product ID
Price and currency
Availability status
Description, ingredients, and raw nutrition data
Review count and average rating
Breadcrumbs, image links, and more
Delivered in CSV format (ZIP archive), this dataset is ideal for professionals in the FMCG, retail, and grocery tech industries who need structured, crawl-ready data for their projects.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Classification of online health messages. The dataset has 487 annotated messages taken from Medhelp, an online health forum with several health communities (https://www.medhelp.org/). It was built as part of a master's thesis entitled "Automatic categorization of health-related messages in online health communities" of the Master in Informatics and Computing Engineering of the Faculty of Engineering of the University of Porto. It expands a dataset created in a previous work [see Relation metadata] whose objective was to propose a classification scheme to analyze messages exchanged in online health forums. A website was built to allow the classification of additional messages collected from Medhelp. After using a Python script to scrape the five most recent discussions from popular forums (https://www.medhelp.org/forums/list), we sampled 285 messages from them to annotate. Each message was classified three times by anonymous raters into 11 categories from April 2022 until the end of May 2022. For each message, the rater picked the categories associated with the message and its emotional polarity (positive, neutral, and negative). Our dataset is organized in two CSV files, one containing information regarding the 855 (=3*285) classifications collected via crowdsourcing (CrowdsourcingClassification.csv) and the other containing the 487 messages with their final and consensual classifications (FinalClassification.csv). The readMe file provides detailed information about the two .csv files.
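A minimal sketch, using only the two file names given above, for loading the data in Python; the column layouts are documented in the readMe file, so this only inspects them.
```python
# A minimal sketch: load the two CSV files named above and inspect their shape
# and headers. Column names are not documented here; see the readMe file.
import pandas as pd

crowd = pd.read_csv("CrowdsourcingClassification.csv")  # one row per crowdsourced classification
final = pd.read_csv("FinalClassification.csv")          # one row per message, consensual labels

print(crowd.shape, crowd.columns.tolist())
print(final.shape, final.columns.tolist())
```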
This dataset was created by Ivan Mikhnenkov
It contains the following files:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn't include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset: the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn't known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.
This archive contains code and data for reproducing the analysis for "Replication Data for Revisiting 'The Rise and Decline' in a Population of Peer Production Projects". Depending on what you hope to do with the data, you probably do not want to download all of the files. Depending on your computation resources, you may not be able to run all stages of the analysis. The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

The data files are created in a four-stage process. The first stage uses the program "wikiq" to parse mediawiki xml dumps and create tsv files that have edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis. This file is expensive to generate and, at 1.5GB, is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and latex typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, running the analysis, to building the intermediate datasets.

Building the manuscript using knitr
This requires working latex, latexmk, and knitr installations. Depending on your operating system, you might install these packages in different ways. On Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar; this has everything you need to typeset the manuscript. Unpack the tar archive (on a unix system this can be done by running tar xf code.tar) and navigate to code/paper_source. Install R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise, you should try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

Loading intermediate datasets
The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

Running the analysis
Fitting the models may not work on machines with less than 32GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models.
See line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives; on a unix system this can be done with the command tar xf code.tar && 7z x intermediate_data.7z. Install R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots and create the RDS files.

Generating datasets

Building the intermediate files
The intermediate files are generated from all.edits.RDS. This process requires about 20GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z; on a unix system this can be done using tar xf code.tar && 7z x userroles_data.7z. Install R dependencies: in R, run install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). Run 01_build_datasets.R.

Building all.edits.RDS
The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
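For working outside R, the sketch below reads one of the .tab intermediate files with pandas; the file name newcomers.tab is hypothetical (mirroring newcomers.RDS), and the files are assumed to be tab-delimited with a header row.
```python
# A minimal sketch, assuming tab-delimited .tab files with a header row.
# "newcomers.tab" is a hypothetical name; substitute the actual file you extracted.
import pandas as pd

newcomers = pd.read_csv("newcomers.tab", sep="\t")
print(newcomers.shape)
print(newcomers.head())
```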
Annual and time-period fire statistics in CSV format for the AOIs of the NWCC active forecast stations. The statistics are based on NIFC historical and current fire perimeters and MTBS burn severity data. This release contains NIFC data from 1996 to current (July 10, 2025) and MTBS data from 1996 to 2022. Annual statistics were generated for the time period of 1996 to 2025. Time-period statistics were generated from 1998 to 2022 with a 5-year time interval. The time periods are: 2018-2022 (last 5 years), 2013-2022 (last 10 years), 2008-2022 (last 15 years), 2003-2022 (last 20 years), and 1998-2022 (last 25 years).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Numerous studies have attempted to model the effect of mass media on the transmission of diseases such as influenza; however, quantitative data on media engagement has until recently been difficult to obtain. With the recent explosion of 'big data' coming from online social media and the like, large volumes of data on a population's engagement with mass media during an epidemic are becoming available to researchers. In this study, we combine an online dataset comprising millions of shared messages relating to influenza with traditional surveillance data on flu activity to suggest a functional form for the relationship between the two. Using these data, we present a simple deterministic model for influenza dynamics incorporating media effects, and show that such a model helps explain the dynamics of historical influenza outbreaks. Furthermore, through model selection we show that the proposed media function fits historical data better than other media functions proposed in earlier studies.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains detailed information on all Udemy courses available as of Oct 10, 2022. This data is provided in the "Course_info.csv" file. Also, over 9 million comments were collected and provided in the "Comments.csv" file. Information on over 209k courses was collected by web scraping the Udemy website. Udemy hosts 209,734 courses and 73,514 instructors teaching courses in 79 languages across 13 different categories.
The related notebook was uploaded here. If you are interested in analytical data about online learning platforms, I recommend reading the article below for interesting insights. https://lnkd.in/gjCBhP_P
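A minimal sketch for combining the two files named above; the join key course_id is an assumption, so verify the actual headers before merging.
```python
# A minimal sketch: load courses and comments, then attach course metadata to
# each comment. "course_id" as the shared key is an assumption.
import pandas as pd

courses = pd.read_csv("Course_info.csv")
comments = pd.read_csv("Comments.csv")

print(courses.columns.tolist())
print(comments.columns.tolist())

# Hypothetical join on a shared course-id column.
merged = comments.merge(courses, on="course_id", how="left")
print(merged.shape)
```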
https://crawlfeeds.com/privacy_policy
This furniture e-commerce dataset includes 140,000+ structured product records collected from online retail sources. Each entry provides detailed product information, categories, and breadcrumb hierarchies, making it ideal for AI, machine learning, and analytics applications.
Key Features:
📊 140K+ furniture product records in structured format
🏷 Includes categories, subcategories, and breadcrumbs for taxonomy mapping
📂 Delivered as a clean CSV file for easy integration
🔎 Perfect dataset for AI, NLP, and machine learning model training
Best Use Cases:
✔ LLM training & fine-tuning with domain-specific data
✔ Product classification datasets for AI models
✔ Recommendation engines & personalization in e-commerce
✔ Market research & furniture retail analytics
✔ Search optimization & taxonomy enrichment
Why this dataset?
Large volume (140K+ furniture records) for robust training
Real-world e-commerce product data
Ready-to-use CSV, saving preprocessing time
Affordable licensing with bulk discounts for enterprise buyers
Note:
Each record in this dataset includes both a url (main product page) and a buy_url (the actual purchase page). The dataset is structured so that records are based on the buy_url, ensuring you get unique, actionable product-level data instead of just generic landing pages.
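A minimal sketch that checks the buy_url-based structure described in the note; url and buy_url are the field names given above, while the CSV filename is hypothetical.
```python
# A minimal sketch: confirm records are unique per buy_url and spot landing pages
# that fan out into several purchase pages. "furniture_products.csv" is hypothetical.
import pandas as pd

products = pd.read_csv("furniture_products.csv")

dupes = products["buy_url"].duplicated().sum()
print(f"{len(products)} records, {dupes} duplicated buy_url values")

# Landing pages (url) shared by more than one distinct purchase page (buy_url).
shared_landing = products.groupby("url")["buy_url"].nunique()
print(shared_landing[shared_landing > 1].head())
```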
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a collection of articles about Covid-19 published online from May 2020 to September 2020, stored as a CSV file. The primary providers of these articles are 10news.com, cnn.com, and foxla.com. The dataset contains two columns (text and sentiment). The text column contains text from the articles to which a label applies. The sentiment column contains either the value 1 (positive class) for text with positive sentiment or the value 0 (negative class) for text with negative sentiment. The model used will be published in a journal later and can be found on my profile under the title: 'Sentiment Analysis of Covid-19 Articles; The Impact of Bidirectional Layer on Long Short-Term Memory (LSTM).'
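A minimal sketch for inspecting the two-column file described above; the CSV filename is hypothetical.
```python
# A minimal sketch: check the class balance of the sentiment labels (1 = positive,
# 0 = negative, per the description) and the length of the article texts.
# "covid_articles.csv" is a hypothetical filename.
import pandas as pd

articles = pd.read_csv("covid_articles.csv")

print(articles["sentiment"].value_counts(normalize=True))
print(articles["text"].str.len().describe())
```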
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A dataset consisting of 751,500 English app reviews of 12 online shopping apps, scraped from the internet using a Python script. This ShoppingAppReviews dataset contains app reviews of the 12 most popular online shopping Android apps: Alibaba, AliExpress, Amazon, Daraz, eBay, Flipkart, Lazada, Meesho, Myntra, Shein, Snapdeal and Walmart. Each review entry contains metadata such as review score, thumbs-up count, review posting time, reply content, etc. The dataset is organized in a zip file containing 12 JSON files and 12 CSV files, one of each per app. This dataset can be used to obtain valuable information about customers' feedback regarding their user experience of these financially important apps.
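A minimal sketch for reading the archive layout described above; the archive name and member file names are assumptions, so list the contents first.
```python
# A minimal sketch: list the files inside the zip, then load one per-app CSV.
# "ShoppingAppReviews.zip" and "Amazon.csv" are hypothetical names.
import zipfile
import pandas as pd

with zipfile.ZipFile("ShoppingAppReviews.zip") as zf:
    print(zf.namelist())                 # inspect the actual 12 csv + 12 json names
    with zf.open("Amazon.csv") as f:     # hypothetical member name
        amazon = pd.read_csv(f)

print(amazon.shape)
print(amazon.columns.tolist())
```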
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.
Instructions:
Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of an online questionnaire to assess the end-users' need for explanations in software systems. The questionnaire was shared in December 2018 and was online until January 2019. 171 participants initiated the survey and 107 completed it. We analyzed only the responses of the participants who completed the survey.
This submission contains:
The survey raw data in CSV format, with comma-separated values;
The .xlsx file containing the same raw data;
The .pdf file containing the survey questions;
A .rtfd version of the survey questions;
A .html version of the survey questions;
The .xlsx file containing the analyzed data;
The .pdf file containing instructions about the coded data.
The raw data contains only the responses from the 107 participants who completed the survey. Blank cells indicate that the participant did not provide a response to the corresponding question or answer option.
All responses are anonymized and identified by a unique ID.
Each row is identified by the participant's ID, the date when the questionnaire was submitted, the last page (18 in total) and the language that the participant chose.
The subsequent columns contain the questions.
We use codes before each question. First, one of the following symbols:
(*) as an indication that the question was mandatory;
(*+) as an indication that the question was mandatory but was conditionally shown, depending on previous answers;
(+) as an indication that the question was conditionally shown, depending on previous answers;
Next, the code of the question as in the questionnaire.
And, if multiple choice, the code of the answer option.
E.g.: (*+)A2(3) means that the A2 question in the questionnaire was mandatory and conditionally shown, and that this column contains the responses regarding answer option 3.
After this code, the question as on the original questionnaire is shown and, when multiple option answer, the corresponding option is shown between [] after the question. E.g.: "In a typical day, which category of software/apps do you use on your digital devices most often? (More than one allowed) [Games]", where Games was one of the optional answers.
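A minimal sketch (not part of the submission) showing how the column-header codes described above could be split into their parts; the header string used is the example given above.
```python
# A minimal sketch: parse headers of the form "(*+)A2(3) Question text [Option]"
# into flags, question code, option code and question text, following the coding
# scheme described above.
import re

HEADER_RE = re.compile(
    r"^\((?P<flags>\*\+|\*|\+)\)"   # (*), (*+) or (+): mandatory / conditionally shown
    r"(?P<question>[A-Za-z0-9]+)"   # question code, e.g. A2
    r"(?:\((?P<option>\d+)\))?"     # optional answer-option code, e.g. (3)
    r"\s*(?P<text>.*)$"             # question text, possibly ending in [Option]
)

header = ("(*+)A2(3) In a typical day, which category of software/apps do you use "
          "on your digital devices most often? (More than one allowed) [Games]")
m = HEADER_RE.match(header)
print(m.group("flags"), m.group("question"), m.group("option"))
print(m.group("text"))
```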
The questionnaire was available in three languages: Portuguese, German and English.
Responses in German and Portuguese were translated to English. These translations are shown in a subsequent column, beside the column with the original responses, and are identified by the word "TRANSLATION" in the title. Responses which were already in English were not translated.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Facebook is becoming an essential tool for more than just family and friends. Discover how Cheltenham Township (USA), a diverse community just outside of Philadelphia, deals with major issues such as the Bill Cosby trial, everyday traffic issues, sewer I/I problems and lost cats and dogs. And yes, theft.
Communities work when they're connected and exchanging information. What and who are the essential forces making a positive impact, and when and how do conversational threads get directed or misdirected?
Use Any Facebook Public Group
You can leverage the examples here for any public Facebook group. For an example of the source code used to collect this data, and a quick start docker image, take a look at the following project: facebook-group-scrape.
Data Sources
There are 4 csv files in the dataset, with data from the following 5 public Facebook groups:
post.csv
These are the main posts you will see on the page. It might help to take a quick look at the page. Commas in the msg field have been replaced with {COMMA}, and apostrophes have been replaced with {APOST}.
comment.csv
These are comments to the main post. Note, Facebook postings have comments, and comments on comments.
like.csv
These are likes and responses. The two keys in this file (pid,cid) will join to post and comment respectively.
member.csv
These are all the members in the group. Some members never, or rarely, post or comment. You may find multiple entries in this table for the same person. The name of the individual never changes, but they change their profile picture. Each profile picture change is captured in this table. Facebook gives users a new id in this table when they change their profile picture.
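A minimal sketch tying the files together: it restores the {COMMA}/{APOST} placeholders in post.csv and counts likes per post through the pid key in like.csv. It assumes post.csv identifies each post with a pid column matching like.csv, which goes beyond the description above.
```python
# A minimal sketch: undo the msg-field escaping and attach like counts to posts.
# The msg field and the pid/cid keys are described above; a "pid" column in
# post.csv is an assumption.
import pandas as pd

posts = pd.read_csv("post.csv")
likes = pd.read_csv("like.csv")

# Restore commas and apostrophes that were replaced during collection.
posts["msg"] = (posts["msg"]
                .str.replace("{COMMA}", ",", regex=False)
                .str.replace("{APOST}", "'", regex=False))

# Likes keyed by pid refer to posts; likes keyed by cid refer to comments.
likes_per_post = likes.groupby("pid").size().rename("likes")
posts = posts.merge(likes_per_post, left_on="pid", right_index=True, how="left")
print(posts[["pid", "likes"]].head())
```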