100+ datasets found

CSV file used in statistical analyses
data.csiro.au
researchdata.edu.au
Updated Oct 13, 2014
Cite
CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
Unique identifier
https://doi.org/10.4225/08/543B4B4CA92E6
Dataset updated
Oct 13, 2014
Dataset authored and provided by
CSIRO, http://www.csiro.au/
License
https://research.csiro.au/dap/licences/csiro-data-licence/
Time period covered
Mar 14, 2008 - Jun 9, 2009
Dataset funded by
CSIRO, http://www.csiro.au/
Description
A CSV file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
GitTables 1M - CSV files
zenodo.org
zip
Updated Jun 6, 2022
Cite
Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973
Available download formats
zip
Unique identifier
https://doi.org/10.5281/zenodo.6515973
Dataset updated
Jun 6, 2022
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Madelon Hulsebos; Çağatay Demiralp; Paul Groth
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains >800K CSV files behind the GitTables 1M corpus.

For more information about the GitTables corpus, visit:

- our website for GitTables, or

- the main GitTables download page on Zenodo.
Adventure Works 2022 CSVs
kaggle.com
zip
Updated Nov 2, 2022
Cite
Algorismus (2022). Adventure Works 2022 CSVs [Dataset]. https://www.kaggle.com/datasets/algorismus/adventure-works-in-excel-tables
Available download formats
zip (567646 bytes)
Dataset updated
Nov 2, 2022
Authors
Algorismus
License
http://www.gnu.org/licenses/lgpl-3.0.html
Description
Adventure Works 2022 dataset

How was this dataset created?

On the official website, the dataset is available via SQL Server (localhost) and as CSVs to be used with Power BI Desktop running in the Virtual Lab (virtual machine). The first two steps of importing data were executed in the virtual lab, and the resulting Power BI tables were then copied into CSVs. Records were added up to the year 2022 as required.

How may this dataset help you?

This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop to carry out the lab instructions in the training material on the official website. It is also useful if you want to work on the Power BI Desktop Sales Analysis example from the Microsoft PL-300 learning material.

How to use this dataset?

Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created by the first two steps of importing data, as described in the PL-300 Microsoft Power BI Data Analyst exam lab.
emp-data-csv-File
kaggle.com
zip
Updated Aug 2, 2024
Cite
Dilip Srivastava (2024). emp-data-csv-File [Dataset]. https://www.kaggle.com/datasets/dilipkrsrivastava/emp-data
Available download formats
zip (6068 bytes)
Dataset updated
Aug 2, 2024
Authors
Dilip Srivastava
Description
Dataset

This dataset was created by Dilip Srivastava

Residential School Locations Dataset (CSV Format)
borealisdata.ca
search.dataone.org
Updated Jun 5, 2019
Cite
Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
Unique identifier
https://doi.org/10.5683/SP2/RIYEMU
Dataset updated
Jun 5, 2019
Dataset provided by
Borealis
Authors
Rosa Orlandini
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1863 - Jun 30, 1998
Area covered
Canada
Description
The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of residential schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement (IRSSA) are included in this dataset, as well as several industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement.
The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini.
Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset, the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School.
When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats
csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Julie R. Repository creator - Campos Arias
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as per the details listed on the official Amazon website. The data was scraped in January 2023 from the official Amazon website.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
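Because the file is ordered by label, a shuffle step is needed before any train/test split. The following is a minimal sketch using only the Python standard library. It assumes that data_rt.csv has a header row naming the two columns (`reviews` and `labels`); the exact header names and the helper name `load_shuffled` are assumptions for illustration, not part of the dataset description.

```python
import csv
import random


def load_shuffled(path, seed=0):
    """Read a review CSV with a header row and return its rows in random order."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # A fixed seed keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(rows)
    return rows
```

Calling `load_shuffled("data_rt.csv")` would return a list of dictionaries, one per review, with negative and positive samples interleaved.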
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
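The step from a polarity score to a categorical `division` label can be sketched as below. TextBlob polarity scores lie in the range -1.0 to 1.0; the label names, the `neutral_band` parameter and the function name are illustrative assumptions, not the thresholds actually used to build EcoPreprocessed.csv.

```python
def polarity_to_division(score, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    The label names and the neutral band are illustrative assumptions,
    not the rule used to build EcoPreprocessed.csv.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

Widening `neutral_band` (for example to 0.1) treats weakly polarised reviews as neutral, which is a common design choice when scores near zero are noisy.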
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine-learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review, raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Data repository sample names and codes (.csv file)
data.researchdatafinder.qut.edu.au
Updated Jun 20, 2024
Cite
(2024). Data repository sample names and codes (.csv file) [Dataset]. https://data.researchdatafinder.qut.edu.au/dataset/measuring-the-interactions4/resource/8d4f9a99-02cf-4c61-a9ca-29bb7b2f2e93
Dataset updated
Jun 20, 2024
License
http://researchdatafinder.qut.edu.au/display/n9373
Description
QUT Research Data Repository Dataset Resource available for download

Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

zenodo.org

application/gzip, bin +2

Updated Aug 2, 2024

Cite

Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788

Available download formats

bin, application/gzip, zip, text/x-python

Unique identifier

https://doi.org/10.5281/zenodo.1419788

Dataset updated

Aug 2, 2024

Dataset provided by

Zenodo, http://zenodo.org/

Authors

Marat Valiev; Bogdan Vasilescu; James Herbsleb

License

https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

Description

Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. 
Link to the code will be included in the Camera Ready version as well.


Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
 described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
 This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
 statistics. It is ~34 GB unpacked. This dataset still doesn't include PyPI packages
 themselves, which take around 2 TB.
- **build_model.r, helpers.r** - R files to process the survival data 
  (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
  `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
  **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)

Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on the level of detail (see Step 2 for more details):
- up to 2 TB of disk space (see the detail levels in Step 2)
- at least 16 GB of RAM (64 GB preferable)
- a few hours to a few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from GitLab:

   git clone https://gitlab.com/user2589/ghd.git
   cd ghd
   git checkout 0.1.0
 
 `cd` into the extracted folder. 
 All commands below assume it as a current directory.
  
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
- install docker. For Ubuntu Linux, the command is 
  `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
 Without this dependency, you might get an error on the next step, 
 but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt` . 
- disable all APIs except GitHub (Bitbucket and GitLab support were
 not yet implemented when this study was in progress): edit
 `scraper/__init__.py`, comment out everything except GitHub support
 in `PROVIDERS`.
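
The two `settings.py` edits from the list above might look roughly like the sketch below. The folder location is hypothetical, and treating `SCRAPER_GITHUB_API_TOKENS` as a list of token strings is an assumption; the settings template shipped with the code archive is the authority on the expected format.

```python
# settings.py (sketch): only the two values named in the list above.
import os

# Hypothetical location; any newly created, writable folder should work.
DATASET_PATH = os.path.join(os.path.expanduser("~"), "ghd_dataset")

# Assumed to be a list of token strings; replace the placeholder with a real token.
SCRAPER_GITHUB_API_TOKENS = [
    "<your-github-api-token>",
]
```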

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function 
`common.utils.survival_data()` and save it into a CSV file:

  # copy and paste into a Python console
  from common import utils
  survival_data = utils.survival_data('pypi', '2008', smoothing=6)
  survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up
the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step
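
For a quick look at the precomputed table without unpacking the archive by hand, a sketch along the following lines may help. It uses only the Python standard library and assumes the CSV has a header row; the directory layout inside the archive is not described here, so the helper (a hypothetical name, `read_survival_rows`) matches on the file name rather than a full path.

```python
import csv
import io
import zipfile


def read_survival_rows(zip_path, member_suffix="survival_data.csv"):
    """Return the rows of the first archive member whose name ends with member_suffix."""
    with zipfile.ZipFile(zip_path) as archive:
        # Match on the file name because the folder layout inside the zip is unknown.
        matches = [n for n in archive.namelist() if n.endswith(member_suffix)]
        if not matches:
            raise FileNotFoundError(f"no member ending with {member_suffix!r} in {zip_path}")
        with archive.open(matches[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

For example, `read_survival_rows("dataset_minimal_Jan_2018.zip")` would return one dictionary per row, keyed by the column names found in the header.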

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table.
The whole process will take 15 to 30 minutes.

- create a folder `

1000 Empirical Time series
figshare.com
bridges.monash.edu
png
Updated May 30, 2023
Cite
Ben Fulcher (2023). 1000 Empirical Time series [Dataset]. http://doi.org/10.6084/m9.figshare.5436136.v10
Available download formats
png
Unique identifier
https://doi.org/10.6084/m9.figshare.5436136.v10
Dataset updated
May 30, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Ben Fulcher
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.
The results of the computation are in the hctsa file, HCTSA_Empirical1000.mat, for use in Matlab using v1.06 of hctsa.
The same data is also provided in .csv format: hctsa_datamatrix.csv (results of feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv), and the data of individual time series (each line a time series, for time series described in hctsa_timeseries-info.csv) in hctsa_timeseries-data.csv. These .csv files were produced by running >> OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa.
The input file, INP_Empirical1000.mat, is for use with hctsa, and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as >> TS_Init('INP_Empirical1000.mat');
Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user using TS_PlotTimeSeries from the hctsa package.
See links in references for more comprehensive documentation for performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.
Data from: Sample CSV file
mygeodata.cloud
Updated Jul 9, 2025
Cite
(2025). Sample CSV file [Dataset]. https://mygeodata.cloud/converter/asc-to-csv
Dataset updated
Jul 9, 2025
Description
Sample data in CSV - Comma Separated Values format available for download for testing purposes.
train csv file
kaggle.com
zip
Updated May 5, 2018
Cite
Emmanuel Arias (2018). train csv file [Dataset]. https://www.kaggle.com/datasets/eamanu/train
Available download formats
zip (33695 bytes)
Dataset updated
May 5, 2018
Authors
Emmanuel Arias
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset

This dataset was created by Emmanuel Arias

Released under Database: Open Database, Contents: Database Contents

Human Resources.csv
figshare.com
csv
Updated Apr 11, 2025
Cite
anurag pardiash (2025). Human Resources.csv [Dataset]. http://doi.org/10.6084/m9.figshare.28780886.v1
Available download formats
csv
Unique identifier
https://doi.org/10.6084/m9.figshare.28780886.v1
Dataset updated
Apr 11, 2025
Dataset provided by
Figshare, http://figshare.com/
Authors
anurag pardiash
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset, titled Human Resources.csv, contains anonymized employee data collected for internal HR analysis and research purposes. It includes fields such as employee ID, department, gender, age, job role, and employment status. The data can be used for workforce trend analysis, HR benchmarking, diversity studies, and training models in human resource analytics.
The file is provided in CSV format (3.05 MB) and adheres to general data privacy standards, with no personally identifiable information (PII).
Last updated: April 11, 2025. Uploaded by Anurag Pardiash.
Walmart data in CSV format
dataandsons.com
csv, zip
Updated Aug 15, 2022
Cite
crawl feeds (2022). Walmart data in CSV format [Dataset]. https://www.dataandsons.com/categories/product-lists/walmart-data-in-csv-format
Available download formats
csv, zip
Dataset updated
Aug 15, 2022
Dataset provided by
Data & Sons
Authors
crawl feeds
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
Aug 15, 2022
Description
About this Dataset

Walmart data in CSV format extracted by crawl feeds team using in-house tools. Last extracted on 15 Aug 2022.

Category

Product Lists

Keywords

Walmart dataset, retail datasets, ecommerce datasets

Row Count

10

Price

Free
Walmart Dataset
crawlfeeds.com
csv, zip
Updated Apr 26, 2025
Cite
Crawl Feeds (2025). Walmart Dataset [Dataset]. https://crawlfeeds.com/datasets/walmart-dataset
Available download formats
csv, zip
Dataset updated
Apr 26, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Walmart products sample dataset with 1000+ records in CSV format. A monthly Walmart dataset with around 100K+ records is available for download.

MS Teams Attendance Sheet
kaggle.com
zip
Updated Mar 16, 2022
Cite
Mohammad Al-Azawi (2022). MS Teams Attendance Sheet [Dataset]. https://www.kaggle.com/datasets/mohammadalazawi/ms-teams-attendance-sheet
Available download formats
zip (1420 bytes)
Dataset updated
Mar 16, 2022
Authors
Mohammad Al-Azawi
Description
This is a sample of the CSV files that can be downloaded from Microsoft Teams after meetings. Because MS Teams has recently been used to deliver classes in schools and universities, tracking student attendance became important. This dataset can be used when writing code that analyses student attendance.
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
US Real Estate
zenrows.com
csv
Updated Jun 27, 2021
Cite
ZenRows (2021). US Real Estate [Dataset]. https://www.zenrows.com/datasets/us-real-estate
Available download formats
csv (5.8 MB)
Dataset updated
Jun 27, 2021
Dataset provided by
ZenRows S.L.
Authors
ZenRows
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
United States
Description
High-quality, free real estate dataset from all around the United States, in CSV format. Over 10,000 records relevant to real estate investors, agents, and data scientists. We are working on complete datasets from a wide variety of countries. Don't hesitate to contact us for more information.
Largest Flipkart Product Listings
crawlfeeds.com
csv, zip
Updated Mar 13, 2025
Cite
Crawl Feeds (2025). Largest Flipkart Product Listings [Dataset]. https://crawlfeeds.com/datasets/flipkart-products-dataset
Available download formats
csv, zip
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Looking to enhance your data-driven projects? Our large Flipkart e-commerce dataset is now available in CSV format, offering a wealth of information across multiple product categories. Whether you need the complete dataset or a specific subset based on categories, we've got you covered.

What’s Included in the Flipkart Dataset?

Our dataset is meticulously curated to provide high-quality, reliable data for your e-commerce and AI projects. It includes detailed product information spanning various categories, such as:

Automotive Accessories

Baby Care Products

Mobiles & Accessories

Men's Fashion Dataset from Flipkart

Home Improvement Items

Beauty and Grooming Products

Footwear

Jewellery

Toys and Games

Health Care Supplies

Kitchen, Cookware & Serveware

Computers and Accessories

Audio & Video Equipment
…and many more!

Sample CSV File for Preview

A sample CSV file with 200 records is available for download after a quick signup. Use this sample to evaluate the data structure, quality, and relevance to your project requirements.

Why Choose Our Flipkart Product Dataset?

Customizable Subsets: Request a subset of data tailored to specific categories that suit your project needs.

Versatile Applications: Perfect for building recommendation engines, price comparison tools, inventory management systems, and market trend analysis.

Ease of Access: The dataset is available in CSV format for seamless integration into your workflows.

Diverse Categories: Covering everything from fashion and home decor to electronics and festive decor, this dataset offers unmatched variety.

How to Get the Flipkart Dataset?

Visit Crawl Feeds Data Request to request access to the complete dataset or a customized subset.
Event Logs CSV
figshare.com
rar
Updated Dec 9, 2019
Cite
Dina Bayomie (2019). Event Logs CSV [Dataset]. http://doi.org/10.6084/m9.figshare.11342063.v1
Available download formats
rar
Unique identifier
https://doi.org/10.6084/m9.figshare.11342063.v1
Dataset updated
Dec 9, 2019
Dataset provided by
Figshare, http://figshare.com/
Authors
Dina Bayomie
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The event logs in CSV format. The dataset contains both correlated and uncorrelated logs.
Cleaned Contoso Dataset
kaggle.com
zip
Updated Aug 27, 2023
Cite
Bhanu (2023). Cleaned Contoso Dataset [Dataset]. https://www.kaggle.com/datasets/bhanuthakurr/cleaned-contoso-dataset
Available download formats
zip (487695063 bytes)
Dataset updated
Aug 27, 2023
Authors
Bhanu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Data was imported from the BAK file found here into SQL Server, and then individual tables were exported as CSV. Jupyter Notebook containing the code used to clean the data can be found here

Version 6 includes some additional cleaning and structuring, addressing issues noticed after importing the data into Power BI. Changes were made by adding code to the Python notebook that exports the new cleaned dataset, such as adding MonthNumber for sorting by month number and, similarly, WeekDayNumber.

Cleaning was done in Python, with SQL Server used to find things quickly. Headers were added separately, ensuring no data loss. Data was cleaned for NaN, garbage values and other columns.
