100+ datasets found
  1. GitTables 1M - CSV files

    • zenodo.org
    zip
    Updated Jun 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains >800K CSV files behind the GitTables 1M corpus.

    For more information about the GitTables corpus, visit:

    - our website for GitTables, or

    - the main GitTables download page on Zenodo.

  2. CSV file used in statistical analyses

    • data.csiro.au
    • researchdata.edu.au
    • +1more
    Updated Oct 13, 2014
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
    Explore at:
    Dataset updated
    Oct 13, 2014
    Dataset authored and provided by
    CSIROhttp://www.csiro.au/
    License

    https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Mar 14, 2008 - Jun 9, 2009
    Dataset funded by
    CSIROhttp://www.csiro.au/
    Description

    A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

  3. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  4. c

    Walmart Dataset

    • crawlfeeds.com
    csv, zip
    Updated Apr 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Walmart Dataset [Dataset]. https://crawlfeeds.com/datasets/walmart-dataset
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Apr 26, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    Walmart products sample dataset having 1000+ records in CSV format. Download monthly dataset for walmart data and it having around 100K+ records.

    Get 50% discount for all datasets. Link

  5. 1000 Empirical Time series

    • figshare.com
    • researchdata.edu.au
    png
    Updated May 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ben Fulcher (2023). 1000 Empirical Time series [Dataset]. http://doi.org/10.6084/m9.figshare.5436136.v10
    Explore at:
    pngAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Ben Fulcher
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.The results of the computation are in the hctsa file, HCTSA_Empirical1000.mat for use in Matlab using v1.06 of hctsa.The same data is also provided in .csv format for the hctsa_datamatrix.csv (results of feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv), and the data of individual time series (each line a time series, for time series described in hctsa_timeseries-info.csv) is in hctsa_timeseries-data.csv. These .csv files were produced by running >>OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa.The input file, INP_Empirical1000.mat, is for use with hctsa, and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as>> TS_Init('INP_Empirical1000.mat');Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user using TS_PlotTimeSeries from the hctsa package.See links in references for more comprehensive documentation for performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.

  6. B

    Residential School Locations Dataset (CSV Format)

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRRSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconcilation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative, and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its’ original location to another property, then the school is considered to have two unique locations in this dataset,the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School.When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.

  7. Chinook CSV Dataset

    • kaggle.com
    Updated Nov 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anurag Verma (2023). Chinook CSV Dataset [Dataset]. https://www.kaggle.com/datasets/anurag629/chinook-csv-dataset/data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Anurag Verma
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset is an export of the tables from the Chinook sample database into CSV files. The Chinook database contains information about a fictional digital media store, including tables for artists, albums, media tracks, invoices, customers, and more.

    The CSV file for each table contains the columns and all rows of data. The column headers match the table schema. Refer to the Chinook schema documentation for more details on each table and column.

    The files are encoded as UTF-8. The delimiter is a comma. Strings are quoted. Null values are represented by empty strings.

    Files

    1. albums.csv
    2. artists.csv
    3. customers.csv
    4. employees.csv
    5. genres.csv
    6. invoice_items.csv
    7. invoices.csv
    8. media_types.csv
    9. playlist_track.csv
    10. playlists.csv
    11. tracks.csv

    Usage

    This dataset can be used to analyze the Chinook store data. For example, you could build models on customer purchases, track listening patterns, identify trends in genres or artists,etc.

    The data is ideal for practicing Pandas, Numpy, PySpark, etc libraries. The database schema provides a realistic set of tables and relationships.

  8. m

    Network traffic for machine learning classification

    • data.mendeley.com
    Updated Feb 12, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Víctor Labayen Guembe (2020). Network traffic for machine learning classification [Dataset]. http://doi.org/10.17632/5pmnkshffm.1
    Explore at:
    Dataset updated
    Feb 12, 2020
    Authors
    Víctor Labayen Guembe
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified in 5 different activities (Video, Bulk, Idle, Web, and Interactive) and the label is shown in the filename. There is also a file (mapping.csv) with the mapping of the host's IP address, the csv/pcap filename and the activity label.

    Activities:

    Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in google docs and remote CLI's sessions by SSH. Bulk data transfer: applications that perform a transfer of large data volume files over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox or the university repository among others. Web browsing: contains all the generated traffic while searching and consuming different web pages. Examples of those pages are several blogs and new sites and the moodle of the university. Vídeo playback: contains traffic from applications that consume video in streaming or pseudo-streaming. The most known server used are Twitch and Youtube but the university online classroom has also been used. Idle behaviour: is composed by the background traffic generated by the user computer when the user is idle. This traffic has been captured with every application closed and with some opened pages like google docs, YouTube and several web pages, but always without user interaction.

    The capture is performed in a network probe, attached to the router that forwards the user network traffic, using a SPAN port. The traffic is stored in pcap format with all the packet payload. In the csv file, every non TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): Timestamp, protocol, payload size, IP address source and destination, UDP/TCP port source and destination. The fields are also included as a header in every csv file.

    The amount of data is stated as follows:

    Bulk : 19 traces, 3599 s of total duration, 8704 MBytes of pcap files Video : 23 traces, 4496 s, 1405 MBytes Web : 23 traces, 4203 s, 148 MBytes Interactive : 42 traces, 8934 s, 30.5 MBytes Idle : 52 traces, 6341 s, 0.69 MBytes

  9. f

    Event Logs CSV

    • figshare.com
    rar
    Updated Dec 9, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dina Bayomie (2019). Event Logs CSV [Dataset]. http://doi.org/10.6084/m9.figshare.11342063.v1
    Explore at:
    rarAvailable download formats
    Dataset updated
    Dec 9, 2019
    Dataset provided by
    figshare
    Authors
    Dina Bayomie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The event logs in CSV format. The dataset contains both correlated and uncorrelated logs

  10. d

    Dataset metadata of known Dataverse installations

    • search.dataone.org
    • dataverse.harvard.edu
    • +1more
    Updated Nov 22, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gautier, Julian (2023). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/DVN/DCDKZQ
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Gautier, Julian
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data. How the metadata was downloaded The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens. How the files are organized ├── csv_files_with_metadata_from_most_known_dataverse_installations │ ├── author(citation).csv │ ├── basic.csv │ ├── contributor(citation).csv │ ├── ... │ └── topic_classification(citation).csv ├── dataverse_json_metadata_from_each_known_dataverse_installation │ ├── Abacus_2022.10.02_17.11.19.zip │ ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv │ ├── Dataverse_JSON_metadata_2022.10.02_17.11.19 │ ├── hdl_11272.1_AB2_0AQZNT_v1.0.json │ ├── ... │ ├── metadatablocks_v5.6 │ ├── astrophysics_v5.6.json │ ├── biomedical_v5.6.json │ ├── citation_v5.6.json │ ├── ... │ ├── socialscience_v5.6.json │ ├── ACSS_Dataverse_2022.10.02_17.26.19.zip │ ├── ADA_Dataverse_2022.10.02_17.26.57.zip │ ├── Arca_Dados_2022.10.02_17.44.35.zip │ ├── ... │ └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip └── dataset_pids_from_most_known_dataverse_installations.csv └── licenses_used_by_dataverse_installations.csv └── metadatablocks_from_most_known_dataverse_installations.csv This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories: The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files. The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files. The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ... Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.

  11. m

    Download CSV DB

    • maclookup.app
    json
    Updated Jul 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Download CSV DB [Dataset]. https://maclookup.app/downloads/csv-database
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Jul 10, 2025
    Description

    Free, daily updated MAC prefix and vendor CSV database. Download now for accurate device identification.

  12. c

    Amazon India products dataset in CSV format

    • crawlfeeds.com
    csv, zip
    Updated Mar 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Amazon India products dataset in CSV format [Dataset]. https://crawlfeeds.com/datasets/amazon-india-products-dataset-in-csv-format
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Area covered
    India
    Description

    Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.

    Dataset Features:

    • Product Details: Name, Brand, Category, and Unique ID

    • Pricing Information: Current Price, Discounted Price, and Currency

    • Availability & Ratings: Stock Status, Customer Ratings, and Reviews

    • Seller Information: Seller Name and Fulfillment Details

    • Additional Attributes: Product Description, Specifications, and Images

    Dataset Specifications:

    • Format: CSV

    • Number of Records: 50,000+

    • Delivery Time: 3 Days

    • Price: $149.00

    • Availability: Immediate

    This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.

  13. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Explore at:
    csvAvailable download formats
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset has been collected in the frame of the Prac1 of the subject Tipology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the larges list on the site).

    Original code used to retrieve the dataset can be found on github repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets, the first 30000 books and then the remainig 22478. Dates were not parsed and reformated on the second chunk so publishDate and firstPublishDate are representet in a mm/dd/yyyy format for the first 30000 records and Month Day Year for the rest.

    Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness |
    | ------------- | ------------- | ------------- | 
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (ex. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Editorial | 93 |
    | publishDate | publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 starts (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |

  14. e

    ATOM Download Service for the RÚIAN data of feature hierarchy by the area of...

    • data.europa.eu
    wfs
    Updated Aug 29, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2020). ATOM Download Service for the RÚIAN data of feature hierarchy by the area of the country - CSV format [Dataset]. https://data.europa.eu/data/datasets/cz-00025712-cuzk_atom-md_ruian-csv-hie-st
    Explore at:
    wfsAvailable download formats
    Dataset updated
    Aug 29, 2020
    Description

    Download Service provides pre-defined data on relationship between selected territorial elements and units of territorial registration using the ATOM technology. The service is publicly available and free-of-charge (data covers the whole territory of the Czech Republic) and enables downloading of predefined data file containing data for the whole Czech Republic. Files are created during the first day of each month with data valid to the last day of previous month. The whole dataset (7 files) is compressed (ZIP) for downloading.

  15. Twitter Tweets Sentiment Dataset

    • kaggle.com
    • opendatabay.com
    Updated Apr 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 8, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    https://raw.githubusercontent.com/Masterx-AI/Project_Twitter_Sentiment_Analysis_/main/twitt.jpg" alt="">

    Description:

    Twitter is an online Social Media Platform where people share their their though as tweets. It is observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem and we shall help it by creating a strong NLP based-classifier model to distinguish the negative tweets & block such tweets. Can you build a strong classifier model to predict the same?

    Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

    Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

    You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

    Columns:

    1. textID - unique ID for each piece of text
    2. text - the text of the tweet
    3. sentiment - the general sentiment of the tweet

    Acknowledgement:

    The dataset is download from Kaggle Competetions:
    https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict the twitter sentiments.
    • Compare the evaluation metrics of vaious classification algorithms.
  16. Z

    Coughs: ESC-50 and FSDKaggle2018

    • data.niaid.nih.gov
    Updated Jul 27, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alper Bozkurt (2021). Coughs: ESC-50 and FSDKaggle2018 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5136591
    Explore at:
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Edgar Lobaton
    Michelle Hernandez
    Jinyi Qiu
    Alper Bozkurt
    Mahmoud Abdelkhalek
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset consists of timestamps for coughs contained in files extracted from the ESC-50 and FSDKaggle2018 datasets.

    Citation

    This dataset was generated and used in our paper:

    Mahmoud Abdelkhalek, Jinyi Qiu, Michelle Hernandez, Alper Bozkurt, Edgar Lobaton, “Investigating the Relationship between Cough Detection and Sampling Frequency for Wearable Devices,” in the 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2021.

    Please cite this paper if you use the timestamps.csv file in your work.

    Generation

    The cough timestamps given in the timestamps.csv file were generated using the cough templates given in figures 3 and 4 in the paper:

    A. H. Morice, G. A. Fontana, M. G. Belvisi, S. S. Birring, K. F. Chung, P. V. Dicpinigaitis, J. A. Kastelik, L. P. McGarvey, J. A. Smith, M. Tatar, J. Widdicombe, "ERS guidelines on the assessment of cough", European Respiratory Journal 2007 29: 1256-1276; DOI: 10.1183/09031936.00101006

    More precisely, 40 files labelled as "coughing" in the ESC-50 dataset and 273 files labelled as "Cough" in the FSDKaggle2018 dataset were manually searched using Audacity for segments of audio that closely matched the aforementioned templates, both visually and auditorily. Some files did not contain any coughs at all, while other files contained several coughs. Therefore, only the files that contained at least one cough are included in the coughs directory. In total, the timestamps of 768 cough segments with lengths ranging from 0.2 seconds to 0.9 seconds were extracted.

    Description

    The audio files are presented in wav format in the coughs directory. Files named in the general format of "*-*-*-24.wav" were extracted from the ESC-50 dataset, while all other files were extracted from the FSDKaggle2018 dataset.

    The timestamps.csv file contains the timestamps for the coughs and it consists of four columns:

    file_name,cough_number,start_time,end_time

    Files in the file_name column can be found in the coughs directory. cough_number refers to the index of the cough in the corresponding file. For example, if the file X.wav contains 5 coughs, then X.wav will be repeated 5 times under the file_name column, and for each row, the cough_number will range from 1 to 5. start_time refers to the starting time of a cough segment measured in seconds, while end_time refers to the end time of a cough segment measured in seconds.

    Licensing

    The ESC-50 dataset as a whole is licensed under the Creative Commons Attribution-NonCommercial license. Individual files in the ESC-50 dataset are licensed under different Creative Commons licenses. For a list of these licenses, see LICENSE. The ESC-50 files in the cough directory are given for convenience only, and have not been modified from their original versions. To download the original files, see the ESC-50 dataset.

    The FSDKaggle2018 dataset as a whole is licensed under the Creative Commons Attribution 4.0 International license. Individual files in the FSDKaggle2018 dataset are licensed under different Creative Commons licenses. For a list of these licenses, see the License section in FSDKaggle2018. The FSDKaggle2018 files in the cough directory are given for convenience only, and have not been modified from their original versions. To download the original files, see the FSDKaggle2018 dataset.

    The timestamps.csv file is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license.

  17. US Real Estate

    • zenrows.com
    csv
    Updated Jun 27, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ZenRows (2021). US Real Estate [Dataset]. https://www.zenrows.com/datasets/us-real-estate
    Explore at:
    csv(5,8MB)Available download formats
    Dataset updated
    Jun 27, 2021
    Dataset provided by
    ZenRows S.L.
    Authors
    ZenRows
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    High-quality, free real estate dataset from all around the United States, in CSV format. Over 10.000 records relevant to Real Estate investors, agents, and data scientists. We are working on complete datasets from a wide variety of countries. Don't hesitate to contact us for more information.

  18. d

    Comma separated value (CSV) text files of navigation and elevation data...

    • catalog.data.gov
    Updated Jul 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). Comma separated value (CSV) text files of navigation and elevation data collected by the U.S. Geological Survey during field activity 2016-030-FA offshore Sandwich Beach, MA in June 2016 [Dataset]. https://catalog.data.gov/dataset/comma-separated-value-csv-text-files-of-navigation-and-elevation-data-collected-by-the-u-s
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Description

    The objectives of the survey were to provide bathymetric and sidescan sonar data for sediment transport studies and coastal change model development for ongoing studies of nearshore coastal dynamics along Sandwich Town Neck Beach, MA. Data collection equipment used for this investigation are mounted on an unmanned surface vehicle (USV) uniquely adapted from a commercially sold gas-powered kayak and termed the "jetyak". The jetyak design is the result of a collaborative effort between USGS and Woods Hole Oceanographic Institution (WHOI) scientists.

  19. LLM: 7 prompt training dataset

    • kaggle.com
    Updated Nov 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carl McBride Ellis (2023). LLM: 7 prompt training dataset [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/llm-7-prompt-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Carl McBride Ellis
    License

    https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/

    Description
    • Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
      File: train_essays_RDizzl3_seven_v2.csv
      Human texts: 14247 LLM texts: 3004

      See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts



    • Version 3: "**The RDizzl3 Seven**"
      File: train_essays_RDizzl3_seven_v1.csv

    • "Car-free cities"

    • "Does the electoral college work?"

    • "Exploring Venus"

    • "The Face on Mars"

    • "Facial action coding system"

    • "A Cowboy Who Rode the Waves"

    • "Driverless cars"

    How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

    • Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

    Namely:

    • "Car-free cities"
    • "Does the electoral college work?"
    • "Exploring Venus"
    • "The Face on Mars"
    • "Facial action coding system"
    • "Seeking multiple opinions"
    • "Phones and driving"

    This dataset is a derivative of the datasets

    as well as the original competition training dataset

    • Version 1:This dataset is composed of 13,712 human texts and 1165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
  20. i

    Sample Dataset for Testing

    • ieee-dataport.org
    Updated Apr 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alex Outman (2025). Sample Dataset for Testing [Dataset]. https://ieee-dataport.org/documents/sample-dataset-testing
    Explore at:
    Dataset updated
    Apr 28, 2025
    Authors
    Alex Outman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    10

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973
Organization logo

GitTables 1M - CSV files

Explore at:
zipAvailable download formats
Dataset updated
Jun 6, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth
License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

This dataset contains >800K CSV files behind the GitTables 1M corpus.

For more information about the GitTables corpus, visit:

- our website for GitTables, or

- the main GitTables download page on Zenodo.

Search
Clear search
Close search
Google apps
Main menu