100+ datasets found
  1. Sample csv data

    • kaggle.com
    Updated Oct 12, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Md Faisal Ahmed Arman (2021). Sample csv data [Dataset]. https://www.kaggle.com/datasets/mdfaisalahmedarman/sample-csv-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 12, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Md Faisal Ahmed Arman
    Description

    Dataset

    This dataset was created by Md Faisal Ahmed Arman

    Contents

  2. Sample Graph Datasets in CSV Format

    • zenodo.org
    csv
    Updated Dec 9, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Edwin Carreño; Edwin Carreño (2024). Sample Graph Datasets in CSV Format [Dataset]. http://doi.org/10.5281/zenodo.14335015
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Edwin Carreño; Edwin Carreño
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample Graph Datasets in CSV Format

    Note: none of the data sets published here contain actual data, they are for testing purposes only.

    Description

    This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:

    • dataset_30_nodes_interactions.csv:contains 30 rows (nodes).
    • dataset_30_edges_interactions.csv: contains 47 rows (edges).
    • the common identifier dataset_30 refers to the same graph.

    CSV nodes

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    UniProt IDstringprotein identification
    labelstringprotein label (type of node)
    propertiesstringa dictionary containing properties related to the protein.

    CSV edges

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    Relationship IDstringrelationship identification
    Source IDstringidentification of the source protein in the relationship
    Target IDstringidentification of the target protein in the relationship
    labelstringrelationship label (type of relationship)
    propertiesstringa dictionary containing properties related to the relationship.

    Metadata

    GraphNumber of NodesNumber of EdgesSparse graph

    dataset_30*

    30

    47

    Y

    dataset_60*

    60

    181

    Y

    dataset_120*

    120

    689

    Y

    dataset_240*

    240

    2819

    Y

    dataset_300*

    300

    4658

    Y

    dataset_600*

    600

    18004

    Y

    dataset_1200*

    1200

    71785

    Y

    dataset_2400*

    2400

    288600

    Y

    dataset_3000*

    3000

    449727

    Y

    dataset_6000*

    6000

    1799413

    Y

    dataset_12000*

    12000

    7199863

    Y

    dataset_24000*

    24000

    28792361

    Y

    dataset_30000*

    30000

    44991744

    Y

    This repository include two (2) additional tiny graph datasets to experiment before dealing with larger datasets.

    CSV nodes (tiny graphs)

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    IDstringnode identification
    labelstringnode label (type of node)
    propertiesstringa dictionary containing properties related to the node.

    CSV edges (tiny graphs)

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    IDstringrelationship identification
    sourcestringidentification of the source node in the relationship
    targetstringidentification of the target node in the relationship
    labelstringrelationship label (type of relationship)
    propertiesstringa dictionary containing properties related to the relationship.

    Metadata (tiny graphs)

    GraphNumber of NodesNumber of EdgesSparse graph
    dataset_dummy*36N
    dataset_dummy2*36N
  3. CSV file used in statistical analyses

    • data.csiro.au
    • researchdata.edu.au
    • +1more
    Updated Oct 13, 2014
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
    Explore at:
    Dataset updated
    Oct 13, 2014
    Dataset authored and provided by
    CSIROhttp://www.csiro.au/
    License

    https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Mar 14, 2008 - Jun 9, 2009
    Dataset funded by
    CSIROhttp://www.csiro.au/
    Description

    A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

  4. GitTables 1M - CSV files

    • zenodo.org
    zip
    Updated Jun 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains >800K CSV files behind the GitTables 1M corpus.

    For more information about the GitTables corpus, visit:

    - our website for GitTables, or

    - the main GitTables download page on Zenodo.

  5. Training examples.csv

    • kaggle.com
    Updated Mar 6, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Владимир Терентьев (2021). Training examples.csv [Dataset]. https://www.kaggle.com/datasets/terentevvs/training-examplescsv
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 6, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Владимир Терентьев
    Description

    Dataset

    This dataset was created by Владимир Терентьев

    Contents

  6. Sample CSV files

    • kaggle.com
    Updated Mar 9, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Naman Kumar (2022). Sample CSV files [Dataset]. https://www.kaggle.com/matcauthon49/sample-csv-files/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 9, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Naman Kumar
    Description

    Dataset

    This dataset was created by Naman Kumar

    Contents

  7. 1000 Empirical Time series

    • figshare.com
    • researchdata.edu.au
    png
    Updated May 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ben Fulcher (2023). 1000 Empirical Time series [Dataset]. http://doi.org/10.6084/m9.figshare.5436136.v10
    Explore at:
    pngAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Ben Fulcher
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.The results of the computation are in the hctsa file, HCTSA_Empirical1000.mat for use in Matlab using v1.06 of hctsa.The same data is also provided in .csv format for the hctsa_datamatrix.csv (results of feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv), and the data of individual time series (each line a time series, for time series described in hctsa_timeseries-info.csv) is in hctsa_timeseries-data.csv. These .csv files were produced by running >>OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa.The input file, INP_Empirical1000.mat, is for use with hctsa, and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as>> TS_Init('INP_Empirical1000.mat');Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user using TS_PlotTimeSeries from the hctsa package.See links in references for more comprehensive documentation for performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.

  8. h

    doc-formats-csv-3

    • huggingface.co
    Updated Nov 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasets examples (2023). doc-formats-csv-3 [Dataset]. https://huggingface.co/datasets/datasets-examples/doc-formats-csv-3
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 23, 2023
    Dataset authored and provided by
    Datasets examples
    Description

    [doc] formats - csv - 3

    This dataset contains one csv file at the root:

    data.csv

    ignored comment

    col1|col2 dog|woof cat|meow pokemon|pika human|hello

    We define the config name in the YAML config, as well as the exact location of the file, the separator as "|", the name of the columns, and the number of rows to ignore (the row #1 is a row of column headers, that will be replaced by the names option, and the row #0 is ignored). The reference for the options is the documentation… See the full description on the dataset page: https://huggingface.co/datasets/datasets-examples/doc-formats-csv-3.

  9. UCI and OpenML Data Sets for Ordinal Quantification

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jul 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  10. f

    Event Logs CSV

    • figshare.com
    rar
    Updated Dec 9, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dina Bayomie (2019). Event Logs CSV [Dataset]. http://doi.org/10.6084/m9.figshare.11342063.v1
    Explore at:
    rarAvailable download formats
    Dataset updated
    Dec 9, 2019
    Dataset provided by
    figshare
    Authors
    Dina Bayomie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The event logs in CSV format. The dataset contains both correlated and uncorrelated logs

  11. B

    Residential School Locations Dataset (CSV Format)

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRRSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconcilation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative, and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its’ original location to another property, then the school is considered to have two unique locations in this dataset,the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School.When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.

  12. h

    doc-formats-csv-1

    • huggingface.co
    Updated Nov 23, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasets examples (2023). doc-formats-csv-1 [Dataset]. https://huggingface.co/datasets/datasets-examples/doc-formats-csv-1
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 23, 2023
    Dataset authored and provided by
    Datasets examples
    Description

    [doc] formats - csv - 1

    This dataset contains one csv file at the root:

    data.csv

    kind,sound dog,woof cat,meow pokemon,pika human,hello

    The YAML section of the README does not contain anything related to loading the data (only the size category metadata):

    size_categories:

    - n<1K

  13. w

    Randomized Hourly Load Data for use with Taxonomy Distribution Feeders

    • data.wu.ac.at
    application/unknown
    Updated Aug 29, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Energy (2017). Randomized Hourly Load Data for use with Taxonomy Distribution Feeders [Dataset]. https://data.wu.ac.at/schema/data_gov/NWYwYmFmYTItOWRkMC00OWM0LTk3OGYtZDcyYzZiOWY5N2Ez
    Explore at:
    application/unknownAvailable download formats
    Dataset updated
    Aug 29, 2017
    Dataset provided by
    Department of Energy
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset was developed by NREL's distributed energy systems integration group as part of a study on high penetrations of distributed solar PV [1]. It consists of hourly load data in CSV format for use with the PNNL taxonomy of distribution feeders [2]. These feeders were developed in the open source GridLAB-D modelling language [3]. In this dataset each of the load points in the taxonomy feeders is populated with hourly averaged load data from a utility in the feeder’s geographical region, scaled and randomized to emulate real load profiles. For more information on the scaling and randomization process, see [1].

    The taxonomy feeders are statistically representative of the various types of distribution feeders found in five geographical regions of the U.S. Efforts are underway (possibly complete) to translate these feeders into the OpenDSS modelling language.

    This data set consists of one large CSV file for each feeder. Within each CSV, each column represents one load bus on the feeder. The header row lists the name of the load bus. The subsequent 8760 rows represent the loads for each hour of the year. The loads were scaled and randomized using a Python script, so each load series represents only one of many possible randomizations. In the header row, "rl" = residential load and "cl" = commercial load. Commercial loads are followed by a phase letter (A, B, or C). For regions 1-3, the data is from 2009. For regions 4-5, the data is from 2000.

    For use in GridLAB-D, each column will need to be separated into its own CSV file without a header. The load value goes in the second column, and corresponding datetime values go in the first column, as shown in the sample file, sample_individual_load_file.csv. Only the first value in the time column needs to written as an absolute time; subsequent times may be written in relative format (i.e. "+1h", as in the sample). The load should be written in P+Qj format, as seen in the sample CSV, in units of Watts (W) and Volt-amps reactive (VAr). This dataset was derived from metered load data and hence includes only real power; reactive power can be generated by assuming an appropriate power factor. These loads were used with GridLAB-D version 2.2.

    Browse files in this dataset, accessible as individual files and as a single ZIP file. This dataset is approximately 242MB compressed or 475MB uncompressed.

    For questions about this dataset, contact andy.hoke@nrel.gov.

    If you find this dataset useful, please mention NREL and cite [1] in your work.

    References:

    [1] A. Hoke, R. Butler, J. Hambrick, and B. Kroposki, “Steady-State Analysis of Maximum Photovoltaic Penetration Levels on Typical Distribution Feeders,” IEEE Transactions on Sustainable Energy, April 2013, available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6357275 .

    [2] K. Schneider, D. P. Chassin, R. Pratt, D. Engel, and S. Thompson, “Modern Grid Initiative Distribution Taxonomy Final Report”, PNNL, Nov. 2008. Accessed April 27, 2012: http://www.gridlabd.org/models/feeders/taxonomy of prototypical feeders.pdf

    [3] K. Schneider, D. Chassin, Y. Pratt, and J. C. Fuller, “Distribution power flow for smart grid technologies”, IEEE/PES Power Systems Conference and Exposition, Seattle, WA, Mar. 2009, pp. 1-7, 15-18.

  14. Raw data bulkRNAseq.csv

    • figshare.com
    txt
    Updated Mar 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Florian Grünschläger; Dominik Vonficht (2022). Raw data bulkRNAseq.csv [Dataset]. http://doi.org/10.6084/m9.figshare.19425302.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Mar 28, 2022
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Florian Grünschläger; Dominik Vonficht
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bulk RNA-seq data (smartseq2; raw freature counts) of naive murine CD4+ T cells co-cultured with murine HSPCs (THSPC), or with murine DCs (TDC), or murine LSKs as control condition, in the presence or absence of antigen (ova,ctrl)

  15. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  16. m

    Network traffic and code for machine learning classification

    • data.mendeley.com
    Updated Feb 20, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Víctor Labayen (2020). Network traffic and code for machine learning classification [Dataset]. http://doi.org/10.17632/5pmnkshffm.2
    Explore at:
    Dataset updated
    Feb 20, 2020
    Authors
    Víctor Labayen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified in 5 different activities (Video, Bulk, Idle, Web, and Interactive) and the label is shown in the filename. There is also a file (mapping.csv) with the mapping of the host's IP address, the csv/pcap filename and the activity label.

    Activities:

    Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in google docs and remote CLI's sessions by SSH. Bulk data transfer: applications that perform a transfer of large data volume files over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox or the university repository among others. Web browsing: contains all the generated traffic while searching and consuming different web pages. Examples of those pages are several blogs and new sites and the moodle of the university. Vídeo playback: contains traffic from applications that consume video in streaming or pseudo-streaming. The most known server used are Twitch and Youtube but the university online classroom has also been used. Idle behaviour: is composed by the background traffic generated by the user computer when the user is idle. This traffic has been captured with every application closed and with some opened pages like google docs, YouTube and several web pages, but always without user interaction.

    The capture is performed in a network probe, attached to the router that forwards the user network traffic, using a SPAN port. The traffic is stored in pcap format with all the packet payload. In the csv file, every non TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): Timestamp, protocol, payload size, IP address source and destination, UDP/TCP port source and destination. The fields are also included as a header in every csv file.

    The amount of data is stated as follows:

    Bulk : 19 traces, 3599 s of total duration, 8704 MBytes of pcap files Video : 23 traces, 4496 s, 1405 MBytes Web : 23 traces, 4203 s, 148 MBytes Interactive : 42 traces, 8934 s, 30.5 MBytes Idle : 52 traces, 6341 s, 0.69 MBytes

    The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.

  17. Test Data Dummy CSV

    • figshare.com
    txt
    Updated Nov 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tori Duckworth (2023). Test Data Dummy CSV [Dataset]. http://doi.org/10.6084/m9.figshare.24500965.v2
    Explore at:
    txtAvailable download formats
    Dataset updated
    Nov 6, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Tori Duckworth
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This CSV represents a dummy dataset to test the functionality of trusted repository search capabilities and of research data governance practices. The associated dummy dissertation is entitled Financial Econometrics Dummy Dissertation. The dummy file is a 7KB CSV containing 5000 rows of notional demographic tabular data.

  18. d

    can-csv

    • data.dtu.dk
    zip
    Updated Dec 15, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Brooke Elizabeth Lampe (2023). can-csv [Dataset]. http://doi.org/10.11583/DTU.24805509.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Brooke Elizabeth Lampe
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    can-csvThis dataset contains controller area network (CAN) traffic for the 2017 Subaru Forester, the 2016 Chevrolet Silverado, the 2011 Chevrolet Traverse, and the 2011 Chevrolet Impala.For each vehicle, there are samples of attack-free traffic--that is, normal traffic--as well as samples of various types of attacks. The spoofing attacks, such as RPM spoofing, speed spoofing, etc., have an observable effect on the vehicle under test.This repository contains only .csv files. It is a subset of the can-dataset repository.

  19. Example Questionnaire Data

    • figshare.com
    txt
    Updated Sep 8, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Katharine Hubbard (2020). Example Questionnaire Data [Dataset]. http://doi.org/10.6084/m9.figshare.12928757.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Sep 8, 2020
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Katharine Hubbard
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    .csv file of climate change perception data. Note this data is made up as an example for survey analysis so should not be used for any other purposes. For R code for analysis see:For example write-up of this data see:https://figshare.com/account/projects/88601/articles/12928067

  20. Sample AIMO problems yearwise CSV

    • kaggle.com
    Updated May 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rithvik (2024). Sample AIMO problems yearwise CSV [Dataset]. https://www.kaggle.com/datasets/rithzdev/sample-aimo-problems-yearwise-csv/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 28, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Rithvik
    Description

    Dataset

    This dataset was created by Rithvik

    Contents

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Md Faisal Ahmed Arman (2021). Sample csv data [Dataset]. https://www.kaggle.com/datasets/mdfaisalahmedarman/sample-csv-data
Organization logo

Sample csv data

Explore at:
60 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 12, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Md Faisal Ahmed Arman
Description

Dataset

This dataset was created by Md Faisal Ahmed Arman

Contents

Search
Clear search
Close search
Google apps
Main menu