100+ datasets found
  1. Sample CSV Datasets

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SOURAV S V (2023). Sample CSV Datasets [Dataset]. https://www.kaggle.com/datasets/souravsv/sample-csv-datasets
    Explore at:
    zip(14455964 bytes)Available download formats
    Dataset updated
    Nov 30, 2023
    Authors
    SOURAV S V
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by SOURAV S V

    Released under CC0: Public Domain

    Contents

  2. CSV file used in statistical analyses

    • data.csiro.au
    • researchdata.edu.au
    • +1more
    Updated Oct 13, 2014
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
    Explore at:
    Dataset updated
    Oct 13, 2014
    Dataset authored and provided by
    CSIROhttp://www.csiro.au/
    License

    https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Mar 14, 2008 - Jun 9, 2009
    Dataset funded by
    CSIROhttp://www.csiro.au/
    Description

    A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

  3. Images in CSV datasets

    • kaggle.com
    zip
    Updated Oct 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pascal (2024). Images in CSV datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/images-in-csv-datasets
    Explore at:
    zip(347504240 bytes)Available download formats
    Dataset updated
    Oct 14, 2024
    Authors
    Pascal
    Description

    Images sous forme de fichiers CSV pour une application de méthodes de machine learning "classiques" Ces datasets sont utilisés pour le cours de Centrale Lille sur le Machine Learning de Pascal Yim

    "mnist_big.csv"

    Reconnaissance d'images de chiffres manuscrits

    Version "mnist_small.csv" avec moins de données pouvant servir aussi d'ensemble de test

    Source : https://www.kaggle.com/datasets/oddrationale/mnist-in-csv

    "sign_mnist_big.csv"

    Reconnaissance d'images de gestes de la langue des signes

    Version "sign_mnist_small.csv" avec moins de données pouvant servir aussi d'ensemble de test

    Source : https://www.kaggle.com/datasets/datamunge/sign-language-mnist

    "zalando_small.csv"

    Reconnaissance de vêtements et chaussures (Zalando)

    Source : https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "hmnist_8_8_RGB.csv"

    Reconnaissance de tumeurs de la peau (images en couleurs, trois valeurs R,G,B par pixel)

    Autres versions avec des images plus petites et/ou en niveaux de gris

    Source : https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "cifar10_small.csv"

    Reconnaissance de petites images en couleurs dans 10 catégories Version en CSV du dataset CIFAR10

    Source : https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv

  4. B

    Residential School Locations Dataset (CSV Format)

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRRSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconcilation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative, and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its’ original location to another property, then the school is considered to have two unique locations in this dataset,the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School.When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.

  5. h

    doc-formats-csv-1

    • huggingface.co
    Updated Nov 23, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasets examples (2023). doc-formats-csv-1 [Dataset]. https://huggingface.co/datasets/datasets-examples/doc-formats-csv-1
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 23, 2023
    Dataset authored and provided by
    Datasets examples
    Description

    [doc] formats - csv - 1

    This dataset contains one csv file at the root:

    data.csv

    kind,sound dog,woof cat,meow pokemon,pika human,hello

    The YAML section of the README does not contain anything related to loading the data (only the size category metadata):

    size_categories:

    - n<1K

  6. train csv file

    • kaggle.com
    zip
    Updated May 5, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Emmanuel Arias (2018). train csv file [Dataset]. https://www.kaggle.com/datasets/eamanu/train
    Explore at:
    zip(33695 bytes)Available download formats
    Dataset updated
    May 5, 2018
    Authors
    Emmanuel Arias
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Emmanuel Arias

    Released under Database: Open Database, Contents: Database Contents

    Contents

  7. SQUAD 2.0 - csv format

    • kaggle.com
    zip
    Updated Apr 15, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Parth Chokhra (2020). SQUAD 2.0 - csv format [Dataset]. https://www.kaggle.com/datasets/parthplc/squad-20-csv-file
    Explore at:
    zip(9887206 bytes)Available download formats
    Dataset updated
    Apr 15, 2020
    Authors
    Parth Chokhra
    Description

    Dataset

    This dataset was created by Parth Chokhra

    Contents

  8. c

    Amazon India products dataset in CSV format

    • crawlfeeds.com
    csv, zip
    Updated Mar 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Amazon India products dataset in CSV format [Dataset]. https://crawlfeeds.com/datasets/amazon-india-products-dataset-in-csv-format
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Area covered
    India
    Description

    Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.

    Dataset Features:

    • Product Details: Name, Brand, Category, and Unique ID

    • Pricing Information: Current Price, Discounted Price, and Currency

    • Availability & Ratings: Stock Status, Customer Ratings, and Reviews

    • Seller Information: Seller Name and Fulfillment Details

    • Additional Attributes: Product Description, Specifications, and Images

    Dataset Specifications:

    • Format: CSV

    • Number of Records: 50,000+

    • Delivery Time: 3 Days

    • Price: $149.00

    • Availability: Immediate

    This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.

  9. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  10. D

    Walmart data in CSV format

    • dataandsons.com
    csv, zip
    Updated Aug 15, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    crawl feeds (2022). Walmart data in CSV format [Dataset]. https://www.dataandsons.com/categories/product-lists/walmart-data-in-csv-format
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Aug 15, 2022
    Dataset provided by
    Data & Sons
    Authors
    crawl feeds
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Aug 15, 2022
    Description

    About this Dataset

    Walmart data in CSV format extracted by crawl feeds team using in-house tools. Last extracted on 15 Aug 2022.

    Category

    Product Lists

    Keywords

    Walmart dataset,retail datasets,ecommerce datasets

    Row Count

    10

    Price

    Free

  11. c

    Google Play Store Android Apps Dataset in CSV Format

    • crawlfeeds.com
    csv, zip
    Updated Nov 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2024). Google Play Store Android Apps Dataset in CSV Format [Dataset]. https://crawlfeeds.com/datasets/google-play-store-apps-dataset
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Nov 9, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    Unlock valuable insights with the Google Play Store Android Apps Dataset in CSV format, featuring detailed information on over thousands of Android apps available on the Google Play Store. This comprehensive dataset includes key attributes such as App Name, App Logo, Category, Description, Average Rating, Ratings Count, In-app Purchases, Operating System, Company, Content Rating, Images, Email, Additional Information, and more.

    Perfect for market researchers, data scientists, app developers, and analysts, this dataset allows for deep analysis of app performance, user preferences, and industry trends. With data on app descriptions, content ratings, in-app purchases, and company information, you can track trends in the mobile app market, evaluate user satisfaction, and conduct competitive analysis.

    The dataset is ideal for businesses looking to optimize app strategies, enhance user experience, and improve app performance based on real user feedback. Easily import the data into your favorite analysis tools to gain actionable insights for your app development or research.

    With regularly updated data scraped directly from the Google Play Store, the Google Play Store Android Apps Dataset is an invaluable resource for anyone looking to explore trends, track performance, or enhance their app strategies.

  12. Test Data Dummy CSV

    • figshare.com
    txt
    Updated Nov 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tori Duckworth (2023). Test Data Dummy CSV [Dataset]. http://doi.org/10.6084/m9.figshare.24500965.v2
    Explore at:
    txtAvailable download formats
    Dataset updated
    Nov 6, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    figshare
    Authors
    Tori Duckworth
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This CSV represents a dummy dataset to test the functionality of trusted repository search capabilities and of research data governance practices. The associated dummy dissertation is entitled Financial Econometrics Dummy Dissertation. The dummy file is a 7KB CSV containing 5000 rows of notional demographic tabular data.

  13. Waitrose Products Information Dataset in CSV Format - Comprehensive Product...

    • crawlfeeds.com
    csv, zip
    Updated Jun 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Waitrose Products Information Dataset in CSV Format - Comprehensive Product Data [Dataset]. https://crawlfeeds.com/datasets/waitrose-products-information-dataset-in-csv-format-comprehensive-product-data
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    The Waitrose Product Dataset offers a comprehensive and structured collection of grocery items listed on the Waitrose online platform. This dataset includes 25,000+ product records across multiple categories, curated specifically for use in retail analytics, pricing comparison, AI training, and eCommerce integration.

    Each record contains detailed attributes such as:

    • Product title, brand, MPN, and product ID

    • Price and currency

    • Availability status

    • Description, ingredients, and raw nutrition data

    • Review count and average rating

    • Breadcrumbs, image links, and more

    Delivered in CSV format (ZIP archive), this dataset is ideal for professionals in the FMCG, retail, and grocery tech industries who need structured, crawl-ready data for their projects.

  14. Users dataset in csv format.

    • kaggle.com
    zip
    Updated Apr 24, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    VictorDaniloCastanedaPinzon (2022). Users dataset in csv format. [Dataset]. https://www.kaggle.com/datasets/vicdancastpinz/users-dataset-in-csv-format
    Explore at:
    zip(5347 bytes)Available download formats
    Dataset updated
    Apr 24, 2022
    Authors
    VictorDaniloCastanedaPinzon
    Description

    Dataset

    This dataset was created by VictorDaniloCastanedaPinzon

    Contents

  15. yelp-csv

    • kaggle.com
    Updated Jan 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Flyer Steve (2023). yelp-csv [Dataset]. https://www.kaggle.com/datasets/flyersteve/yelp-csv
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 30, 2023
    Dataset provided by
    Kaggle
    Authors
    Flyer Steve
    Description

    Dataset

    This dataset was created by Flyer Steve

    Contents

  16. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    bin, application/gzip, zip, text/x-pythonAvailable download formats
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.htmlhttps://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)
    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on detalization level (see Step 2 for more details):
    - up to 2Tb of disk space (see Step 2 detalization levels)
    - at least 16Gb of RAM (64 preferable)
    - few hours to few month of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/init.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speedup
    the process:
    
    ####Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    ####Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15..30 minutes.
    
    - create a folder `
  17. POCI CSV dataset of all the citation data

    • figshare.com
    zip
    Updated Dec 27, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenCitations ​ (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    OpenCitations ​
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

    [field "oci"] the Open Citation Identifier (OCI) for the citation; [field "citing"] the PMID of the citing entity; [field "cited"] the PMID of the cited entity; [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity); [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity); [field "journal_sc"] it records whether the citation is a journal self-citations (i.e. the citing and the cited entities are published in the same journal); [field "author_sc"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

    This version of the dataset contains:

    717,654,703 citations; 26,024,862 bibliographic resources.

    The size of the zipped archive is 9.6 GB, while the size of the unzipped CSV file is 50 GB. Additional information about POCI at official webpage.

  18. c

    BevMo Alcoholic Beverage Records Extracted - Download Comprehensive CSV...

    • crawlfeeds.com
    csv, zip
    Updated Sep 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2024). BevMo Alcoholic Beverage Records Extracted - Download Comprehensive CSV Dataset Now [Dataset]. https://crawlfeeds.com/datasets/bevmo-alcoholic-beverage-records-extracted-download-comprehensive-csv-dataset-now
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Sep 7, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    We are excited to announce that we have successfully extracted a comprehensive set of alcoholic beverage records from BevMo and compiled them into a CSV file.

    This meticulously organized dataset includes key information such as product URLs, IDs, names, SKUs, GTIN14 barcodes, detailed product descriptions, availability status, pricing, currency, images, breadcrumbs, and more.

    Our dataset provides an invaluable resource for anyone looking to analyze or utilize detailed BevMo product information.

    Download the dataset today and gain access to a wealth of information from one of the leading beverage retailers.

    Perfect for market analysis, e-commerce insights, and competitive research.

  19. f

    Example of a csv file exported from the database.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 24, 2014
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Caselle, Jennifer E.; Iles, Alison; Tinker, Martin T.; Black, August; Novak, Mark; Carr, Mark H.; Malone, Dan; Beas-Luna, Rodrigo; Hoban, Michael (2014). Example of a csv file exported from the database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001227183
    Explore at:
    Dataset updated
    Oct 24, 2014
    Authors
    Caselle, Jennifer E.; Iles, Alison; Tinker, Martin T.; Black, August; Novak, Mark; Carr, Mark H.; Malone, Dan; Beas-Luna, Rodrigo; Hoban, Michael
    Description

    Example of a csv file exported from the database.

  20. m

    Network traffic for machine learning classification

    • data.mendeley.com
    Updated Feb 12, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Víctor Labayen Guembe (2020). Network traffic for machine learning classification [Dataset]. http://doi.org/10.17632/5pmnkshffm.1
    Explore at:
    Dataset updated
    Feb 12, 2020
    Authors
    Víctor Labayen Guembe
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified in 5 different activities (Video, Bulk, Idle, Web, and Interactive) and the label is shown in the filename. There is also a file (mapping.csv) with the mapping of the host's IP address, the csv/pcap filename and the activity label.

    Activities:

    Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in google docs and remote CLI's sessions by SSH. Bulk data transfer: applications that perform a transfer of large data volume files over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox or the university repository among others. Web browsing: contains all the generated traffic while searching and consuming different web pages. Examples of those pages are several blogs and new sites and the moodle of the university. Vídeo playback: contains traffic from applications that consume video in streaming or pseudo-streaming. The most known server used are Twitch and Youtube but the university online classroom has also been used. Idle behaviour: is composed by the background traffic generated by the user computer when the user is idle. This traffic has been captured with every application closed and with some opened pages like google docs, YouTube and several web pages, but always without user interaction.

    The capture is performed in a network probe, attached to the router that forwards the user network traffic, using a SPAN port. The traffic is stored in pcap format with all the packet payload. In the csv file, every non TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): Timestamp, protocol, payload size, IP address source and destination, UDP/TCP port source and destination. The fields are also included as a header in every csv file.

    The amount of data is stated as follows:

    Bulk : 19 traces, 3599 s of total duration, 8704 MBytes of pcap files Video : 23 traces, 4496 s, 1405 MBytes Web : 23 traces, 4203 s, 148 MBytes Interactive : 42 traces, 8934 s, 30.5 MBytes Idle : 52 traces, 6341 s, 0.69 MBytes

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
SOURAV S V (2023). Sample CSV Datasets [Dataset]. https://www.kaggle.com/datasets/souravsv/sample-csv-datasets
Organization logo

Sample CSV Datasets

Explore at:
zip(14455964 bytes)Available download formats
Dataset updated
Nov 30, 2023
Authors
SOURAV S V
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Dataset

This dataset was created by SOURAV S V

Released under CC0: Public Domain

Contents

Search
Clear search
Close search
Google apps
Main menu