4 datasets found
  1. h

    test-csv-conversion

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anonymous User, test-csv-conversion [Dataset]. https://huggingface.co/datasets/anonymous-user-546/test-csv-conversion
    Explore at:
    Authors
    Anonymous User
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    license: cc-by-nc-nd-4.0

  2. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  3. Z

    Data and Code for High Throughput FTIR Analysis of Macro and Microplastics...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Nov 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ali Chamas (2023). Data and Code for High Throughput FTIR Analysis of Macro and Microplastics with Plate Readers [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7772571
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Ali Chamas
    Win Cowger
    Sebastian Primpke
    Lisa Roscher
    Gunnar Gerdts
    Lukas Gehrke
    Benjamin D Maurer
    Hannah Jebens
    Description

    Data and source code for reproducing the analysis conducted in "High Throughput FTIR Analysis of Macro and Microplastics with Plate Readers" All materials are licensed for noncommercial purposes https://creativecommons.org/licenses/by-nc/4.0/ Explanatory_Videos.zip has videos showing data collection methods. HIDA_Publication.R has source code for doing data cleanup and analysis on data in database.zip. databasedata.zip holds all raw and analyzed data. - ATR, Reflectance, and Transmission folders has all data used in the manuscript. In a raw (.0) and combined (export.csv) format for each of the plates analyzed (folder numbers). - Plots folder has images of each spectrum. - cell_information.csv has the raw ids and comments made at the time the particles were assessed. - classes_reference_2.csv has the transformations used to standardize open specy's terms to polymer classes. - CleanedSpectra_raw.csv has the total cleaned up database of all spectral intensities in long format.
    - joined_cell_metadata.csv has the metadata for each plate well analyzed. - library_metadata.csv has metadata for each spectrum in raw form for each particle id. - Lisa_Plate_6.csv has the metadata from Lisa Roscher used in this study. - Metadata_raw.csv has the conformed metadata that can be paired with the CleanedSpectra_raw.csv file. - OpenSpecy_Classification_Baseline.csv has the particle metadata combined with Open Specy's classes identified after baseline correcting and smoothing the spectra with the standard Open Specy routine. - OpenSpecy_Classification_Raw.csv has the particle metadata combined with Open Specy's identified classes if using the raw spectra. - particle_spectrum_match.csv converts particle ids to their reference in the Polymer_Material_Database_AWI_V2_Win.xlsx file. - Polymer_Material_Database_AWI_V2_Win.xlsx metadata on materials from Primpke's database. - polymer_metadata_2.csv can be used to crosswalk polymer categories to more or less specific terminology. - spread_os.csv is the reference database used in CleanedSpectra_raw.csv that has been spread to wide format. - Top Correlation Data20221201-125621.csv is a download of results from Open Specy's beta tool that provides the top ids from the reference database.

  4. Deep learning four decades of human migration: datasets

    • zenodo.org
    csv, nc
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Thomas Gaskin; Thomas Gaskin; Guy Abel; Guy Abel (2025). Deep learning four decades of human migration: datasets [Dataset]. http://doi.org/10.5281/zenodo.15778301
    Explore at:
    nc, csvAvailable download formats
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Thomas Gaskin; Thomas Gaskin; Guy Abel; Guy Abel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This Zenodo repository contains all migration flow estimates associated with the paper "Deep learning four decades of human migration." Evaluation code, training data, trained neural networks, and smaller flow datasets are available in the main GitHub repository, which also provides detailed instructions on data sourcing. Due to file size limits, the larger datasets are archived here.

    Data is available in both NetCDF (.nc) and CSV (.csv) formats. The NetCDF format is more compact and pre-indexed, making it suitable for large files. In Python, datasets can be opened as xarray.Dataset objects, enabling coordinate-based data selection.

    Each dataset uses the following coordinate conventions:

    • Year: 1990–2023
    • Birth ISO: Country of birth (UN ISO3)
    • Origin ISO: Country of origin (UN ISO3)
    • Destination ISO: Destination country (UN ISO3)
    • Country ISO: Used for net migration data (UN ISO3)

    The following data files are provided:

    • T.nc: Full table of flows disaggregated by country of birth. Dimensions: Year, Birth ISO, Origin ISO, Destination ISO
    • flows.nc: Total origin-destination flows (equivalent to T summed over Birth ISO). Dimensions: Year, Origin ISO, Destination ISO
    • net_migration.nc: Net migration data by country. Dimensions: Year, Country ISO
    • stocks.nc: Stock estimates for each country pair. Dimensions: Year, Origin ISO (corresponding to Birth ISO), Destination ISO
    • test_flows.nc: Flow estimates on a randomly selected set of test edges, used for model validation

    Additionally, two CSV files are provided for convenience:

    • mig_unilateral.csv: Unilateral migration estimates per country, comprising:
      • imm: Total immigration flows
      • emi: Total emigration flows
      • net: Net migration
      • imm_pop: Total immigrant population (non-native-born)
      • emi_pop: Total emigrant population (living abroad)
    • mig_bilateral.csv: Bilateral flow data, comprising:
      • mig_prev: Total origin-destination flows
      • mig_brth: Total birth-destination flows, where Origin ISO reflects place of birth

    Each dataset includes a mean variable (mean estimate) and a std variable (standard deviation of the estimate).

    An ISO3 conversion table is also provided.

  5. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Anonymous User, test-csv-conversion [Dataset]. https://huggingface.co/datasets/anonymous-user-546/test-csv-conversion

test-csv-conversion

anonymous-user-546/test-csv-conversion

Explore at:
Authors
Anonymous User
License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

license: cc-by-nc-nd-4.0

Search
Clear search
Close search
Google apps
Main menu