100+ datasets found

CSV file used in statistical analyses
data.csiro.au
researchdata.edu.au
+1more
Updated Oct 13, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
Explore at:
Unique identifier
https://doi.org/10.4225/08/543B4B4CA92E6
Dataset updated
Oct 13, 2014
Dataset authored and provided by
CSIROhttp://www.csiro.au/
License
https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/
Time period covered
Mar 14, 2008 - Jun 9, 2009
Dataset funded by
CSIROhttp://www.csiro.au/
Description
A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
all csv files used for analysis of NCBI data
figshare.com
txt
Updated Oct 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cassandre Pyne (2023). all csv files used for analysis of NCBI data [Dataset]. http://doi.org/10.6084/m9.figshare.24461239.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24461239.v1
Dataset updated
Oct 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Cassandre Pyne
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
all csv files used for analysis of NCBIall files with "WOAH" in it are the disease and disease agents from WOAH's list (see manuscript for link) all breed files (with breed names in name) are from web scrapingMASTER_DATA_coordinates_FINAL_AUG_5: cleaned mined data from NCBI
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
f
.csv files for statistical analysis in MetaboAnalyst
figshare.com
txt
Updated Sep 17, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anna Trzeciecka (2018). .csv files for statistical analysis in MetaboAnalyst [Dataset]. http://doi.org/10.6084/m9.figshare.7078253.v2
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7078253.v2
Dataset updated
Sep 17, 2018
Dataset provided by
figshare
Authors
Anna Trzeciecka
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are .csv files for statistical analysis in MetaboAnalyst.
1000 Empirical Time series
figshare.com
researchdata.edu.au
png
Updated May 30, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ben Fulcher (2023). 1000 Empirical Time series [Dataset]. http://doi.org/10.6084/m9.figshare.5436136.v10
Explore at:
pngAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5436136.v10
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Ben Fulcher
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.The results of the computation are in the hctsa file, HCTSA_Empirical1000.mat for use in Matlab using v1.06 of hctsa.The same data is also provided in .csv format for the hctsa_datamatrix.csv (results of feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv), and the data of individual time series (each line a time series, for time series described in hctsa_timeseries-info.csv) is in hctsa_timeseries-data.csv. These .csv files were produced by running >>OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa.The input file, INP_Empirical1000.mat, is for use with hctsa, and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as>> TS_Init('INP_Empirical1000.mat');Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user using TS_PlotTimeSeries from the hctsa package.See links in references for more comprehensive documentation for performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.
A
‘Titanic: all ones csv file’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 18, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Titanic: all ones csv file’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-all-ones-csv-file-b42d/619a9807/
Explore at:
Dataset updated
Feb 18, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Titanic: all ones csv file’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/brendan45774/gender-submisson on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

The score of the csv file is 0.37799. This is the number to beat, so make sure you don't have a number below this.

Content

This is the titanic csv file, but everyone survives.

I also have another csv file: https://www.kaggle.com/brendan45774/test-file This may help you on your mission to get a perfect score.

--- Original source retains full ownership of the source dataset ---
Python Codes for Data Analysis of The Impact of COVID-19 on Technical...
figshare.com
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Elizabeth Szkirpan (2022). Python Codes for Data Analysis of The Impact of COVID-19 on Technical Services Units Survey Results [Dataset]. http://doi.org/10.6084/m9.figshare.20416092.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.20416092.v1
Dataset updated
Aug 1, 2022
Dataset provided by
Figsharehttp://figshare.com/
Authors
Elizabeth Szkirpan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Copies of Anaconda 3 Jupyter Notebooks and Python script for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data was analyzed holistically using cleaned and standardized survey results and by library type clusters. To streamline data analysis in certain locations, an off-shoot CSV file was created so data could be standardized without compromising the integrity of the parent clean file. Three Jupyter Notebooks/Python scripts are available in relation to this project: COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type, clustered files available as part of the Dataverse for this project).
CSV Dataset for the “Development of Monitoring Techniques for Endangered...
catalog.data.gov
Updated Feb 22, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Fish and Wildlife Service (2025). CSV Dataset for the “Development of Monitoring Techniques for Endangered Spring Endemic Invertebrates: an Assessment of Abundance.” This CSV file is used in the analysis for the manuscript that addresses density and statistical power for three Bitter Lake NWR invertebrates. https://doi.org/10.7944/W3T66W [Dataset]. https://catalog.data.gov/dataset/csv-dataset-for-the-development-of-monitoring-techniques-for-endangered-spring-endemic-inv
Explore at:
Dataset updated
Feb 22, 2025
Dataset provided by
U.S. Fish and Wildlife Servicehttp://www.fws.gov/
Description
CSV Data set. The Data Dictionary (Part 1) and Statistical Code (Part 2) are in ServCat Reference....
c
Mecca Australia Extracted Data in CSV Format
crawlfeeds.com
csv, zip
Updated Sep 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2024). Mecca Australia Extracted Data in CSV Format [Dataset]. https://crawlfeeds.com/datasets/mecca-australia-extracted-data-in-csv-format
Explore at:
csv, zipAvailable download formats
Dataset updated
Sep 2, 2024
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
format. This dataset provides comprehensive details on a wide range of beauty products listed on Mecca Australia, one of the leading beauty retailers in the country.

Perfect for market researchers, data analysts, and beauty industry professionals, this dataset enables a deep dive into product offerings and trends without the clutter of customer reviews.

Features:

Product Information: Detailed data on various beauty products, including product names, categories, and brands.

Pricing Data: Up-to-date pricing details for each product, allowing for competitive analysis and pricing strategy development.

Product Descriptions: Comprehensive descriptions that provide insight into product features and benefits.

Stock Availability: Information on stock status to help track product availability and manage inventory.

CSV Format: Easy-to-use CSV file format for seamless integration into any data analysis or business intelligence tool.

Applications:

Market Analysis: Gain insights into the beauty market trends in Australia by analyzing product categories, brands, and pricing.

Competitor Research: Compare product offerings and pricing strategies to understand the competitive landscape.

Inventory Management: Use stock availability data to optimize inventory and ensure popular items are always in stock.

Product Development: Leverage product descriptions to identify gaps in the market and innovate new product offerings.

With the "Mecca Australia Extracted Data" in CSV format, you can easily access and analyze crucial product data, enabling informed decision-making and strategic planning in the beauty industry.
A
‘E-Designations: CSV file’ analyzed by Analyst-2
analyst-2.ai
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘E-Designations: CSV file’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-e-designations-csv-file-9ee2/f71a6d90/?iid=011-360&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘E-Designations: CSV file’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/13ba2f51-b7cd-48fc-86d4-273a0ae3502c on 13 February 2022.

--- Dataset description provided by original source is as follows ---

This data set contains information and locations of (E) Designations, including CEQR Environment Requirements (Table 1) and CEQR Restrictive Declarations (Table 2), in Appendix C of the Zoning Resolution. An (E) Designation provides notice of the presence of an environmental requirement pertaining to potential hazardous materials contamination, high ambient noise levels or air emission concerns on a particular tax lot.

All previously released versions of this data are available at BYTES of the BIG APPLE- Archive

--- Original source retains full ownership of the source dataset ---
f
Data from: Supplemental data
figshare.com
xlsx
Updated Mar 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
T Miyakoshi; Yoichi M. Ito (2024). Supplemental data [Dataset]. http://doi.org/10.6084/m9.figshare.24596058.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24596058.v1
Dataset updated
Mar 15, 2024
Dataset provided by
figshare
Authors
T Miyakoshi; Yoichi M. Ito
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset for the article "The current utilization status of wearable devices in clinical research".Analyses were performed by utilizing the JMP Pro 16.10, Microsoft Excel for Mac version 16 (Microsoft).The file extension "jrp" is a file of the statistical analysis software JMP, which contains both the analysis code and the data set.In case JMP is not available, a "csv" file as a data set and JMP script, the analysis code, are prepared in "rtf" format.The "xlsx" file is a Microsoft Excel file that contains the data set and the data plotted or tabulated using Microsoft Excel functions.Supplementary Figure 1. NCT number duplication frequencyIncludes Excel file used to create the figure (Supplemental Figure 1).・Sfig1_NCT number duplication frequency.xlsxSupplementary Figure 2-5　Simple and annual time series aggregationIncludes Excel file, JMP repo file, csv dataset of JMP repo file and JMP scripts used to create the figure (Supplementary Figures 2-5).・Sfig2-5 Annual time series aggregation.xlsx・Sfig2 Study Type.jrp・Sfig4device type.jrp・Sfig3 Interventions Type.jrp・Sfig5Conditions type.jrp・Sfig2, 3 ,5_database.csv・Sfig2_JMP script_Study type.rtf・Sfig3_JMP script Interventions type.rtf・Sfig5_JMP script Conditions type.rtf・Sfig4_dataset.csv・Sfig4_JMP script_device type.rtfSupplementary Figures 6-11 Mosaic diagram of intervention by conditionSupplementary tables 4-9 Analysis of contingency table for intervention by condition JMP repot files used to create the figures(Supplementary Figures 6-11 ) and tables(Supplementary Tablea 4-9) , including the csv dataset of JMP repot files and JMP scripts.・Sfig6-11 Stable4-9 Intervention devicetype_conditions.jrp・Sfig6-11_Stable4-9_dataset.csv・Sfig6-11_Stable4-9_JMP script.rtfSupplementary Figure 12. Distribution of enrollmentIncludes Excel file, JMP repo file, csv dataset of JMP repo file and JMP scripts used to create the figure (Supplementary Figures 12).・Sfig12_Distribution of enrollment.jrp・Sfig12_Distribution of enrollment.csv・Sfig12_JMP script.rtf
The Canada Trademarks Dataset
zenodo.org
pdf, zip
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeremy Sheff; Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
Explore at:
zip, pdfAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4999655
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Jeremy Sheff; Jeremy Sheff
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
The Canada Trademarks Dataset

18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

Python and Stata Scripts (c) 2021 Jeremy Sheff

Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

Terms of Use:

As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

Details of Repository Contents:

This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

/csv: contains the .csv versions of the data files

/do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset

/dta: contains the .dta versions of the data files

/py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.

The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.

With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021)), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
m
csv datasets and summary statistics - Dataset - DCOR
dcor.mpl.mpg.de
Updated Jun 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). csv datasets and summary statistics - Dataset - DCOR [Dataset]. https://dcor.mpl.mpg.de/dataset/csv-datasets-and-summary-statistics
Explore at:
Dataset updated
Jun 6, 2025
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Contains csv data of cell features used for the analysis in the publication: "A novel MYH9 variant leads to atypical Epstein-Fechtner syndrome by altering non-muscle myosin IIA mediated contractile processes". These csv files contain call relevant cell features per patient and cell type. Files should be titled: For controls: + + .csv For patients: + + + + .csv Metadata containing sex and age is also available in files: “controls_metadata.csv” and “patients_metadata.csv” Summary statistic is also included in this public dataset. For controls: “controls_summary_statistics.csv” For patients: “patients_summary_statistics.csv” Summary statistic files are created using publicly available code: code: https://github.com/SaraKaliman/dc-data-novel-MYH9-variant/blob/main/Step1_summary_statistics.ipynb Group analysis included t-test, U-test and effect size for t-test and can be found in the file: “summary_statistical_group_analysis.csv” file. Main figure in the article and statistical analysis are done using publicly available code: https://github.com/SaraKaliman/dc-data-novel-MYH9-variant/blob/main/Step2_group_comparison.ipynb Single scalar rtdc files is included only due to limitation of DCOR datasets to rtdc files.
Z
Brussel mobility Twitter sentiment analysis CSV Dataset
data.niaid.nih.gov
Updated May 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
van Vessem, Charlotte (2024). Brussel mobility Twitter sentiment analysis CSV Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11401123
Explore at:
Dataset updated
May 31, 2024
Dataset provided by
van Vessem, Charlotte
Tori, Floriano
Ginis, Vincent
Betancur Arenas, Juliana
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Brussels
Description
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project, engaging directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is based in a range of activities related to Open Science, inclusivity and diversity – especially with regards Southern and Eastern Europe and different career stages – including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events.This is captured in a high-profile virtual SSH CENTRE generating and sharing best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.The documents uploaded here are part of WP2 whereby novel, interdisciplinary teams were provided funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written-up as a chapter in an edited book collection. Three books will make up this edited collection - one on climate, one on energy and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of users on Twitter regarding shared and active mobility modes in Brussels. This involved us collecting tweets between 2017-2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo webpage is a csv files containing the tweets collected.”.
UCI and OpenML Data Sets for Ordinal Quantification
zenodo.org
data.niaid.nih.gov
zip
Updated Jul 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8177302
Dataset updated
Jul 25, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.

Usage

You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

Data Extraction: In your terminal, you can call either

make

(recommended), or

julia --project="." --eval "using Pkg; Pkg.instantiate()" julia --project="." extract-oq.jl

Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.

Further Reading

Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
m
Ransomware and user samples for training and validating ML models
data.mendeley.com
Updated Sep 17, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eduardo Berrueta (2021). Ransomware and user samples for training and validating ML models [Dataset]. http://doi.org/10.17632/yhg5wk39kf.2
Explore at:
Unique identifier
https://doi.org/10.17632/yhg5wk39kf.2
Dataset updated
Sep 17, 2021
Authors
Eduardo Berrueta
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ransomware is considered as a significant threat for most enterprises since past few years. In scenarios wherein users can access all files on a shared server, one infected host is capable of locking the access to all shared files. In the article related to this repository, we detect ransomware infection based on file-sharing traffic analysis, even in the case of encrypted traffic. We compare three machine learning models and choose the best for validation. We train and test the detection model using more than 70 ransomware binaries from 26 different families and more than 2500 h of ‘not infected’ traffic from real users. The results reveal that the proposed tool can detect all ransomware binaries, including those not used in the training phase (zero-days). This paper provides a validation of the algorithm by studying the false positive rate and the amount of information from user files that the ransomware could encrypt before being detected.

This dataset directory contains the 'infected' and 'not infected' samples and the models used for each T configuration, each one in a separated folder.

The folders are named NxSy where x is the number of 1-second interval per sample and y the sliding step in seconds.

Each folder (for example N10S10/) contains: - tree.py -> Python script with the Tree model. - ensemble.json -> JSON file with the information about the Ensemble model. - NN_XhiddenLayer.json -> JSON file with the information about the NN model with X hidden layers (1, 2 or 3). - N10S10.csv -> All samples used for training each model in this folder. It is in csv format for using in bigML application. - zeroDays.csv -> All zero-day samples used for testing each model in this folder. It is in csv format for using in bigML application. - userSamples_test -> All samples used for validating each model in this folder. It is in csv format for using in bigML application. - userSamples_train -> User samples used for training the models. - ransomware_train -> Ransomware samples used for training the models - scaler.scaler -> Standard Scaler from python library used for scale the samples. - zeroDays_notFiltered -> Folder with the zeroDay samples.

In the case of N30S30 folder, there is an additional folder (SMBv2SMBv3NFS) with the samples extracted from the SMBv2, SMBv3 and NFS traffic traces. There are more binaries than the ones presented in the article, but it is because some of them are not "unseen" binaries (the families are present in the training set).

The files containing samples (NxSy.csv, zeroDays.csv and userSamples_test.csv) are structured as follows: - Each line is one sample. - Each sample has 3*T features and the label (1 if it is 'infected' sample and 0 if it is not). - The features are separated by ',' because it is a csv file. - The last column is the label of the sample.

Additionally we have placed two pcap files in root directory. There are the traces used for compare both versions of SMB.
CSV files of CHMI and healthy control trials
data.csiro.au
researchdata.edu.au
Updated Apr 1, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rosalind Wang (2019). CSV files of CHMI and healthy control trials [Dataset]. http://doi.org/10.25919/5ca167585f0aa
Explore at:
Unique identifier
https://doi.org/10.25919/5ca167585f0aa
Dataset updated
Apr 1, 2019
Dataset provided by
CSIROhttp://www.csiro.au/
Authors
Rosalind Wang
License
https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/
Time period covered
Aug 1, 2016 - Jun 1, 2017
Dataset funded by
CSIROhttp://www.csiro.au/
Description
This is the CSV files converted from XML file for 1. the CHMI trials of plasmodium falciparum at https://doi.org/10.25919/5b5b7530a39f4 2. the CHMI trials of plasmodium vivax at https://doi.org/10.25919/5b5a6bf69aca5 3. the healthy control trials at https://doi.org/10.25919/5b5e699817220

The data within each trial is organised in day of analysis, for each normalisation of the data. Lineage: The code for converting from XML to CSV is at https://github.com/rosalind-wang/GCPeakDetection
c
Dog Food Data Extracted from Chewy (USA) - 4,500 Records in CSV Format
crawlfeeds.com
csv, zip
Updated Apr 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dog Food Data Extracted from Chewy (USA) - 4,500 Records in CSV Format [Dataset]. https://crawlfeeds.com/datasets/dog-food-data-extracted-from-chewy-usa-4-500-records-in-csv-format
Explore at:
zip, csvAvailable download formats
Dataset updated
Apr 22, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.

The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.

Key Features:

Record Count: 4,500 dog food product records.

Data Fields: Product names, brands, prices, descriptions, ingredients .. etc. Find more fields under data points section.

Format: CSV, easy to import into databases and data analysis tools.

Source: Extracted from Chewy’s official USA platform.

Geography: Focused on the USA dog food market.

Use Cases:

Market Research: Analyze trends and preferences in the USA dog food market, including popular brands, price ranges, and product availability.

E-commerce Analysis: Understand how Chewy presents and prices dog food products, helping businesses compare their own product offerings.

Competitor Analysis: Compare different brands and products to develop competitive strategies for dog food businesses.

Machine Learning Models: Use the dataset for machine learning tasks such as product recommendation systems, demand forecasting, and price optimization.
A/B Trial Aggregated Data
kaggle.com
Updated Sep 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sergei Logvinov (2022). A/B Trial Aggregated Data [Dataset]. https://www.kaggle.com/datasets/sergylog/ab-trial-aggregated-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 18, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Sergei Logvinov
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
The csv file contains aggregated data on the results of the experiment (user_id), treatment type (group) and key user metrics(views and clicks) The task is to analyze the results of the experiment and write your recommendations.
f
Datasets
figshare.com
zip
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bastian Eichenberger; YinXiu Zhan (2023). Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.12958037.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12958037.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Bastian Eichenberger; YinXiu Zhan
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The benchmarking datasets used for deepBlink. The npz files contain train/valid/test splits inside and can be used directly. The files belong to the following challenges / classes:- ISBI Particle tracking challenge: microtubule, vesicle, receptor- Custom synthetic (based on http://smal.ws): particle- Custom fixed cell: smfish- Custom live cell: suntagThe csv files are to determine which image in the test splits correspond to which original image, SNR, and density.

Facebook

Twitter

Click to copy link

Link copied

Cite

CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6

CSV file used in statistical analyses

Explore at:

Unique identifier

https://doi.org/10.4225/08/543B4B4CA92E6

Dataset updated

Oct 13, 2014

Dataset authored and provided by

CSIROhttp://www.csiro.au/

License

https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/

Time period covered

Mar 14, 2008 - Jun 9, 2009

Dataset funded by

CSIROhttp://www.csiro.au/

Description

A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

Clear search

Close search

Google apps

Main menu

CSV file used in statistical analyses

all csv files used for analysis of NCBI data

Datasets for Sentiment Analysis

.csv files for statistical analysis in MetaboAnalyst

1000 Empirical Time series

‘Titanic: all ones csv file’ analyzed by Analyst-2

Context

Content

Python Codes for Data Analysis of The Impact of COVID-19 on Technical...

CSV Dataset for the “Development of Monitoring Techniques for Endangered...

Mecca Australia Extracted Data in CSV Format

Features:

Applications:

‘E-Designations: CSV file’ analyzed by Analyst-2

Data from: Supplemental data

The Canada Trademarks Dataset

csv datasets and summary statistics - Dataset - DCOR

Brussel mobility Twitter sentiment analysis CSV Dataset

UCI and OpenML Data Sets for Ordinal Quantification

Ransomware and user samples for training and validating ML models

CSV files of CHMI and healthy control trials

Dog Food Data Extracted from Chewy (USA) - 4,500 Records in CSV Format

Use Cases:

A/B Trial Aggregated Data

Datasets

CSV file used in statistical analysesSee More Versions

CSV file used in statistical analyses