100+ datasets found
  1. CSV file used in statistical analyses

    • data.csiro.au
    • researchdata.edu.au
    • +1more
    Updated Oct 13, 2014
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
    Explore at:
    Dataset updated
    Oct 13, 2014
    Dataset authored and provided by
    CSIROhttp://www.csiro.au/
    License

    https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Mar 14, 2008 - Jun 9, 2009
    Dataset funded by
    CSIROhttp://www.csiro.au/
    Description

    A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

  2. GitTables 1M - CSV files

    • zenodo.org
    zip
    Updated Jun 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains >800K CSV files behind the GitTables 1M corpus.

    For more information about the GitTables corpus, visit:

    - our website for GitTables, or

    - the main GitTables download page on Zenodo.

  3. Training examples.csv

    • kaggle.com
    Updated Mar 6, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Владимир Терентьев (2021). Training examples.csv [Dataset]. https://www.kaggle.com/datasets/terentevvs/training-examplescsv
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 6, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Владимир Терентьев
    Description

    Dataset

    This dataset was created by Владимир Терентьев

    Contents

  4. q

    Dataset in CSV format

    • data.researchdatafinder.qut.edu.au
    Updated Jun 18, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2019). Dataset in CSV format [Dataset]. https://data.researchdatafinder.qut.edu.au/dataset/occupational-head-dose/resource/3502742e-7a1e-4995-8da3-1f40f8362920
    Explore at:
    Dataset updated
    Jun 18, 2019
    License

    http://researchdatafinder.qut.edu.au/display/n6106http://researchdatafinder.qut.edu.au/display/n6106

    Description

    QUT Research Data Respository Dataset Resource available for download

  5. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  6. m

    Download CSV DB

    • maclookup.app
    json
    Updated Sep 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Download CSV DB [Dataset]. https://maclookup.app/downloads/csv-database
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Sep 8, 2025
    Description

    Free, daily updated MAC prefix and vendor CSV database. Download now for accurate device identification.

  7. Sample Graph Datasets in CSV Format

    • zenodo.org
    csv
    Updated Dec 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Edwin Carreño; Edwin Carreño (2024). Sample Graph Datasets in CSV Format [Dataset]. http://doi.org/10.5281/zenodo.14335015
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Edwin Carreño; Edwin Carreño
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample Graph Datasets in CSV Format

    Note: none of the data sets published here contain actual data, they are for testing purposes only.

    Description

    This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:

    • dataset_30_nodes_interactions.csv:contains 30 rows (nodes).
    • dataset_30_edges_interactions.csv: contains 47 rows (edges).
    • the common identifier dataset_30 refers to the same graph.

    CSV nodes

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    UniProt IDstringprotein identification
    labelstringprotein label (type of node)
    propertiesstringa dictionary containing properties related to the protein.

    CSV edges

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    Relationship IDstringrelationship identification
    Source IDstringidentification of the source protein in the relationship
    Target IDstringidentification of the target protein in the relationship
    labelstringrelationship label (type of relationship)
    propertiesstringa dictionary containing properties related to the relationship.

    Metadata

    GraphNumber of NodesNumber of EdgesSparse graph

    dataset_30*

    30

    47

    Y

    dataset_60*

    60

    181

    Y

    dataset_120*

    120

    689

    Y

    dataset_240*

    240

    2819

    Y

    dataset_300*

    300

    4658

    Y

    dataset_600*

    600

    18004

    Y

    dataset_1200*

    1200

    71785

    Y

    dataset_2400*

    2400

    288600

    Y

    dataset_3000*

    3000

    449727

    Y

    dataset_6000*

    6000

    1799413

    Y

    dataset_12000*

    12000

    7199863

    Y

    dataset_24000*

    24000

    28792361

    Y

    dataset_30000*

    30000

    44991744

    Y

    This repository include two (2) additional tiny graph datasets to experiment before dealing with larger datasets.

    CSV nodes (tiny graphs)

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    IDstringnode identification
    labelstringnode label (type of node)
    propertiesstringa dictionary containing properties related to the node.

    CSV edges (tiny graphs)

    Each dataset contains the following columns:

    Name of the ColumnTypeDescription
    IDstringrelationship identification
    sourcestringidentification of the source node in the relationship
    targetstringidentification of the target node in the relationship
    labelstringrelationship label (type of relationship)
    propertiesstringa dictionary containing properties related to the relationship.

    Metadata (tiny graphs)

    GraphNumber of NodesNumber of EdgesSparse graph
    dataset_dummy*36N
    dataset_dummy2*36N
  8. BevMo Alcoholic Beverage Records Extracted - Download Comprehensive CSV...

    • crawlfeeds.com
    csv, zip
    Updated Sep 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2024). BevMo Alcoholic Beverage Records Extracted - Download Comprehensive CSV Dataset Now [Dataset]. https://crawlfeeds.com/datasets/bevmo-alcoholic-beverage-records-extracted-download-comprehensive-csv-dataset-now
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Sep 7, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    We are excited to announce that we have successfully extracted a comprehensive set of alcoholic beverage records from BevMo and compiled them into a CSV file.

    This meticulously organized dataset includes key information such as product URLs, IDs, names, SKUs, GTIN14 barcodes, detailed product descriptions, availability status, pricing, currency, images, breadcrumbs, and more.

    Our dataset provides an invaluable resource for anyone looking to analyze or utilize detailed BevMo product information.

    Download the dataset today and gain access to a wealth of information from one of the leading beverage retailers.

    Perfect for market analysis, e-commerce insights, and competitive research.

  9. EPA FRS Facilities Combined File CSV Download for the State of Arkansas

    • catalog.data.gov
    Updated Nov 29, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Environmental Information (OEI) - Office of Information Collection (OIC) (2020). EPA FRS Facilities Combined File CSV Download for the State of Arkansas [Dataset]. https://catalog.data.gov/dataset/epa-frs-facilities-combined-file-csv-download-for-the-state-of-arkansas
    Explore at:
    Dataset updated
    Nov 29, 2020
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Area covered
    Arkansas
    Description

    The Facility Registry System (FRS) identifies facilities, sites, or places subject to environmental regulation or of environmental interest to EPA programs or delegated states. Using vigorous verification and data management procedures, FRS integrates facility data from program national systems, state master facility records, tribal partners, and other federal agencies and provides the Agency with a centrally managed, single source of comprehensive and authoritative information on facilities.

  10. Amazon India products dataset in CSV format

    • crawlfeeds.com
    csv, zip
    Updated Mar 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Amazon India products dataset in CSV format [Dataset]. https://crawlfeeds.com/datasets/amazon-india-products-dataset-in-csv-format
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Area covered
    India
    Description

    Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.

    Dataset Features:

    • Product Details: Name, Brand, Category, and Unique ID

    • Pricing Information: Current Price, Discounted Price, and Currency

    • Availability & Ratings: Stock Status, Customer Ratings, and Reviews

    • Seller Information: Seller Name and Fulfillment Details

    • Additional Attributes: Product Description, Specifications, and Images

    Dataset Specifications:

    • Format: CSV

    • Number of Records: 50,000+

    • Delivery Time: 3 Days

    • Price: $149.00

    • Availability: Immediate

    This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.

  11. Gene expression csv files

    • figshare.com
    txt
    Updated Jun 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cristina Alvira (2023). Gene expression csv files [Dataset]. http://doi.org/10.6084/m9.figshare.21861975.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Cristina Alvira
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Csv files containing all detectable genes.

  12. B

    Residential School Locations Dataset (CSV Format)

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRRSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconcilation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative, and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its’ original location to another property, then the school is considered to have two unique locations in this dataset,the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School.When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.

  13. car_price dataset

    • kaggle.com
    Updated May 28, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ngawang Choeda (2021). car_price dataset [Dataset]. https://www.kaggle.com/datasets/ngawangchoeda/car-price-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 28, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Ngawang Choeda
    Description

    The car_price.csv file contains a dataset of various car-models.

    The dataset contains 205 rows and 26 columns(features) of which 25 are independent features. Below shows a detailed information of feature names with its labels and datatypes.

    It is a regression problem where with the various features we are expected to predict the price of a car.

    The dataset doesn't contain any null values.

    Independent features:

    Features Labels Datatype

    symboling 6 int64 fueltype 2 object aspiration. 2 object doornumber. 2 object carbody 5 object drivewheel 3 object enginelocation 2 object wheelbase 53 float64 carlength 75 float64 carwidth 44 float64 carheight 49 float64 curbweight 171 int64 enginetype 7 object cylindernumber 7 object enginesize 44 int64 fuelsystem 8 object boreratio 38 float64 stroke 37 float64 compressionratio 32 float64 horsepower 59 int64 peakrpm 23 int64 citympg 29 int64 highwaympg 30 int64

    **Target/Dependent variable: ** For the dataset we have price as our dependent feature with its datatype float64, hence using Regression Models we are expected to predict the value of price

    Features Labels Datatype

    price 189 float64

  14. H

    Dataset metadata of known Dataverse installations, August 2023

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 30, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Julian Gautier (2024). Dataset metadata of known Dataverse installations, August 2023 [Dataset]. http://doi.org/10.7910/DVN/8FEGUV
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Julian Gautier
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the metadata of the datasets published in 85 Dataverse installations and information about each installation's metadata blocks. It also includes the lists of pre-defined licenses or terms of use that dataset depositors can apply to the datasets they publish in the 58 installations that were running versions of the Dataverse software that include that feature. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations and improving understandings about how certain Dataverse features and metadata fields are used. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data. How the metadata was downloaded The dataset metadata and metadata block JSON files were downloaded from each installation between August 22 and August 28, 2023 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another column named "apikey" listing my accounts' API tokens. The Python script expects the CSV file and the listed API tokens to get metadata and other information from installations that require API tokens. How the files are organized ├── csv_files_with_metadata_from_most_known_dataverse_installations │ ├── author(citation)_2023.08.22-2023.08.28.csv │ ├── contributor(citation)_2023.08.22-2023.08.28.csv │ ├── data_source(citation)_2023.08.22-2023.08.28.csv │ ├── ... │ └── topic_classification(citation)_2023.08.22-2023.08.28.csv ├── dataverse_json_metadata_from_each_known_dataverse_installation │ ├── Abacus_2023.08.27_12.59.59.zip │ ├── dataset_pids_Abacus_2023.08.27_12.59.59.csv │ ├── Dataverse_JSON_metadata_2023.08.27_12.59.59 │ ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json │ ├── ... │ ├── metadatablocks_v5.6 │ ├── astrophysics_v5.6.json │ ├── biomedical_v5.6.json │ ├── citation_v5.6.json │ ├── ... │ ├── socialscience_v5.6.json │ ├── ACSS_Dataverse_2023.08.26_22.14.04.zip │ ├── ADA_Dataverse_2023.08.27_13.16.20.zip │ ├── Arca_Dados_2023.08.27_13.34.09.zip │ ├── ... │ └── World_Agroforestry_-_Research_Data_Repository_2023.08.27_19.24.15.zip └── dataverse_installations_summary_2023.08.28.csv └── dataset_pids_from_most_known_dataverse_installations_2023.08.csv └── license_options_for_each_dataverse_installation_2023.09.05.csv └── metadatablocks_from_most_known_dataverse_installations_2023.09.05.csv This dataset contains two directories and four CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the citation metadata block and geospatial metadata block of datasets in the 85 Dataverse installations. For example, author(citation)_2023.08.22-2023.08.28.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 85 installations, where there's a row for author names, affiliations, identifier types and identifiers. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 85 zipped files, one for each of the 85 Dataverse installations whose dataset metadata I was able to download. Each zip file contains a CSV file and two sub-directories: The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate if the Python script was able to download the Dataverse JSON metadata for each dataset. It also includes the alias/identifier and category of the Dataverse collection that the dataset is in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The Dataverse JSON export of the latest version of each dataset includes "(latest_version)" in the file name. This should help those who are interested in the metadata of only the latest version of each dataset. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I included them so that they can be used when extracting metadata from the dataset's Dataverse JSON exports. The dataverse_installations_summary_2023.08.28.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata...

  15. The Canada Trademarks Dataset

    • zenodo.org
    pdf, zip
    Updated Jul 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeremy Sheff; Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
    Explore at:
    zip, pdfAvailable download formats
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Jeremy Sheff; Jeremy Sheff
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Canada
    Description

    The Canada Trademarks Dataset

    18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

    Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

    Python and Stata Scripts (c) 2021 Jeremy Sheff

    Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

    This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

    Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

    Terms of Use:

    As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

    The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

    The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

    Details of Repository Contents:

    This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

    • /csv: contains the .csv versions of the data files
    • /do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset
    • /dta: contains the .dta versions of the data files
    • /py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

    If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.

    The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.

    With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021)), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

    The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

    This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.

  16. a

    Alaska DCCED CBPL Active Business License CSV File Download

    • rural-utility-business-advisory-hub-site-1-dcced.hub.arcgis.com
    • egrants-hub-dcced.hub.arcgis.com
    • +1more
    Updated Nov 16, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dept. of Commerce, Community, & Economic Development (2021). Alaska DCCED CBPL Active Business License CSV File Download [Dataset]. https://rural-utility-business-advisory-hub-site-1-dcced.hub.arcgis.com/documents/6070036058764b96a0d37d147088e70c
    Explore at:
    Dataset updated
    Nov 16, 2021
    Dataset authored and provided by
    Dept. of Commerce, Community, & Economic Development
    Area covered
    Alaska
    Description

    Alaska DCCED Division of Corporations, Business and Professional Licensing courtesy CSV Download Link Location

  17. Datasets

    • figshare.com
    zip
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bastian Eichenberger; YinXiu Zhan (2023). Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.12958037.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Bastian Eichenberger; YinXiu Zhan
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The benchmarking datasets used for deepBlink. The npz files contain train/valid/test splits inside and can be used directly. The files belong to the following challenges / classes:- ISBI Particle tracking challenge: microtubule, vesicle, receptor- Custom synthetic (based on http://smal.ws): particle- Custom fixed cell: smfish- Custom live cell: suntagThe csv files are to determine which image in the test splits correspond to which original image, SNR, and density.

  18. Geolocation Data [Longitude Latitude]

    • kaggle.com
    Updated Mar 12, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    You Sheng (2022). Geolocation Data [Longitude Latitude] [Dataset]. https://www.kaggle.com/datasets/liewyousheng/geolocation
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 12, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    You Sheng
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Context

    Full Database of city state country available in CSV format. All Countries, States & Cities are Covered & Populated with Different Combinations & Versions.

    Each CSV has the 1. Longitude 2. Latitude

    of each location, alongside other miscellaneous country data such as 3. Currency 4. State code 5. Phone country code

    Content

    Total Countries : 250 Total States/Regions/Municipalities : 4,963 Total Cities/Towns/Districts : 148,061

    Last Updated On : 29th January 2022

    Source

    https://github.com/dr5hn/countries-states-cities-database

  19. m

    Network traffic and code for machine learning classification

    • data.mendeley.com
    Updated Feb 20, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Víctor Labayen (2020). Network traffic and code for machine learning classification [Dataset]. http://doi.org/10.17632/5pmnkshffm.2
    Explore at:
    Dataset updated
    Feb 20, 2020
    Authors
    Víctor Labayen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified in 5 different activities (Video, Bulk, Idle, Web, and Interactive) and the label is shown in the filename. There is also a file (mapping.csv) with the mapping of the host's IP address, the csv/pcap filename and the activity label.

    Activities:

    Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in google docs and remote CLI's sessions by SSH. Bulk data transfer: applications that perform a transfer of large data volume files over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox or the university repository among others. Web browsing: contains all the generated traffic while searching and consuming different web pages. Examples of those pages are several blogs and new sites and the moodle of the university. Vídeo playback: contains traffic from applications that consume video in streaming or pseudo-streaming. The most known server used are Twitch and Youtube but the university online classroom has also been used. Idle behaviour: is composed by the background traffic generated by the user computer when the user is idle. This traffic has been captured with every application closed and with some opened pages like google docs, YouTube and several web pages, but always without user interaction.

    The capture is performed in a network probe, attached to the router that forwards the user network traffic, using a SPAN port. The traffic is stored in pcap format with all the packet payload. In the csv file, every non TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): Timestamp, protocol, payload size, IP address source and destination, UDP/TCP port source and destination. The fields are also included as a header in every csv file.

    The amount of data is stated as follows:

    Bulk : 19 traces, 3599 s of total duration, 8704 MBytes of pcap files Video : 23 traces, 4496 s, 1405 MBytes Web : 23 traces, 4203 s, 148 MBytes Interactive : 42 traces, 8934 s, 30.5 MBytes Idle : 52 traces, 6341 s, 0.69 MBytes

    The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.

  20. Walmart Dataset

    • crawlfeeds.com
    csv, zip
    Updated Apr 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Walmart Dataset [Dataset]. https://crawlfeeds.com/datasets/walmart-dataset
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Apr 26, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    Walmart products sample dataset having 1000+ records in CSV format. Download monthly dataset for walmart data and it having around 100K+ records.

    Get 50% discount for all datasets. Link

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
Organization logo

CSV file used in statistical analyses

Explore at:
Dataset updated
Oct 13, 2014
Dataset authored and provided by
CSIROhttp://www.csiro.au/
License

https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/

Time period covered
Mar 14, 2008 - Jun 9, 2009
Dataset funded by
CSIROhttp://www.csiro.au/
Description

A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

Search
Clear search
Close search
Google apps
Main menu