100+ datasets found
  1. Small Business Data

    • kaggle.com
    zip
    Updated Mar 11, 2024
    Cite
    Anne Ezeh (2024). Small Business Data [Dataset]. https://www.kaggle.com/datasets/anneezeh/small-business-data
    Explore at:
    zip (8544 bytes)
    Dataset updated
    Mar 11, 2024
    Authors
    Anne Ezeh
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Anne Ezeh

    Released under Apache 2.0

    Contents

  2. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Julie R. Campos Arias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores the datasets used in the studies that served as research material for the thesis, together with the datasets used in its experimental part.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv
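    The three-column layout described above can be parsed with the Python standard library. A minimal sketch, using an inline sample in place of sts_gold_tweet.csv (the sample rows are invented for illustration):

```python
import csv
import io

# Inline stand-in for sts_gold_tweet.csv with the documented columns:
# id, polarity, tweet.
sample = "id,polarity,tweet\n1,4,loving the weather today\n2,0,worst commute ever\n"

rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["polarity"])  # prints: 4
```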

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the site in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv
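    Because the rows are ordered by class, a seeded shuffle before use avoids class-ordered train/test splits. A hedged standard-library sketch (the path and seed are illustrative):

```python
import csv
import random

def load_shuffled(path, seed=0):
    """Read a two-column reviews/labels CSV and shuffle its rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)          # expected columns: reviews, labels
        rows = list(reader)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    return header, rows
```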

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using NLTK.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv
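    The division column is described as a categorical label derived from the polarity score. The exact cutoffs are not stated in the description, so the thresholds below are illustrative assumptions only:

```python
def division(polarity: float) -> str:
    """Map a TextBlob-style polarity score in [-1, 1] to a category.
    The 0.0 cutoffs here are assumed, not taken from the dataset."""
    if polarity > 0.0:
        return "positive"
    if polarity < 0.0:
        return "negative"
    return "neutral"

print(division(0.6), division(-0.3), division(0.0))  # prints: positive negative neutral
```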

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9,930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine-learning models for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw) and division (manually added categorical label generated using the overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  3. ETT-small

    • kaggle.com
    zip
    Updated Oct 22, 2022
    Cite
    Alaa ELmor (2022). ETT-small [Dataset]. https://www.kaggle.com/datasets/alaaelmor/ettsmall
    Explore at:
    zip (4083932 bytes)
    Dataset updated
    Oct 22, 2022
    Authors
    Alaa ELmor
    Description

    Dataset

    This dataset was created by Alaa ELmor

    Contents

  4. CSV file used in statistical analyses

    • data.csiro.au
    • researchdata.edu.au
    • +1more
    Updated Oct 13, 2014
    + more versions
    Cite
    CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
    Explore at:
    Dataset updated
    Oct 13, 2014
    Dataset authored and provided by
    CSIRO: http://www.csiro.au/
    License

    CSIRO Data Licence: https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Mar 14, 2008 - Jun 9, 2009
    Dataset funded by
    CSIRO: http://www.csiro.au/
    Description

    A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

  5. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    .json, .xml, .csv, .xls
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Nepal, Tunisia, Canada, British Indian Ocean Territory, Andorra, Bangladesh, Isle of Man, Northern Mariana Islands, Moldova (Republic of), Taiwan
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  6. Small_customer_data_csv

    • kaggle.com
    zip
    Updated Sep 27, 2023
    Cite
    Ayush11111111 (2023). Small_customer_data_csv [Dataset]. https://www.kaggle.com/datasets/ayush11111111/small-customer-data-csv
    Explore at:
    zip (746 bytes)
    Dataset updated
    Sep 27, 2023
    Authors
    Ayush11111111
    Description

    Dataset

    This dataset was created by Ayush11111111

    Contents

  7. Datasets used in Transitive prediction of small-molecule function through...

    • figshare.com
    csv
    Updated May 14, 2025
    + more versions
    Cite
    Feng Bao (2025). Datasets used in Transitive prediction of small-molecule function through alignment of high-content screening resources [Dataset]. http://doi.org/10.6084/m9.figshare.29061038.v2
    Explore at:
    csv
    Dataset updated
    May 14, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Feng Bao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports the development of CLIPn, a contrastive-learning framework designed to align heterogeneous high-content screening (HCS) profile datasets. GitHub link: https://github.com/AltschulerWu-Lab/CLIPn

    Directory structure (folders under raw_profiles):

    • HCS13/: raw data from 13 high-content screening (HCS) datasets; each dataset includes meta and feature files.
    • L1000/: CDRP_feature_exp.csv (raw L1000 expression data from the CDRP dataset); CDRP_meta_exp.csv (metadata associated with the CDRP expression data); LINCS_feature_exp.csv (raw L1000 expression data from the LINCS dataset); LINCS_meta_exp.csv (metadata associated with the LINCS expression data).
    • RxRx3/: RxRx3_feature_final.csv (profile data from the RxRx3 dataset); RxRx3_meta_final.csv (metadata from the RxRx3 dataset).
    • Uncharacterized_compounds/: NCI_cpnData.csv (feature data for uncharacterized compounds from the NCI dataset); NCI_cpnInfo.csv (information about uncharacterized compounds in the NCI dataset); Prestwick_UTSW_cpnData.csv (feature data for uncharacterized compounds from the Prestwick UTSW dataset); Prestwick_UTSW_cpnInfo.csv (information about uncharacterized compounds from the Prestwick UTSW dataset).

    Data references: For the raw datasets from the 13 HCS databases, data and the analysis pipeline for dataset 1 were obtained from https://www.science.org/doi/suppl/10.1126/science.1100709/suppl_file/perlman.som.zip. For datasets 2-3, data were shared by the authors. For datasets 4-5, analysis code was downloaded from https://static-content.springer.com/esm/art:10.1038/nbt.3419/MediaObjects/41587_2016_BFnbt3419_MOESM21_ESM.zip and data were shared by the authors. For datasets 6-7, the processed dataset was downloaded from AWS following instructions from https://github.com/carpenter-singh-lab/2022_Haghighi_NatureMethods, and replicate_level_cp_normalized.csv.gz features were used. For datasets 8-13, datasets and analysis results were downloaded from https://zenodo.org/records/7352487. For RxRx3, the dataset was obtained from https://www.rxrx.ai/rxrx3. L1000 transcript datasets were downloaded using the same link as datasets 6-7, and the processed transcript data files (named "replicate_level_l1k.csv") were used.

  8. TP and NTP small dataset

    • datasetcatalog.nlm.nih.gov
    • figshare.arts.ac.uk
    Updated Mar 31, 2022
    + more versions
    Cite
    Velios, Athanasios (2022). TP and NTP small dataset [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000439502
    Explore at:
    Dataset updated
    Mar 31, 2022
    Authors
    Velios, Athanasios
    Description

    This dataset includes statements about manuscripts from the library of St. Catherine Monastery in Sinai, specifically about the existence of leaf markers on each manuscript. The dataset is provided in three formats: CSV, OWL and RDF. Leaf markers are not individually identified; only their existence and type are indicated. The dataset is used to demonstrate a method of describing numerous individuals and the absence of types in Knowledge Bases.

    • two-records.csv is part of the original data as collected at the Monastery.
    • two-records-owlcro.owl holds part of the original data alongside fictional records of individual leaf markers for each book (these do not exist, but they are necessary to demonstrate the applicable method).
    • two-records-owlcrop.owl holds part of the original data only.

    The same logic is followed for the RDF files. The size of this dataset allows performing test reasoning in OWL. A full dataset is also available in this repository.

  9. Annotated Benchmark of Real-World Data for Approximate Functional Dependency...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 1, 2023
    Cite
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
    Explore at:
    csv
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

    This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

    The file ground_truth.csv is a comma separated file containing approximate functional dependencies. table describes the relation we refer to, lhs and rhs reference two columns of those relations where semantically we found that lhs implies rhs.

    The file excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
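    The annotation workflow above hinges on an approximate-FD error measure. As an illustrative sketch (not the authors' exact g3_prime implementation), a g3-style error is the fraction of rows that must be removed for lhs -> rhs to hold exactly; the airport rows below are invented:

```python
from collections import Counter, defaultdict

def g3_error(rows, lhs, rhs):
    """Minimum fraction of rows to drop so that lhs functionally determines rhs."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    total = sum(sum(c.values()) for c in groups.values())
    # Within each lhs group, keep only the most common rhs value.
    kept = sum(max(c.values()) for c in groups.values())
    return (total - kept) / total

rows = [
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy Intl"},
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy Intl"},
    {"AirportCode": "JFK", "AirportName": "JFK International"},  # violating row
    {"AirportCode": "LAX", "AirportName": "Los Angeles Intl"},
]
print(g3_error(rows, "AirportCode", "AirportName"))  # prints: 0.25
```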

    Dataset References

  10. Code4ML 2.0

    • zenodo.org
    csv, txt
    Updated May 19, 2025
    Cite
    Anonimous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737
    Explore at:
    csv, txt
    Dataset updated
    May 19, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Anonimous authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

    The original dataset is organized into multiple CSV files, each containing structured data on different entities:

    • code_blocks.csv: Contains raw code snippets extracted from Kaggle.
    • kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
    • competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
    • markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
    • vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

    Table 1. code_blocks.csv structure

    • code_blocks_index: Global index linking code blocks to markup_data.csv.
    • kernel_id: Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
    • code_block_id: Position of the code block within the notebook.
    • code_block: The actual machine learning code snippet.

    Table 2. kernels_meta.csv structure

    • kernel_id: Identifier for the Kaggle Jupyter notebook.
    • kaggle_score: Performance metric of the notebook.
    • kaggle_comments: Number of comments on the notebook.
    • kaggle_upvotes: Number of upvotes the notebook received.
    • kernel_link: URL to the notebook.
    • comp_name: Name of the associated Kaggle competition.

    Table 3. competitions_meta.csv structure

    • comp_name: Name of the Kaggle competition.
    • description: Overview of the competition task.
    • data_type: Type of data used in the competition.
    • comp_type: Classification of the competition.
    • subtitle: Short description of the task.
    • EvaluationAlgorithmAbbreviation: Metric used for assessing competition submissions.
    • data_sources: Links to datasets used.
    • metric type: Class label for the assessment metric.

    Table 4. markup_data.csv structure

    • code_block: Machine learning code block.
    • too_long: Flag indicating whether the block spans multiple semantic types.
    • marks: Confidence level of the annotation.
    • graph_vertex_id: ID of the semantic type.

    The dataset allows mapping between these tables. For example:

    • code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
    • kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.

    In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
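    The kernel_id link between code_blocks.csv and kernels_meta.csv can be realized as a simple in-memory join. A minimal sketch with invented rows (column names follow the tables above; the real files would be read with csv.DictReader):

```python
def join_blocks_to_kernels(code_blocks, kernels_meta):
    """Attach notebook metadata to each code block via kernel_id.
    Both inputs are lists of dicts, as csv.DictReader would produce."""
    meta_by_id = {k["kernel_id"]: k for k in kernels_meta}
    joined = []
    for block in code_blocks:
        meta = meta_by_id.get(block["kernel_id"])
        if meta is not None:  # skip blocks whose notebook has no metadata
            joined.append({**block, **meta})
    return joined

blocks = [
    {"code_blocks_index": "1", "kernel_id": "k1", "code_block": "import pandas as pd"},
    {"code_blocks_index": "2", "kernel_id": "k9", "code_block": "model.fit(X, y)"},
]
kernels = [{"kernel_id": "k1", "kaggle_score": "0.92", "comp_name": "titanic"}]
print(len(join_blocks_to_kernels(blocks, kernels)))  # prints: 1
```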

    Code4ML 2.0 Enhancements

    The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the aid of an LLM.

    Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

    competitions_meta_2.csv is enriched with data_cards, describing the data used in the competitions.

    Applications

    The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

    • Code generation
    • Code understanding
    • Natural language processing of code-related tasks
  11. short-jokes-punchline

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    Timxjl (2024). short-jokes-punchline [Dataset]. https://huggingface.co/datasets/Timxjl/short-jokes-punchline
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Authors
    Timxjl
    License

    GNU GPL v2.0: https://choosealicense.com/licenses/gpl-2.0/

    Description

    Short Jokes Punchline

    This dataset contains information about jokes, visitors, labels, and label segments used in a joke labeling application. The data is stored in four CSV files: joke.csv, visitor.csv, label.csv, and label_segment.csv.

      Files

      joke.csv

    This file contains 200 jokes randomly sampled from the Kaggle dataset "Short Jokes." Each row represents a joke with the following columns:

    id: The unique identifier for the joke. text: The text content of the… See the full description on the dataset page: https://huggingface.co/datasets/Timxjl/short-jokes-punchline.

  12. Data from: CSV file of names, times, and locations of images collected by an...

    • catalog.data.gov
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). CSV file of names, times, and locations of images collected by an unmanned aerial system (UAS) flying over Black Beach, Falmouth, Massachusetts on 18 March 2016 [Dataset]. https://catalog.data.gov/dataset/csv-file-of-names-times-and-locations-of-images-collected-by-an-unmanned-aerial-system-uas
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Massachusetts, Black Beach, Falmouth
    Description

    Imagery acquired with unmanned aerial systems (UAS) and coupled with structure from motion (SfM) photogrammetry can produce high-resolution topographic and visual reflectance datasets that rival or exceed lidar and orthoimagery. These new techniques are particularly useful for data collection of coastal systems, which requires high temporal and spatial resolution datasets. The U.S. Geological Survey worked in collaboration with members of the Marine Biological Laboratory and Woods Hole Analytics at Black Beach, in Falmouth, Massachusetts to explore scientific research demands on UAS technology for topographic and habitat mapping applications. This project explored the application of consumer-grade UAS platforms as a cost-effective alternative to lidar and aerial/satellite imagery to support coastal studies requiring high-resolution elevation or remote sensing data. A small UAS was used to capture low-altitude photographs and GPS devices were used to survey reference points. These data were processed in an SfM workflow to create an elevation point cloud, an orthomosaic image, and a digital elevation model.

  13. my-dataset

    • huggingface.co
    Updated May 16, 2024
    + more versions
    Cite
    aman (2024). my-dataset [Dataset]. https://huggingface.co/datasets/ns-1/my-dataset
    Explore at:
    Dataset updated
    May 16, 2024
    Authors
    aman
    Description

    This directory includes a few sample datasets to get you started.

    california_housing_data*.csv is California housing data from the 1990 US Census; more information is available at: https://docs.google.com/document/d/e/2PACX-1vRhYtsvc5eOR2FWNCwaBiKL6suIOrxJig8LcSBbmCbyYsayia_DvPOOBlXZ4CAlQ5nlDD8kTaIDRwrN/pub

    mnist_*.csv is a small sample of the MNIST database, which is described at: http://yann.lecun.com/exdb/mnist/

    anscombe.json contains a copy of Anscombe's quartet; it was originally… See the full description on the dataset page: https://huggingface.co/datasets/ns-1/my-dataset.

  14. Popular Quotes from GoodReads

    • kaggle.com
    zip
    Updated Jul 1, 2023
    Cite
    Souradip Pal (2023). Popular Quotes from GoodReads [Dataset]. https://www.kaggle.com/datasets/souradippal/popular-quotes-from-goodreads
    Explore at:
    zip (191669 bytes)
    Dataset updated
    Jul 1, 2023
    Authors
    Souradip Pal
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset is in CSV format and contains the author name, quote, and popularity, scraped from GoodReads. Drawback: only about 3k records could be scraped.

  15. SynSpeech Dataset (Small Version)

    • figshare.com
    csv
    Updated Nov 7, 2024
    + more versions
    Cite
    Yusuf Brima (2024). SynSpeech Dataset (Small Version) [Dataset]. http://doi.org/10.6084/m9.figshare.27627840.v1
    Explore at:
    csv
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Yusuf Brima
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SynSpeech Dataset (Small Version) is an English-language synthetic speech dataset created using OpenVoice and LibriSpeech-100 for bench-marking disentangled speech representation learning methods. It includes 50 unique speakers, each with 500 distinct sentences spoken in a “default” style at a 16kHz sampling rate. Data is organized by speaker ID, with a synspeech_Small_Metadata.csv file detailing speaker information, gender, speaking style, text, and file paths. This dataset is ideal for tasks in representation learning, speaker and content factorization, and TTS synthesis.

  16. CLEAN_SmallRingTensileTest_StainlessSteel316L_ExtensionRate1_15mmMin

    • ordo.open.ac.uk
    txt
    Updated Mar 1, 2023
    + more versions
    Cite
    Aniket Joshi; Alexander Forsey; Richard Moat; Salih Gungor (2023). CLEAN_SmallRingTensileTest_StainlessSteel316L_ExtensionRate1_15mmMin [Dataset]. http://doi.org/10.21954/ou.rd.22114536.v1
    Explore at:
    txt
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    The Open University
    Authors
    Aniket Joshi; Alexander Forsey; Richard Moat; Salih Gungor
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is a part of the test dataset (Digital Image Correlation Images and CSV data) that describes the experiments done at The Open University for the Small Ring Tensile Testing of SS316L performed at various displacement rates. Unprocessed images (.NEF format) start with the prefix 'RAW' while processed images (.TIF format) start with the prefix 'CLEAN'. The following letters after that describe the test type (Small Ring Test or Uniaxial Test), followed by the material. Lastly, after the material, the crosshead extension rate is described. For instance, 'Extension Rate0_3mmMin' refers to an extension rate of 0.3 mm/min and so on. The 'RAW' folders also contain the unprocessed CSV files. The 'CLEAN' folders contain the camera information (capture interval, ISO, etc) as well as the denoised CSV experimental files. The CSV files are denoised with the help of a Butterworth filter.
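    The description mentions Butterworth-filter denoising of the CSV signals. As a hedged, dependency-free sketch (in practice one would more likely use scipy.signal's butter and filtfilt), here is a first-order Butterworth low-pass derived via the bilinear transform with frequency prewarping; the cutoff and sampling rate are illustrative, not the values used in the experiments:

```python
import math

def butter1_lowpass(signal, cutoff_hz, fs_hz):
    """Apply a first-order Butterworth low-pass filter to a sequence of samples."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # prewarped analog cutoff
    b0 = b1 = k / (k + 1.0)                    # feedforward coefficients
    a1 = (k - 1.0) / (k + 1.0)                 # feedback coefficient
    out, x_prev, y_prev = [], 0.0, 0.0
    for x in signal:
        y = b0 * x + b1 * x_prev - a1 * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out
```

    A first-order section passes DC unchanged and fully attenuates content at the Nyquist frequency; higher-order Butterworth designs cascade such sections for a steeper roll-off.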

  17. Podcast PR Contacts - Self-Service CSV Batch Export

    • datarade.ai
    .csv, .xls
    Updated May 27, 2025
    Cite
    Listen Notes (2025). Podcast PR Contacts - Self-Service CSV Batch Export [Dataset]. https://datarade.ai/data-products/podcast-pr-contacts-self-service-csv-batch-export-listen-notes
    Explore at:
    .csv, .xlsAvailable download formats
    Dataset updated
    May 27, 2025
    Dataset authored and provided by
    Listen Notes
    Area covered
    Bulgaria, Algeria, Kuwait, Costa Rica, Dominican Republic, French Polynesia, Israel, Gibraltar, Congo, Benin
    Description

    == Quick starts ==

    Batch export podcast metadata to CSV files:

    1) Export by search keyword: https://www.listennotes.com/podcast-datasets/keyword/

    2) Export by category: https://www.listennotes.com/podcast-datasets/category/

    == Quick facts ==

    • The most up-to-date and comprehensive podcast database available
    • All languages & all countries
    • Includes over 3,500,000 podcasts
    • Features 35+ data fields, such as basic metadata, global rank, RSS feed (with audio URLs), Spotify links, and more
    • Delivered in CSV format

    == Data Attributes ==

    See the full list of data attributes on this page: https://www.listennotes.com/podcast-datasets/fields/?filter=podcast_only

    How to access podcast audio files: Our dataset includes RSS feed URLs for all podcasts. You can retrieve audio for over 170 million episodes directly from these feeds. With access to the raw audio, you’ll have high-quality podcast speech data ideal for AI training and related applications.
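    Retrieving episode audio from an RSS feed comes down to reading the <enclosure> elements. A minimal stdlib sketch against an invented RSS 2.0 snippet (feeds exported from the dataset follow the same standard structure, but the URL and titles here are made up):

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 snippet standing in for a real podcast feed.
sample_rss = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example Show</title>
<item><title>Episode 1</title>
<enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
</item></channel></rss>"""

root = ET.fromstring(sample_rss)
# Keep only enclosures whose MIME type marks them as audio.
audio_urls = [enc.get("url") for enc in root.iter("enclosure")
              if (enc.get("type") or "").startswith("audio/")]
print(audio_urls)
```

    For real feeds you would fetch the RSS URL over HTTP first; a dedicated feed parser also handles the namespaced variants some podcast hosts emit.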

    == Custom Offers ==

    We can provide custom datasets based on your needs, such as language-specific data, daily/weekly/monthly update frequency, or one-time purchases.

    We also provide a RESTful API at PodcastAPI.com

    Contact us: hello@listennotes.com

    == Need Help? ==

    If you have any questions about our products, feel free to reach out at hello@listennotes.com

    == About Listen Notes, Inc. ==

    Since 2017, Listen Notes, Inc. has provided the leading podcast search engine and podcast database.

  18. Phishing Websites Dataset

    • data.mendeley.com
    Updated Sep 24, 2020
    + more versions
    Cite
    Grega Vrbančič (2020). Phishing Websites Dataset [Dataset]. http://doi.org/10.17632/72ptz43s9v.1
    Explore at:
    Dataset updated
    Sep 24, 2020
    Authors
    Grega Vrbančič
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data consist of a collection of legitimate as well as phishing website instances. Each website is represented by a set of features that denote whether the website is legitimate or not. The data can serve as input for a machine-learning process.

    In this repository the two variants of the Phishing Dataset are presented.

    Full variant - dataset_full.csv
    Short description of the full variant dataset:
    • Total number of instances: 88,647
    • Number of legitimate website instances (labeled as 0): 58,000
    • Number of phishing website instances (labeled as 1): 30,647
    • Total number of features: 111

    Small variant - dataset_small.csv
    Short description of the small variant dataset:
    • Total number of instances: 58,645
    • Number of legitimate website instances (labeled as 0): 27,998
    • Number of phishing website instances (labeled as 1): 30,647
    • Total number of features: 111
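    Feeding such a feature table into a classifier is straightforward. The sketch below uses synthetic stand-in data of the same shape (111 numeric features, binary label with 0 = legitimate, 1 = phishing), since the real CSVs must be downloaded first; logistic regression is one illustrative model, not the one any particular study used:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in matching the dataset's shape: 111 features, binary label.
X, y = make_classification(n_samples=2000, n_features=111,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.2f}")
```

    With the real data, the same pipeline applies after loading dataset_small.csv and splitting off the label column.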

  19. Small Business Financial Dataset (2022–2023)

    • kaggle.com
    zip
    Updated Sep 2, 2025
    Cite
    Gabrielle Charlton (2025). Small Business Financial Dataset (2022–2023) [Dataset]. https://www.kaggle.com/datasets/gabriellecharlton/coffee-shop-financial-dataset-synthetic-2022-2023
    Explore at:
    zip(22299 bytes)Available download formats
    Dataset updated
    Sep 2, 2025
    Authors
    Gabrielle Charlton
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📊 Coffee Shop Financial Dataset (Synthetic, 2022–2023)

    📝 Overview

    This dataset simulates the financial records of a small-town coffee shop over a two-year period (Jan 2022 – Dec 2023).
    It was designed for data science, bookkeeping, and analytics projects — including financial dashboards, revenue forecasting, and expense tracking.

    The dataset contains 5 CSV files representing different business accounts:
    1. checking_account_main.csv - Daily sales deposits (hot drinks, cold drinks, pastries, sandwiches) + operating expenses
    2. checking_account_secondary.csv - Monthly transfers between accounts + payroll funding
    3. credit_card_account.csv - Weekly credit card expenses (supplies, utilities, vendor charges) and payments
    4. gusto_payroll.csv - Payroll data for 3 employees + 1 contractor
    5. gusto_payroll_bc.csv - Payroll data for 3 full-time employees + 1 contractor + 1 seasonal employee, with actual tax breakdown for the province of British Columbia, Canada

    📂 File Details

    checking_account_main.csv

    • date
    • description
    • category (Sales, Utilities, Rent, Supplies, etc.)
    • amount (positive = inflow, negative = outflow)
    • balance
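    The sign convention above (positive = inflow, negative = outflow) means each row's balance should equal the previous balance plus the row's amount, which makes the file easy to sanity-check. A sketch on invented rows in the checking_account_main.csv layout (all figures are made up):

```python
# Hypothetical transactions in the checking_account_main.csv layout.
rows = [
    {"date": "2022-01-03", "description": "Daily sales deposit",
     "category": "Sales", "amount": 850.00, "balance": 5850.00},
    {"date": "2022-01-03", "description": "Monthly rent",
     "category": "Rent", "amount": -1200.00, "balance": 4650.00},
    {"date": "2022-01-04", "description": "Daily sales deposit",
     "category": "Sales", "amount": 910.00, "balance": 5560.00},
]

# Back out the opening balance, then replay every transaction.
opening_balance = rows[0]["balance"] - rows[0]["amount"]
running = opening_balance
for row in rows:
    running = round(running + row["amount"], 2)
    assert running == row["balance"], f"balance mismatch on {row['date']}"
print(f"opening: {opening_balance:.2f}, closing: {running:.2f}")
```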

    checking_account_secondary.csv

    • date
    • description
    • amount
    • balance

    credit_card_account.csv

    • date
    • vendor
    • category (Supplies, Marketing, Utilities, etc.)
    • amount (negative = charge, positive = payment)
    • balance

    gusto_payroll.csv

    • date
    • employee_id
    • employee_name (Owner, Barista 1, Barista 2, Contractor)
    • role (Owner, Barista, Manager, Contractor)
    • gross_pay

    gusto_payroll_bc.csv

    This file simulates bi-weekly payroll data for a small coffee shop in British Columbia, Canada, covering January 2022 – December 2023.
    It reflects realistic Canadian payroll structure with federal and provincial tax breakdowns, CPP, EI, and additional factors.

    Columns:
    - date → Pay date (bi-weekly schedule)
    - employee_id → Unique identifier for each employee
    - employee_name → Owner, Barista 1, Barista 2, Manager, Contractor, plus a seasonal Barista (June–Aug 2022)
    - role → Role within the coffee shop (Owner, Barista, Manager, Contractor)
    - gross_pay → Total earnings before deductions (wages + tips + reimbursements)
    - federal_tax → Federal income tax withheld
    - provincial_tax → British Columbia income tax withheld
    - cpp_employee → Employee CPP contribution
    - ei_employee → Employee EI contribution
    - other_deductions → Placeholder for possible deductions (e.g., garnishments, union dues)
    - net_pay → Take-home pay after deductions
    - tips → Declared tips (taxable, included in gross pay)
    - travel_reimbursement → Non-taxable reimbursement for travel expenses (if applicable)
    - cpp_employer → Employer portion of CPP contributions
    - ei_employer → Employer portion of EI contributions

    Notes:
    - Payroll data is synthetic but modeled on Canadian payroll rules (2022–2023 rates).
    - A seasonal barista employee is included (employed June 1 – Aug 31, 2022).
    - Travel reimbursements are non-taxable and recorded separately.
    - This file allows users to practice payroll accounting, deductions analysis, and tax reconciliation.
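    The deduction columns are meant to reconcile to net_pay, which is exactly the kind of check the file invites. A sketch of that identity on one invented row; whether travel_reimbursement is folded into net_pay (as assumed here) or paid separately is an assumption, and all dollar figures are made up:

```python
# Hypothetical gusto_payroll_bc.csv row; real figures come from the CSV.
row = {
    "gross_pay": 2000.00,           # wages + tips (tips are taxable, so included)
    "federal_tax": 230.00,
    "provincial_tax": 95.00,
    "cpp_employee": 110.00,
    "ei_employee": 32.00,
    "other_deductions": 0.00,
    "travel_reimbursement": 40.00,  # non-taxable, assumed paid on top of net pay
}

deductions = (row["federal_tax"] + row["provincial_tax"]
              + row["cpp_employee"] + row["ei_employee"]
              + row["other_deductions"])
net_pay = round(row["gross_pay"] - deductions + row["travel_reimbursement"], 2)
print(net_pay)
```

    Running this identity over every row is a quick way to practice the tax-reconciliation exercise the notes describe.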

    📈 Business Context

    • The coffee shop experiences higher sales September–February (holiday season & winter drinks).
    • Sales dip March–June due to seasonality in a small town.
    • Pastries are sourced from a local bakery, while sandwiches are made in-house.
    • Payroll includes 3 employees (baristas, manager) and 1 independent contractor.

    🎯 Possible Use Cases

    • Build a financial health dashboard
    • Forecast revenue and expenses
    • Create a profit & loss statement
    • Test SQL queries for accounting workflows
    • Explore data visualization with Python, R, or BI tools
    • Educational projects for small business analytics

    📜 License

    This dataset is released under the MIT License, free to use for research, learning, or commercial purposes.

    ⭐ If you use this dataset in your project or notebook, please credit and share your work, it helps the community!

    📷 Photo Credits: freepik

  20. Tacit Knowledge management Dataset.csv

    • figshare.com
    txt
    Updated Jul 18, 2023
    Cite
    Philip Adu Sarfo (2023). Tacit Knowledge management Dataset.csv [Dataset]. http://doi.org/10.6084/m9.figshare.23702121.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Philip Adu Sarfo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comes from a study that investigated the impact of tacit knowledge management (TKM) systems on organizational performance (OP) among Ghanaian small and medium enterprises (SMEs), examining the roles of employee performance (EP) and job satisfaction (JS) and building on the knowledge-based view (KBV).
