100+ datasets found
  1. Fake_or_Real_Competition_Dataset

    • huggingface.co
    Updated Aug 28, 2023
    Cite
    GenON (2023). Fake_or_Real_Competition_Dataset [Dataset]. https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset
    Dataset updated
    Aug 28, 2023
    Dataset authored and provided by
    GenON
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    2023 Fake or Real: AI-generated Image Discrimination Competition dataset is now available on Hugging Face!

    Hello🖐️ We are excited to announce the release of the dataset for the 2023 Fake or Real: AI-generated Image Discrimination Competition. The competition was held on AI CONNECT (https://aiconnect.kr/) from June 26th to July 6th, 2023, with 768 participants. If you're interested in evaluating the performance of your model on the test dataset, we encourage you to visit the… See the full description on the dataset page: https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset.

  2. 🏙 Dubai Real Estate Sales Insights | UAE 🇦🇪 🏠

    • kaggle.com
    Updated May 2, 2024
    Cite
    Azhar Saleem (2024). 🏙 Dubai Real Estate Sales Insights | UAE 🇦🇪 🏠 [Dataset]. https://www.kaggle.com/datasets/azharsaleem/dubai-real-estate-sales-insights
    Dataset updated
    May 2, 2024
    Dataset provided by
    Kaggle
    Authors
    Azhar Saleem
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    Dubai, United Arab Emirates
    Description

    Dubai, UAE Real Estate Market Dataset

    👨‍💻 Author: Azhar Saleem

    • GitHub: https://github.com/azharsaleem18
    • Kaggle: https://www.kaggle.com/azharsaleem
    • LinkedIn: https://www.linkedin.com/in/azhar-saleem/
    • YouTube: https://www.youtube.com/@AzharSaleem19
    • Facebook: https://www.facebook.com/azhar.saleem1472/
    • TikTok: https://www.tiktok.com/@azhar_saleem18
    • Twitter: https://twitter.com/azhar_saleem18
    • Instagram: https://www.instagram.com/azhar_saleem18/
    • Email: azharsaleem6@gmail.com

    Dataset Description

    This comprehensive dataset provides an exhaustive snapshot of property listings for sale across the United Arab Emirates, including major cities like Dubai, Abu Dhabi, and Al Ain. Sourced from Bayut.com, this dataset serves as an invaluable resource for Data Scientists, Real Estate Analysts, Urban Planners, and Developers keen on exploring real estate market dynamics, price fluctuations, and development trends in the UAE.

    Dataset Overview

    The dataset contains over 41,000 entries, each representing a unique property for sale. It includes detailed information such as:

    • Price: Listing price of the property in AED.
    • Type: Specifies the property type, such as Apartment, Townhouse, etc.
    • Beds: Number of bedrooms, with '0' indicating a studio flat.
    • Baths: Number of bathrooms.
    • Address: Full address of the property, providing insights into its precise location.
    • Furnishing: Indicates whether the property is furnished or unfurnished.
    • Completion Status: Current status of the property (Ready, Off-Plan).
    • Building Name, Area Name, City: Provide contextual location details.
    • Year of Completion: Year when the property was completed or is expected to be completed.
    • Total Floors, Parking Spaces, Building Area: Key features of the property's building.
    • Latitude, Longitude: Geographic coordinates for more refined location analysis.
    • Purpose: Intended purpose of the listing, consistently noted as 'For Sale'.

    Usage

    This dataset is ideal for a variety of applications, including:

    • Market Analysis: Analyze trends in property prices and types across different regions.
    • Predictive Modeling: Develop machine learning models to predict property prices or to classify types of properties based on their features.
    • Urban Development Studies: Examine property distribution and characteristics to inform urban planning and development strategies.
    • Comparative Analysis: Compare properties across different cities and districts to identify investment opportunities or to study market behavior.
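    For the market-analysis use case, a first pass is often a per-type price summary. The sketch below uses only the Python standard library and assumes each listing is a dict with keys "type", "price" and "beds"; these lowercase column names are a guess at how the Type, Price and Beds fields appear in the actual CSV, so check the headers before use. It also applies the dataset's convention that 0 bedrooms means a studio flat.

```python
import statistics
from collections import defaultdict


def bedroom_label(beds):
    """Translate a bedroom count; the dataset uses 0 to mark a studio flat."""
    return "Studio" if int(beds) == 0 else f"{int(beds)} BR"


def median_price_by_type(rows):
    """Group listings by property type and return the median listing price (AED).

    Assumes hypothetical column names "type" and "price"; rows can come from
    csv.DictReader, which yields prices as strings, hence the float() call.
    """
    prices = defaultdict(list)
    for row in rows:
        prices[row["type"]].append(float(row["price"]))
    return {ptype: statistics.median(values) for ptype, values in prices.items()}


# Made-up rows for illustration only, not taken from the dataset.
listings = [
    {"type": "Apartment", "price": "850000", "beds": 0},
    {"type": "Apartment", "price": "1450000", "beds": 2},
    {"type": "Townhouse", "price": "3200000", "beds": 3},
]
print(median_price_by_type(listings))
print([bedroom_label(row["beds"]) for row in listings])
```

    The median is used rather than the mean because a handful of very expensive listings can skew an average price considerably.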

    Data Accessibility

    This dataset is publicly available and well-suited for anyone interested in conducting detailed analyses of the UAE real estate market, from academic researchers to industry professionals.

    Feel free to dive into this dataset to unlock comprehensive insights into the vibrant and diverse property market of the UAE, supporting a wide range of real estate, economic, and geographic studies.

  3. HWID12 (Highway Incidents Detection Dataset)

    • kaggle.com
    Updated May 25, 2022
    Cite
    Landry KEZEBOU (2022). HWID12 (Highway Incidents Detection Dataset) [Dataset]. https://www.kaggle.com/datasets/landrykezebou/hwid12-highway-incidents-detection-dataset
    Dataset updated
    May 25, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Landry KEZEBOU
    Description

    Context

    Action recognition in video is known to be more challenging than image recognition. Unlike image recognition models, which use 2D convolutional blocks, action classification models require an additional dimension to capture the spatio-temporal information in video sequences. This intrinsically makes video action recognition models computationally intensive and significantly more data-hungry than their image recognition counterparts. Existing video datasets such as Kinetics, AVA, Charades, Something-Something, HMDB51, and UCF101 have had a tremendous impact on the development of video recognition technologies. Artificial intelligence models trained on these datasets have largely benefited applications such as behavior monitoring in elderly people, video summarization, and content-based retrieval. However, action recognition has yet to be explored in Intelligent Transportation Systems (ITS), particularly in vital applications such as incident detection. This is partly due to the lack of annotated datasets adequate for training models suited to such direct ITS use cases. In this paper, video action recognition is explored to tackle the problem of highway incident detection and classification from live surveillance footage. First, a novel dataset, HWID12 (Highway Incidents Detection), is introduced. HWID12 consists of 11 distinct highway incident categories, plus one additional category of negative samples representing normal traffic. The proposed dataset includes 2780+ video segments, each 3 to 8 seconds long on average, and 500k+ temporal frames. Next, a baseline for highway accident detection and classification is established with a state-of-the-art action recognition model trained on HWID12. Performance benchmarking is performed for the 12-class (normal traffic vs. 11 accident categories) and 2-class (incident vs. normal traffic) settings. This benchmarking reveals recognition accuracies of up to 88% and 98% for the 12-class and 2-class settings, respectively.

    Data Acquisition

    The proposed Highway Incidents Detection Dataset (HWID12) is the first dataset of its kind aimed at fostering experimentation with video action recognition technologies to solve the practical problem of real-time highway incident detection, which currently challenges intelligent transportation systems. The lack of such a dataset has limited the application of recent breakthroughs in video action classification to practical use cases in intelligent transportation systems. The proposed dataset contains more than 2780 video clips varying in length between 3 and 8 seconds. These clips capture the moments leading up to an incident through to just after it occurs. The clips were manually segmented from accident compilation videos sourced from YouTube and other video data platforms.

    Content

    There is one main zip file available for download. It contains the 2780+ video clips organized into 12 folders, each representing an incident category. One of the classes is the negative sample class, which represents normal traffic.
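    Because each folder is one category, labels for both benchmark settings (12-class and 2-class) can be derived from the folder layout alone. The sketch below is a hypothetical helper, not the authors' code: it assumes clips sit directly inside the class folders, that they use one of the listed video extensions, and that you pass the actual name of the normal-traffic folder (the name "normal" in the usage note is a placeholder).

```python
from pathlib import Path

# Assumed clip extensions; adjust to whatever the extracted zip actually contains.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov"}


def build_labels(root, normal_class):
    """Return (clip_path, class_name, binary_label) for every clip under root.

    Each sub-folder of root is treated as one of the 12 categories (the
    12-class label). Clips in the folder named normal_class get binary
    label 0 (normal traffic); clips in every other folder get 1 (incident).
    """
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        binary = 0 if class_dir.name == normal_class else 1
        for clip in sorted(class_dir.iterdir()):
            if clip.is_file() and clip.suffix.lower() in VIDEO_SUFFIXES:
                samples.append((clip, class_dir.name, binary))
    return samples
```

    Usage would look like `build_labels("HWID12", normal_class="normal")`, with both arguments replaced by the real extraction path and folder name.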

    Terms and Conditions

    • Videos provided in this dataset are freely available for research and education purposes only. Please be sure to properly credit the authors by citing the article below.
    • Be sure to upvote this dataset if you find it useful by scrolling up and clicking the up-Arrow ^ sign at the top banner of the page, next to "New Notebook" button.
    • Be sure to blur out all plate numbers before publishing any of the contents available in this dataset.

    Acknowledgements

    Any publication using this database must cite the following paper:

    • Landry Kezebou, Victor Oludare, Karen Panetta, James Intriligator, and Sos Agaian "Highway accident detection and classification from live traffic surveillance cameras: a comprehensive dataset and video action recognition benchmarking", Proc. SPIE 12100, Multimodal Image Exploitation and Learning 2022, 121000M (27 May 2022); https://doi.org/10.1117/12.2618943

    Note: if the link is broken, please use http instead of https.

    If the page appears broken in Chrome, follow the steps on this website to view it: https://www.technipages.com/chrome-enabledisable-not-secure-warning

    Other relevant datasets:
    • VCoR dataset: https://www.kaggle.com/landrykezebou/vcor-vehicle-color-recognition-dataset
    • VRiV dataset: https://www.kaggle.com/landrykezebou/vriv-vehicle-recognition-in-videos-dataset

    For any enquiries regarding the HWID12 dataset, contact: landrykezebou@gmail.com

  4. BUTTER - Empirical Deep Learning Dataset

    • data.openei.org
    • datasets.ai
    • +2 more
    code, data, website
    Updated May 20, 2022
    Cite
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek (2022). BUTTER - Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/1872441
    Available download formats: code, website, data
    Dataset updated
    May 20, 2022
    Dataset provided by
    Open Energy Data Initiative (OEDI)
    National Renewable Energy Laboratory
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    Authors
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The BUTTER Empirical Deep Learning Dataset represents an empirical study of deep learning phenomena in dense, fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels each of L1 and L2 regularization. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80%/20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions per experiment), and a total of 13.3 billion training epochs (most runs covered three thousand epochs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
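    The 80%/20% shuffled train-test split mentioned above can be illustrated with a few lines of standard-library Python. This is a generic sketch, not BUTTER's own tooling: how the project seeded or re-shuffled the split per repetition is not described here, so the seed parameter is an assumption added for reproducibility.

```python
import random


def shuffled_split(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices and cut them into train/test index lists.

    With the default train_frac=0.8 this gives an 80%/20% split. A fixed seed
    makes the split reproducible; a different seed per repetition gives each
    run its own shuffle.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, leaves global state alone
    cut = int(round(train_frac * n_samples))
    return indices[:cut], indices[cut:]


train_idx, test_idx = shuffled_split(1000, seed=42)
print(len(train_idx), len(test_idx))
```

    Splitting indices rather than the data itself keeps the sketch independent of how the underlying dataset is stored.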

  5. DORIS-MAE-v1

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 17, 2023
    Cite
    Wang, Jianyou (2023). DORIS-MAE-v1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8035109
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    Wang, Kaicheng
    Wang, Jianyou
    Naidu, Prudhviraj
    Wang, Xiaoyue
    Paturi, Ramamohan
    Bergen, Leon
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.

    Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving a distinct purpose.

    The Query dataset contains 100 human-crafted complex queries spanning across five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.

    The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011 and 2021, and sourced from arXiv. Each entry includes the title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.

    The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (0, 1, or 2) representing ChatGPT's final decision.

    Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (0, 1, or 2).

    The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:

    ada_embedding_for_DORIS-MAE_v1.pickle
    ├── "Query"
    │   ├── query_id_1 (Embedding of query_1)
    │   ├── query_id_2 (Embedding of query_2)
    │   ├── query_id_3 (Embedding of query_3)
    │   └── ...
    └── "Corpus"
        ├── corpus_id_1 (Embedding of abstract_1)
        ├── corpus_id_2 (Embedding of abstract_2)
        ├── corpus_id_3 (Embedding of abstract_3)
        └── ...
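    Given that layout, the embeddings can be used for a simple dense-retrieval baseline: score every abstract against a query by cosine similarity. The sketch below assumes the pickle holds a dict with top-level keys "Query" and "Corpus", each mapping an ID to a numeric vector (a list or array), as the tree suggests; it is an illustration, not the benchmark's official evaluation code. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.

```python
import math
import pickle


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_corpus(path, query_id, top_k=5):
    """Load the embedding pickle and rank corpus abstracts against one query.

    Assumes the structure {"Query": {id: vector}, "Corpus": {id: vector}}.
    Returns the top_k (corpus_id, score) pairs, best first.
    """
    with open(path, "rb") as fh:
        emb = pickle.load(fh)
    query_vec = emb["Query"][query_id]
    scores = [(cid, cosine(query_vec, vec)) for cid, vec in emb["Corpus"].items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

    For the full corpus of 363,133 abstracts a vectorized library such as NumPy would be much faster than this pure-Python loop; the loop is kept here only to make the scoring explicit.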

  6. AI-Generated-vs-Real-Images-Datasets

    • huggingface.co
    Updated Aug 19, 2025
    Cite
    Hem Bahadur Gurung (2025). AI-Generated-vs-Real-Images-Datasets [Dataset]. https://huggingface.co/datasets/Hemg/AI-Generated-vs-Real-Images-Datasets
    Dataset updated
    Aug 19, 2025
    Authors
    Hem Bahadur Gurung
    Description

    Dataset Card for "AI-Generated-vs-Real-Images-Datasets"

    More Information needed

  7. A dataset of one high-angel well drilled and logged at Bohai Sea during...

    • scidb.cn
    Updated Jan 5, 2024
    Cite
    Qiong Zhang; Lin Lvlin (2024). A dataset of one high-angel well drilled and logged at Bohai Sea during FY2021 [Dataset]. http://doi.org/10.57760/sciencedb.j00186.00375
    Dataset updated
    Jan 5, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Qiong Zhang; Lin Lvlin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Bohai Sea
    Description

    The dataset was acquired from an actual well located in the southern part of Liaodong Bay in the Bohai oil field. The well was drilled with a 241.3 mm (9.5 inch) bit. During logging, the tool was slid against the borehole wall from bottom to top, sampling at 76 mm intervals. The sieve residue log shows that the logged section is dominated by sandy mudstone formations, and borehole fluid sampling shows that the borehole fluid has a density of 1.18 g/cm3 and a salinity of 94 kppm. Neutron and density data are obtained at the logging depth interval. The dataset includes 4 files: XX19-1N.xtf is the original well log data, which can be opened with an XTF viewer; XX19-1N-Data.txt is the data read from the original XTF file; detection.py is the data processing code; and XX19-1N.las is the neutron porosity data processed using detection.py.
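    The processed porosity curve ships as a LAS file. The standard-library sketch below reads curves from the ~Curve and ~ASCII sections of a LAS 2.0 file. It assumes the file is unwrapped (WRAP NO), has one depth row per line, and uses the common -999.25 null value; the curve mnemonics inside XX19-1N.las are not stated here, so "NPHI" in the usage note is a placeholder. A maintained reader such as the lasio package handles the general case more robustly.

```python
def read_las_curves(text, null_value=-999.25):
    """Return {curve mnemonic: [values]} from the text of an unwrapped LAS 2.0 file.

    Assumes WRAP NO (one depth row per line in the ~A section) and maps the
    null value to None. Header sections other than ~C and ~A are skipped.
    """
    section = None
    mnemonics = []
    columns = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        if line.startswith("~"):
            section = line[1:2].upper()  # V, W, C, P, O or A
            if section == "A":
                columns = {name: [] for name in mnemonics}
            continue
        if section == "C":
            # Curve lines look like "DEPT.M : depth"; the mnemonic precedes the first period.
            mnemonics.append(line.split(".", 1)[0].strip())
        elif section == "A":
            values = [float(v) for v in line.split()]
            for name, value in zip(mnemonics, values):
                columns[name].append(None if value == null_value else value)
    return columns
```

    Usage might look like `curves = read_las_curves(open("XX19-1N.las").read())`, followed by `curves["DEPT"]` and whatever porosity mnemonic the file actually declares. With the 76 mm sampling described above, consecutive depth values should differ by about 0.076 m if depth is recorded in metres.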

  8. cleaned-dataset

    • kaggle.com
    Updated Mar 30, 2023
    Cite
    wchohaw (2023). cleaned-dataset [Dataset]. https://www.kaggle.com/datasets/wchohaw/cleaned-dataset
    Dataset updated
    Mar 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    wchohaw
    Description

    Dataset

    This dataset was created by wchohaw

    Contents

  9. Dataset Ow Dataset

    • universe.roboflow.com
    zip
    Updated Jan 8, 2024
    Cite
    Overwatch (2024). Dataset Ow Dataset [Dataset]. https://universe.roboflow.com/overwatch-4wpfl/dataset-ow
    Available download formats: zip
    Dataset updated
    Jan 8, 2024
    Dataset authored and provided by
    Overwatch
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Player Bounding Boxes
    Description

    Dataset Ow

    ## Overview
    
    Dataset Ow is a dataset for object detection tasks - it contains Player annotations for 10,000 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  10. Cline Center Coup d’État Project Dataset

    • databank.illinois.edu
    Updated May 11, 2025
    + more versions
    Cite
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
    Dataset updated
    May 11, 2025
    Authors
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. Only a limited number of datasets are available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (e.g., military, rebels), and the fate of the deposed leader.

    Version 2.1.3 adds 19 additional coup events to the dataset, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021; it fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the dataset and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and in event dates.

    Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material, including:
    • Reconciling missing event data
    • Removing events with irreconcilable event dates
    • Removing events with insufficient sourcing (each event needs at least two sources)
    • Removing events that were inaccurately coded as coup events
    • Removing variables that fell below the threshold of inter-coder reliability required by the project
    • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
    • Extending the period covered from 1945-2005 to 1945-2019
    • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
    Items in this Dataset
    1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024.
    2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024.
    3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024.
    4. README.md - This file contains useful information for the user about the dataset. It is a text file written in Markdown. Revised February 2024.
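    Since coup_id is the key that links a CSV row to its entry in the source document, it helps to load it as text. IDs such as 00201062021 begin with zeros, and converting them to integers (as a numeric type-inference step might) would silently drop those digits. The sketch below indexes rows by coup_id with the standard csv module, which keeps every value as a string; the "country" column in the demo is hypothetical, and the UTF-8 encoding mentioned in the usage note is an assumption.

```python
import csv
import io


def index_by_coup_id(fileobj):
    """Map each coup_id to its CSV row.

    csv.DictReader leaves every value as a string, so leading zeros in
    coup_id survive and the IDs match those cited in the source document.
    """
    return {row["coup_id"]: row for row in csv.DictReader(fileobj)}


# Tiny in-memory demo; "country" is a hypothetical column name.
demo = io.StringIO("coup_id,country\n00201062021,Example\n")
rows = index_by_coup_id(demo)
print(rows["00201062021"]["country"])
print(int("00201062021"))  # converting to int loses the leading zeros
```

    For the real file, something like `with open("Coup Data v2.1.3.csv", newline="", encoding="utf-8") as fh: rows = index_by_coup_id(fh)` should work, provided the file's encoding is in fact UTF-8.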
    Citation Guidelines
    1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset), please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
    2. To cite data from the Cline Center Coup d’État Project Dataset, please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7

  11. Grocery Items Dataset

    • universe.roboflow.com
    zip
    Updated Mar 25, 2023
    Cite
    Grocery Items (2023). Grocery Items Dataset [Dataset]. https://universe.roboflow.com/grocery-items/grocery-items-bxx8e/dataset/1
    Available download formats: zip
    Dataset updated
    Mar 25, 2023
    Dataset authored and provided by
    Grocery Items
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Items Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Smart Inventory Management: Develop an automated inventory system for grocery stores, where the "Grocery Items" computer vision model identifies and tracks product quantities on shelves, helping store managers optimize restocking and reduce product waste.

    2. Assisted Shopping Experience: Implement a user-friendly app for visually impaired users, where the computer vision model recognizes specific grocery items, making it easier for these individuals to identify and locate the products they need while shopping.

    3. Autonomous Grocery Robots: Develop a shopping assistant robot that uses the computer vision model to identify and collect specific items from a shopping list for customers, improving shopping efficiency and convenience.

    4. Data-driven Marketing Analysis: Leverage the computer vision technology to gather in-store data on product placement and store layout, enabling retailers to make better informed decisions about promotions, discounts, and product placement to maximize sales.

    5. Checkout-less Stores: Create a fully automated grocery store where the "Grocery Items" computer vision model tracks picked up and returned items, allowing customers to simply walk out of the store with their selected items while automatically generating their bills, increasing checkout efficiency and reducing wait times.

  12. highD dataset - Dataset - LDM

    • service.tib.eu
    Updated Jan 2, 2025
    Cite
    (2025). highD dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/highd-dataset
    Dataset updated
    Jan 2, 2025
    Description

    The highD dataset is a real-world high-definition video dataset of naturalistic vehicle trajectories on German highways.

  13. IDCIA Dataset

    • figshare.com
    zip
    Updated Sep 26, 2023
    Cite
    Abdurahman Ali Mohammed; Catherine Fonder; Donald S. Sakaguchi; Wallapak Tavanapong; Surya K. Mallapragada; Azeez Idris (2023). IDCIA Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.21970604.v7
    Available download formats: zip
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    figshare
    Authors
    Abdurahman Ali Mohammed; Catherine Fonder; Donald S. Sakaguchi; Wallapak Tavanapong; Surya K. Mallapragada; Azeez Idris
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The provided dataset has two directories, Images and ground_truth, plus a readme.md file describing the dataset folder. The Images directory contains seven sub-directories, one for each of the primary antibodies used in the experiments. Inside each directory are grayscale images from fluorescence microscopy. The images were resized to 800×600 pixels.

    The source code used for the baseline experiments can be found at https://github.com/abdumhmd/IDCIA

  14. Amazon review data 2018

    • cseweb.ucsd.edu
    • nijianmo.github.io
    • +1 more
    Cite
    UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    Context

    This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, it includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

    • More reviews:

      • The total number of reviews is 233.1 million (142.8 million in 2014).
    • New reviews:

      • Current data includes reviews in the range May 1996 - Oct 2018.
    • Metadata:

      • Transaction metadata has been added for each review shown on the review page.

      • More detailed metadata of the product landing page has been added.
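    With 233.1 million reviews, the files are best processed as a stream rather than loaded whole. The sketch below assumes the review files are gzipped JSON Lines (one JSON object per line) with an "asin" product ID and an "overall" star rating; these field names and the file layout are assumptions to verify against the download page, not something stated above. It averages the rating per product in a single pass.

```python
import gzip
import json
from collections import defaultdict


def mean_rating_by_product(path):
    """Stream a gzipped JSON-lines review file and average the rating per product.

    Assumes each line is one review object with "asin" and "overall" fields.
    Only running sums are kept in memory, not the reviews themselves.
    """
    totals = defaultdict(lambda: [0.0, 0])  # asin -> [sum of ratings, review count]
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            review = json.loads(line)
            entry = totals[review["asin"]]
            entry[0] += float(review["overall"])
            entry[1] += 1
    return {asin: total / count for asin, (total, count) in totals.items()}
```

    Memory use grows with the number of distinct products rather than the number of reviews, which is what makes a single pass over a large category file feasible.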

    Acknowledgements

    If you publish articles based on this dataset, please cite the following paper:

    • Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.
  15. Underwater_project Dataset

    • universe.roboflow.com
    zip
    Updated Dec 18, 2023
    Real Images (2023). Underwater_project Dataset [Dataset]. https://universe.roboflow.com/real-images/underwater_project/dataset/4
    Available download formats: zip
    Dataset updated
    Dec 18, 2023
    Dataset authored and provided by
    Real Images
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bla Bla Bounding Boxes
    Description

    Underwater_Project

    ## Overview
    
    Underwater_Project is a dataset for object detection tasks - it contains Bla Bla annotations for 798 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
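The description does not say which annotation format to export. As an illustration only, assuming a YOLO-style export (one text line per box: class ID, then box center x, center y, width and height, all normalized to [0, 1]), a hypothetical helper that converts a label line to pixel corner coordinates could look like this:

```python
def yolo_to_pixel_box(line: str, img_w: int, img_h: int) -> tuple[int, float, float, float, float]:
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes the line holds: class_id x_center y_center width height,
    with the four box values normalized by image width/height.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    # Shift from the box center by half the width/height, then scale to pixels.
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


print(yolo_to_pixel_box("0 0.5 0.5 0.25 0.5", 640, 480))
# (0, 240.0, 120.0, 400.0, 360.0)
```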
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  16. SMILE trial public dataset - Datasets - data.bris

    • data.bris.ac.uk
    Updated Apr 18, 2019
    (2019). SMILE trial public dataset - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/2c1pfur00h0p52c7s8cnpg31hb
    Dataset updated
    Apr 18, 2019
    Description

    This dataset contains a number of variables collected from children and their parents who took part in the SMILE trial, at assessment and at follow-up. It does not include age or gender, so that no child or parent can be identified through the data. Researchers can apply for access to a fuller dataset containing age and gender (https://data.bris.ac.uk/data/dataset/1myzti8qnv48g2sxtx6h5nice7) through the University of Bristol's Data Access Committee; see the data access request form (http://bit.ly/data-bris-request) for details on how to apply. Complete download (zip, 1.5 MiB).

  17. City of Philadelphia payments - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    (2024). City of Philadelphia payments - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/city-of-philadelphia-payments
    Dataset updated
    Dec 16, 2024
    Area covered
    Philadelphia
    Description

    Two publicly available datasets of real-world financial payment data.

  18. Ventura, IA Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Neilsberg Research (2025). Ventura, IA Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b2599312-f25d-11ef-8c1b-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Iowa, Ventura
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Ventura by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Ventura across both sexes and to determine which sex constitutes the majority.

    Key observations

    Males make up a slight majority, at 50.28% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks respondents about their current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data on biological sex, not gender, and respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for the gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of each gender in Ventura.
    • % of Total Population: The percentage of Ventura's total population made up by each gender. Please note that the percentages may not sum to exactly 100% due to rounding.
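A minimal sketch of how the percentage column relates to the Population column, and how the majority sex is read off. The counts below are hypothetical, not taken from this dataset, and the function names are illustrative. Because each share is rounded independently, rounded percentages need not add up to exactly 100:

```python
def gender_shares(counts: dict[str, int], decimals: int = 2) -> dict[str, float]:
    """Percentage of the total population for each category, rounded independently."""
    total = sum(counts.values())
    return {group: round(100 * n / total, decimals) for group, n in counts.items()}


def majority_group(counts: dict[str, int]) -> str:
    """Category with the largest population (ties go to the first one listed)."""
    return max(counts, key=counts.get)


# Hypothetical counts for illustration only.
counts = {"Male": 5028, "Female": 4972}
print(gender_shares(counts))   # {'Male': 50.28, 'Female': 49.72}
print(majority_group(counts))  # Male

# With more categories, independently rounded shares can drift from 100.
print(sum(gender_shares({"a": 1, "b": 1, "c": 1}).values()))  # roughly 99.99
```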

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
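ACS margins of error (MOEs) are published at the 90% confidence level, so a standard error can be recovered by dividing the MOE by 1.645. The dataset description does not list MOE columns, so this sketch assumes you take MOE values from the corresponding ACS tables; the function names and the example figures are hypothetical. The sum formula is the usual root-sum-of-squares approximation for combining independent estimates:

```python
import math

# ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


def standard_error(moe: float) -> float:
    """Convert a published 90% MOE to a standard error."""
    return moe / Z_90


def interval_90(estimate: float, moe: float) -> tuple[float, float]:
    """90% confidence interval implied by an estimate and its MOE."""
    return estimate - moe, estimate + moe


def moe_of_sum(moes: list[float]) -> float:
    """Approximate MOE of a sum of independent estimates (root of summed squares)."""
    return math.sqrt(sum(m * m for m in moes))


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Standard error relative to the estimate, in percent (higher = less reliable)."""
    return 100 * standard_error(moe) / estimate


# Hypothetical estimate and MOE for illustration only.
print(interval_90(1200, 150))                          # (1050, 1350)
print(moe_of_sum([30.0, 40.0]))                        # 50.0
print(round(coefficient_of_variation(1200, 150), 1))   # 7.6
```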

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Ventura Population by Race & Ethnicity.

  19. Merced, CA Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Neilsberg Research (2025). Merced, CA Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b243e3c4-f25d-11ef-8c1b-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Merced, California
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Merced by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Merced across both sexes and to determine which sex constitutes the majority.

    Key observations

    Females make up a slight majority, at 50.64% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks respondents about their current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data on biological sex, not gender, and respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for the gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of each gender in Merced.
    • % of Total Population: The percentage of Merced's total population made up by each gender. Please note that the percentages may not sum to exactly 100% due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Merced Population by Race & Ethnicity.

  20. Synthetic Integrated Services Data

    • data.wprdc.org
    csv, html, pdf, zip
    Updated Jun 25, 2024
    Allegheny County (2024). Synthetic Integrated Services Data [Dataset]. https://data.wprdc.org/dataset/synthetic-integrated-services-data
    Available download formats: html, zip (39231637), csv (1375554033), pdf
    Dataset updated
    Jun 25, 2024
    Dataset authored and provided by
    Allegheny County
    Description

    Motivation

    This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.

    This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.

    Collection

    The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.

    Preprocessing

    Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.

    For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
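To make the generate-then-select idea concrete, here is a deliberately simplified toy, not the model used for this dataset (that is described in the technical report). It samples each column independently from its empirical distribution, draws several candidate datasets, and keeps the one whose one-way distributions are closest to the real data (lowest average total variation distance). It scores utility only: the privacy side of the trade-off and the correlations between columns are ignored here. All names and the example records are hypothetical:

```python
import random
from collections import Counter


def fit_marginals(rows: list[dict], columns: list[str]) -> dict[str, Counter]:
    """Empirical value counts for each column (columns treated as independent)."""
    return {c: Counter(row[c] for row in rows) for c in columns}


def sample_synthetic(marginals: dict[str, Counter], n: int, rng: random.Random) -> list[dict]:
    """Draw n rows, sampling every column independently from its marginal."""
    rows = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts)
            row[col] = rng.choices(values, weights=[counts[v] for v in values], k=1)[0]
        rows.append(row)
    return rows


def tv_distance(a: Counter, b: Counter) -> float:
    """Total variation distance between two empirical distributions."""
    na, nb = sum(a.values()), sum(b.values())
    return 0.5 * sum(abs(a[k] / na - b[k] / nb) for k in set(a) | set(b))


def utility_loss(real: list[dict], synth: list[dict], columns: list[str]) -> float:
    """Mean per-column TV distance; 0 means identical one-way distributions."""
    rm, sm = fit_marginals(real, columns), fit_marginals(synth, columns)
    return sum(tv_distance(rm[c], sm[c]) for c in columns) / len(columns)


def best_candidate(real: list[dict], columns: list[str],
                   n_candidates: int = 5, seed: int = 0) -> list[dict]:
    """Generate several synthetic candidates and keep the lowest-loss one."""
    rng = random.Random(seed)
    marginals = fit_marginals(real, columns)
    candidates = [sample_synthetic(marginals, len(real), rng) for _ in range(n_candidates)]
    return min(candidates, key=lambda s: utility_loss(real, s, columns))


# Hypothetical confidential records (column names are illustrative only).
real = [
    {"service": "housing", "age_group": "18-24"},
    {"service": "housing", "age_group": "25-44"},
    {"service": "food", "age_group": "25-44"},
    {"service": "food", "age_group": "45-64"},
]
synthetic = best_candidate(real, ["service", "age_group"])
print(utility_loss(real, synthetic, ["service", "age_group"]))
```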

    Recommended Uses

    This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, the synthetic version of this data could be analyzed to better understand how people in Allegheny County use human services, including the interplay between the use of multiple services and the demographic characteristics of clients.
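One analysis that needs record-level rather than aggregate data is counting how many people used each pair of services. A minimal sketch, assuming hypothetical records of the form (person_id, month, service); the real file's columns will differ, so treat the record shape and function name as placeholders:

```python
from collections import defaultdict
from itertools import combinations


def service_co_usage(records: list[tuple[str, str, str]]) -> dict[tuple[str, str], int]:
    """Number of distinct people who used each pair of services.

    records: hypothetical (person_id, month, service) tuples.
    """
    services_by_person: dict[str, set[str]] = defaultdict(set)
    for person_id, _month, service in records:
        services_by_person[person_id].add(service)

    pair_counts: dict[tuple[str, str], int] = defaultdict(int)
    for services in services_by_person.values():
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for pair in combinations(sorted(services), 2):
            pair_counts[pair] += 1
    return dict(pair_counts)


records = [
    ("p1", "2020-01", "housing"),
    ("p1", "2020-02", "food"),
    ("p2", "2020-01", "housing"),
    ("p2", "2020-01", "food"),
    ("p2", "2020-03", "mental health"),
    ("p3", "2020-01", "mental health"),
]
print(service_co_usage(records))
# {('food', 'housing'): 2, ('food', 'mental health'): 1, ('housing', 'mental health'): 1}
```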

    Known Limitations/Biases

    Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.

    Feedback

    Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).

    Further Documentation and Resources

    1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
    2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
    3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
    4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
