100+ datasets found

h
Fake_or_Real_Competition_Dataset
huggingface.co
Updated Aug 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GenON (2023). Fake_or_Real_Competition_Dataset [Dataset]. https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 28, 2023
Dataset authored and provided by
GenON
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
2023 Fake or Real: AI-generated Image Discrimination Competition dataset is now available on Hugging Face!

Hello🖐️ We are excited to announce the release of the dataset for the 2023 Fake or Real: AI-generated Image Discrimination Competition. The competition was held on AI CONNECT(https://aiconnect.kr/) from June 26th to July 6th, 2023, with 768 participants.If you're interested in evaluating the performance of your model on the test dataset, we encourage you to visit the… See the full description on the dataset page: https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset.
🏙 Dubai Real Estate Sales Insights | UAE 🇦🇪 🏠
kaggle.com
Updated May 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Azhar Saleem (2024). 🏙 Dubai Real Estate Sales Insights | UAE 🇦🇪 🏠 [Dataset]. https://www.kaggle.com/datasets/azharsaleem/dubai-real-estate-sales-insights
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 2, 2024
Dataset provided by
Kaggle
Authors
Azhar Saleem
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Dubai, United Arab Emirates
Description
Dubai, UAE Real Estate Market Dataset

👨‍💻 Author: Azhar Saleem

"https://github.com/azharsaleem18" target="_blank"> https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github" alt="GitHub Profile"> "https://www.kaggle.com/azharsaleem" target="_blank"> https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle" alt="Kaggle Profile"> "https://www.linkedin.com/in/azhar-saleem/" target="_blank"> https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin" alt="LinkedIn Profile">
"https://www.youtube.com/@AzharSaleem19" target="_blank"> https://img.shields.io/badge/YouTube-Profile-red?style=for-the-badge&logo=youtube" alt="YouTube Profile"> "https://www.facebook.com/azhar.saleem1472/" target="_blank"> https://img.shields.io/badge/Facebook-Profile-blue?style=for-the-badge&logo=facebook" alt="Facebook Profile"> "https://www.tiktok.com/@azhar_saleem18" target="_blank"> https://img.shields.io/badge/TikTok-Profile-blue?style=for-the-badge&logo=tiktok" alt="TikTok Profile">
"https://twitter.com/azhar_saleem18" target="_blank"> https://img.shields.io/badge/Twitter-Profile-blue?style=for-the-badge&logo=twitter" alt="Twitter Profile"> "https://www.instagram.com/azhar_saleem18/" target="_blank"> https://img.shields.io/badge/Instagram-Profile-blue?style=for-the-badge&logo=instagram" alt="Instagram Profile"> "mailto:azharsaleem6@gmail.com"> https://img.shields.io/badge/Email-Contact%20Me-red?style=for-the-badge&logo=gmail" alt="Email Contact">

Dataset Description

This comprehensive dataset provides an exhaustive snapshot of property listings for sale across the United Arab Emirates, including major cities like Dubai, Abu Dhabi, and Al Ain. Sourced from Bayut.com, this dataset serves as an invaluable resource for Data Scientists, Real Estate Analysts, Urban Planners, and Developers keen on exploring real estate market dynamics, price fluctuations, and development trends in the UAE.

Dataset Overview

The dataset contains over 41,000 entries, each representing a unique property for sale. It includes detailed information such as:

Price: Listing price of the property in AED.

Type: Specifies the property type, such as Apartment, Townhouse, etc.

Beds: Number of bedrooms, with '0' indicating a studio flat.

Baths: Number of bathrooms.

Address: Full address of the property, providing insights into its precise location.

Furnishing: Indicates whether the property is furnished or unfurnished.

Completion Status: Current status of the property (Ready, Off-Plan).

Building Name, Area Name, City: Provide contextual location details.

Year of Completion: Year when the property was completed or is expected to be completed.

Total Floors, Parking Spaces, Building Area: Key features of the property's building.

Latitude, Longitude: Geographic coordinates for more refined location analysis.

Purpose: Intended purpose of the listing, consistently noted as 'For Sale'.

Usage

This dataset is ideal for a variety of applications, including:

Market Analysis: Analyze trends in property prices and types across different regions.

Predictive Modeling: Develop machine learning models to predict property prices or to classify types of properties based on their features.

Urban Development Studies: Examine property distribution and characteristics to inform urban planning and development strategies.

Comparative Analysis: Compare properties across different cities and districts to identify investment opportunities or to study market behavior.

Data Accessibility

This dataset is publicly available and well-suited for anyone interested in conducting detailed analyses of the UAE real estate market, from academic researchers to industry professionals.

Feel free to dive into this dataset to unlock comprehensive insights into the vibrant and diverse property market of the UAE, supporting a wide range of real estate, economic, and geographic studies.
HWID12 (Highway Incidents Detection Dataset)
kaggle.com
Updated May 25, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Landry KEZEBOU (2022). HWID12 (Highway Incidents Detection Dataset) [Dataset]. https://www.kaggle.com/datasets/landrykezebou/hwid12-highway-incidents-detection-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 25, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Landry KEZEBOU
Description
Context

Action Recognition in video is known to be more challenging than image recognition problems. Unlike image recognition models which use 2D convolutional neural blocks, action classification models require additional dimensionality to capture the spatio-temporal information in video sequences. This intrinsically makes video action recognition models computationally intensive and significantly more data-hungry than image recognition counterparts. Unequivocally, existing video datasets such as Kinetics, AVA, Charades, Something-Something, HMDB51, and UFC101 have had tremendous impact on the recently evolving video recognition technologies. Artificial Intelligence models trained on these datasets have largely benefited applications such as behavior monitoring in elderly people, video summarization, and content-based retrieval. However, this growing concept of action recognition has yet to be explored in Intelligent Transportation System (ITS), particularly in vital applications such as incidents detection. This is partly due to the lack of availability of annotated dataset adequate for training models suitable for such direct ITS use cases. In this paper, the concept of video action recognition is explored to tackle the problem of highway incident detection and classification from live surveillance footage. First, a novel dataset - HWID12 (Highway Incidents Detection) dataset is introduced. The HWAD12 consists of 11 distinct highway incidents categories, and one additional category for negative samples representing normal traffic. The proposed dataset also includes 2780+ video segments of 3 to 8 seconds on average each, and 500k+ temporal frames. Next, the baseline for highway accident detection and classification is established with a state-of-the-art action recognition model trained on the proposed HWID12 dataset. Performance benchmarking for 12-class (normal traffic vs 11 accident categories), and 2-class (incident vs normal traffic) settings is performed. This benchmarking reveals a recognition accuracy of up to 88% and 98% for 12-class and 2-class recognition setting, respectively.

Data Acquisition

The Proposed Highway Incidents Detection Dataset (HWID12) is the first of its kind dataset aimed at fostering experimentation of video action recognition technologies to solve the practical problem of real-time highway incident detections which currently challenges intelligent transportation systems. The lack of such dataset has limited the expansion of the recent breakthroughs in video action classification for practical uses cases in intelligent transportation systems.. The proposed dataset contains more than 2780 video clips of length varying between 3 to 8 seconds. These video clips capture moments leading to, up until right after an incident occurred. The clips were manually segmented from accident compilations videos sourced from YouTube and other videos data platforms.

Content

There is one main zip file available for download. The zip file contains 2780+ video clips. 1) 12 folders
2) each folder represents an incident category. One of the classes represent the negative sample class which simulates normal traffic.

Terms and Conditions

Videos provided in this dataset are freely available for research and education purposes only. Please be sure to properly credit the authors by citing the article below.

Be sure to upvote this dataset if you find it useful by scrolling up and clicking the up-Arrow ^ sign at the top banner of the page, next to "New Notebook" button.

Be sure to blur out all plate numbers before publishing any of the contents available in this dataset.

Acknowledgements

Any publication using this database must reference to the following journal manuscript:

Landry Kezebou, Victor Oludare, Karen Panetta, James Intriligator, and Sos Agaian "Highway accident detection and classification from live traffic surveillance cameras: a comprehensive dataset and video action recognition benchmarking", Proc. SPIE 12100, Multimodal Image Exploitation and Learning 2022, 121000M (27 May 2022); https://doi.org/10.1117/12.2618943

Note: if the link is broken, please use http instead of https.

In Chrome, use the steps recommended in the following website to view the webpage if it appears to be broken https://www.technipages.com/chrome-enabledisable-not-secure-warning

Other relevant datasets VCoR dataset: https://www.kaggle.com/landrykezebou/vcor-vehicle-color-recognition-dataset VRiV dataset: https://www.kaggle.com/landrykezebou/vriv-vehicle-recognition-in-videos-dataset

For any enquires regarding the HWID12 dataset, contact: landrykezebou@gmail.com
O
BUTTER - Empirical Deep Learning Dataset
data.openei.org
datasets.ai
+2more
code, data, website
Updated May 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek; Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek (2022). BUTTER - Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/1872441
Explore at:
code, website, dataAvailable download formats
Unique identifier
https://doi.org/10.25984/1872441
Dataset updated
May 20, 2022
Dataset provided by
Open Energy Data Initiative (OEDI)
National Renewable Energy Laboratory
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
Authors
Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek; Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The BUTTER Empirical Deep Learning Dataset represents an empirical study of the deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels of L1 and L2 regularization each. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were preformed, and statistics including training and test loss (using a 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiments), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
Z
DORIS-MAE-v1
data.niaid.nih.gov
zenodo.org
Updated Oct 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wang, Jianyou (2023). DORIS-MAE-v1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8035109
Explore at:
Dataset updated
Oct 17, 2023
Dataset provided by
Wang, Kaicheng
Wang, Jianyou
Naidu, Prudhviraj
Wang, Xiaoyue
Paturi, Ramamohan
Bergen, Leon
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.

Documentations for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 is comprised of four main sub-datasets, each serving distinct purposes.

The Query dataset contains 100 human-crafted complex queries spanning across five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.

The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011-2021, and sourced from arXiv. Each entry includes title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.

The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (e.g., 0,1,2) representing ChatGPT's final decision.

Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (e.g., 0,1,2).

The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:

├── ada_embedding_for_DORIS-MAE_v1.pickle ├── "Query" │ ├── query_id_1 (Embedding of query_1) │ ├── query_id_2 (Embedding of query_2) │ └── query_id_3 (Embedding of query_3) │ . │ . │ . └── "Corpus" ├── corpus_id_1 (Embedding of abstract_1) ├── corpus_id_2 (Embedding of abstract_2) └── corpus_id_3 (Embedding of abstract_3) . . .
h
AI-Generated-vs-Real-Images-Datasets
huggingface.co
Updated Aug 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hem Bahadur Gurung (2025). AI-Generated-vs-Real-Images-Datasets [Dataset]. https://huggingface.co/datasets/Hemg/AI-Generated-vs-Real-Images-Datasets
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 19, 2025
Authors
Hem Bahadur Gurung
Description
Dataset Card for "AI-Generated-vs-Real-Images-Datasets"

More Information needed
S
A dataset of one high-angel well drilled and logged at Bohai Sea during...
scidb.cn
Updated Jan 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qiong Zhang; Lin Lvlin (2024). A dataset of one high-angel well drilled and logged at Bohai Sea during FY2021 [Dataset]. http://doi.org/10.57760/sciencedb.j00186.00375
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.j00186.00375
Dataset updated
Jan 5, 2024
Dataset provided by
Science Data Bank
Authors
Qiong Zhang; Lin Lvlin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bohai Sea
Description
The dataset is acquired from an actual well is located in the southern part of Liaodong Bay in the Bohai oil field. The well is drilled with a 241.3 mm (9.5 inch) bit. During logging, the tool is slide against the borehole wall from the bottom to the top and sampled at 76 mm intervals. The sieve residue log shows that the logging section is dominated by sandy mudstone formations, and borehole fluid sampling shows that borehole fluid has a density of 1.18 g/cm3 with a salinity of 94 kppm. Neutron and density data are obtained according to the log depth interval. The dataset includes 4 files: XX19-1N.xtf is the original data from the well log that could be opened using a xtf viewer. XX19-1N-Data.txt is the data read from the original xtf file. detection.py is the data processing code and XX19-1N. las is the neutron porosity data that are processed using detection.py.
cleaned-dataset
kaggle.com
Updated Mar 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
wchohaw (2023). cleaned-dataset [Dataset]. https://www.kaggle.com/datasets/wchohaw/cleaned-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 30, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
wchohaw
Description
Dataset

This dataset was created by wchohaw

Contents
R
Dataset Ow Dataset
universe.roboflow.com
zip
Updated Jan 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Overwatch (2024). Dataset Ow Dataset [Dataset]. https://universe.roboflow.com/overwatch-4wpfl/dataset-ow
Explore at:
zipAvailable download formats
Dataset updated
Jan 8, 2024
Dataset authored and provided by
Overwatch
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Player Bounding Boxes
Description
Dataset Ow

## Overview Dataset Ow is a dataset for object detection tasks - it contains Player annotations for 10,000 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
I
Cline Center Coup d’État Project Dataset
databank.illinois.edu
Updated May 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-9651987_V7
Dataset updated
May 11, 2025
Authors
Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Coups d'Ètat are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy) the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include: • Reconciling missing event data • Removing events with irreconcilable event dates • Removing events with insufficient sourcing (each event needs at least two sources) • Removing events that were inaccurately coded as coup events • Removing variables that fell below the threshold of inter-coder reliability required by the project • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries • Extending the period covered from 1945-2005 to 1945-2019 • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Items in this Dataset 1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024 2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024 3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024 4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024
Citation Guidelines 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7 2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
R
Grocery Items Dataset
universe.roboflow.com
zip
Updated Mar 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Grocery Items (2023). Grocery Items Dataset [Dataset]. https://universe.roboflow.com/grocery-items/grocery-items-bxx8e/dataset/1
Explore at:
zipAvailable download formats
Dataset updated
Mar 25, 2023
Dataset authored and provided by
Grocery Items
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Items Bounding Boxes
Description
Here are a few use cases for this project:

Smart Inventory Management: Develop an automated inventory system for grocery stores, where the "Grocery Items" computer vision model identifies and tracks product quantities on shelves, helping store managers optimize restocking and reduce product waste.

Assisted Shopping Experience: Implement a user-friendly app for visually impaired users, where the computer vision model recognizes specific grocery items, making it easier for these individuals to identify and locate the products they need while shopping.

Autonomous Grocery Robots: Develop a shopping assistant robot that uses the computer vision model to identify and collect specific items from a shopping list for customers, improving shopping efficiency and convenience.

Data-driven Marketing Analysis: Leverage the computer vision technology to gather in-store data on product placement and store layout, enabling retailers to make better informed decisions about promotions, discounts, and product placement to maximize sales.

Checkout-less Stores: Create a fully automated grocery store where the "Grocery Items" computer vision model tracks picked up and returned items, allowing customers to simply walk out of the store with their selected items while automatically generating their bills, increasing checkout efficiency and reducing wait times.
t
highD dataset - Dataset - LDM
service.tib.eu
Updated Jan 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). highD dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/highd-dataset
Explore at:
Dataset updated
Jan 2, 2025
Description
The highD dataset is a real-world high-definition video dataset of naturalistic vehicle trajectories on German highways.
f
IDCIA Dataset
figshare.com
zip
Updated Sep 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdurahman Ali Mohammed; Catherine Fonder; Donald S. Sakaguchi; Wallapak Tavanapong; Surya K. Mallapragada; Azeez Idris (2023). IDCIA Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.21970604.v7
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21970604.v7
Dataset updated
Sep 26, 2023
Dataset provided by
figshare
Authors
Abdurahman Ali Mohammed; Catherine Fonder; Donald S. Sakaguchi; Wallapak Tavanapong; Surya K. Mallapragada; Azeez Idris
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The provided dataset has two directories: 𝐼𝑚𝑎𝑔𝑒𝑠 and 𝑔𝑟𝑜𝑢𝑛𝑑_𝑡𝑟𝑢𝑡ℎ, and the readme.md file for the description of the dataset folder. The 𝐼𝑚𝑎𝑔𝑒𝑠 directory contains seven sub-directories, one for each of the primary antibody used in the experiments. Inside each directory are grayscale images from fluorescence microscopy. The images were resized to 800𝑥600 pixels.

The Source codes used for baseline experiments can be found at https://github.com/abdumhmd/IDCIA
u
Amazon review data 2018
cseweb.ucsd.edu
nijianmo.github.io
+1more
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
Explore at:
Dataset authored and provided by
UCSD CSE Research Project
Description
Context

This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

More reviews:

The total number of reviews is 233.1 million (142.8 million in 2014).

New reviews:

Current data includes reviews in the range May 1996 - Oct 2018.

Metadata: - We have added transaction metadata for each review shown on the review page.

Added more detailed metadata of the product landing page.

Acknowledgements

If you publish articles based on this dataset, please cite the following paper:

Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fined-grained aspects. EMNLP, 2019.
R
Underwater_project Dataset
universe.roboflow.com
zip
Updated Dec 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Real Images (2023). Underwater_project Dataset [Dataset]. https://universe.roboflow.com/real-images/underwater_project/dataset/4
Explore at:
zipAvailable download formats
Dataset updated
Dec 18, 2023
Dataset authored and provided by
Real Images
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Bla Bla Bounding Boxes
Description
Underwater_Project

## Overview Underwater_Project is a dataset for object detection tasks - it contains Bla Bla annotations for 798 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
b
SMILE trial public dataset - Datasets - data.bris
data.bris.ac.uk
Updated Apr 18, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). SMILE trial public dataset - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/2c1pfur00h0p52c7s8cnpg31hb
Explore at:
Dataset updated
Apr 18, 2019
Description
This data set contains a number of variables from collected on children and their parents who took part in the SMILE trial at assessment and follow up. It does not include data on age and gender as we want to be certain that no child or parent can be identified through the data. Researchers can apply to access a fuller data set (https://data.bris.ac.uk/data/dataset/1myzti8qnv48g2sxtx6h5nice7) containing age and gender through application to the University of Bristol's Data Access Committee, please refer to the data access request form (http://bit.ly/data-bris-request) for details on how to apply for access. Complete download (zip, 1.5 MiB)
t
City of Philadelphia payments - Dataset - LDM
service.tib.eu
Updated Dec 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). City of Philadelphia payments - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/city-of-philadelphia-payments
Explore at:
Dataset updated
Dec 16, 2024
Area covered
Philadelphia
Description
Two publicly available datasets of real-world financial payment data.
N
Ventura, IA Population Breakdown by Gender Dataset: Male and Female...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). Ventura, IA Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b2599312-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
json, csvAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Iowa, Ventura
Variables measured
Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Ventura by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Ventura across both sexes and to determine which sex constitutes the majority.

Key observations

There is a slight majority of male population, with 50.28% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

Variables / Data Columns

Gender: This column displays the Gender (Male / Female)

Population: The population of the gender in the Ventura is shown in this column.

% of Total Population: This column displays the percentage distribution of each gender as a proportion of Ventura total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Ventura Population by Race & Ethnicity. You can refer the same here
N
Merced, CA Population Breakdown by Gender Dataset: Male and Female...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). Merced, CA Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b243e3c4-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Merced, California
Variables measured
Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Merced by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Merced across both sexes and to determine which sex constitutes the majority.

Key observations

There is a slight majority of female population, with 50.64% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

Variables / Data Columns

Gender: This column displays the Gender (Male / Female)

Population: The population of the gender in the Merced is shown in this column.

% of Total Population: This column displays the percentage distribution of each gender as a proportion of Merced total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Merced Population by Race & Ethnicity. You can refer the same here
C
Synthetic Integrated Services Data
data.wprdc.org
csv, html, pdf, zip
Updated Jun 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Allegheny County (2024). Synthetic Integrated Services Data [Dataset]. https://data.wprdc.org/dataset/synthetic-integrated-services-data
Explore at:
html, zip(39231637), csv(1375554033), pdfAvailable download formats
Dataset updated
Jun 25, 2024
Dataset authored and provided by
Allegheny County
Description
Motivation

This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.

This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.

Collection

The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.

Preprocessing

Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.

For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.

Recommended Uses

This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.

Known Limitations/Biases

Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.

Feedback

Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).

Further Documentation and Resources

1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.

Facebook

Twitter

Click to copy link

Link copied

Cite

GenON (2023). Fake_or_Real_Competition_Dataset [Dataset]. https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset

Fake_or_Real_Competition_Dataset

aiconnect_fake_or_real

mncai/Fake_or_Real_Competition_Dataset

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Aug 28, 2023

Dataset authored and provided by

GenON

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

2023 Fake or Real: AI-generated Image Discrimination Competition dataset is now available on Hugging Face!

Hello🖐️ We are excited to announce the release of the dataset for the 2023 Fake or Real: AI-generated Image Discrimination Competition. The competition was held on AI CONNECT(https://aiconnect.kr/) from June 26th to July 6th, 2023, with 768 participants.If you're interested in evaluating the performance of your model on the test dataset, we encourage you to visit the… See the full description on the dataset page: https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset.

Clear search

Close search

Google apps

Main menu

Fake_or_Real_Competition_Dataset

🏙 Dubai Real Estate Sales Insights | UAE 🇦🇪 🏠

Dubai, UAE Real Estate Market Dataset

👨‍💻 Author: Azhar Saleem

Dataset Description

Dataset Overview

Usage

Data Accessibility

HWID12 (Highway Incidents Detection Dataset)

Context

Data Acquisition

Content

Terms and Conditions

Acknowledgements

BUTTER - Empirical Deep Learning Dataset

DORIS-MAE-v1

AI-Generated-vs-Real-Images-Datasets

A dataset of one high-angel well drilled and logged at Bohai Sea during...

cleaned-dataset

Dataset

Contents

Dataset Ow Dataset

Dataset Ow

Cline Center Coup d’État Project Dataset

Grocery Items Dataset

highD dataset - Dataset - LDM

IDCIA Dataset

Amazon review data 2018

Context

Acknowledgements

Underwater_project Dataset

Underwater_Project

SMILE trial public dataset - Datasets - data.bris

City of Philadelphia payments - Dataset - LDM

Ventura, IA Population Breakdown by Gender Dataset: Male and Female...

About this dataset

Content

Inspiration

Recommended for further research

Merced, CA Population Breakdown by Gender Dataset: Male and Female...

About this dataset

Content

Inspiration

Recommended for further research

Synthetic Integrated Services Data

Motivation

Collection

Preprocessing

Recommended Uses

Known Limitations/Biases

Feedback

Further Documentation and Resources

Fake_or_Real_Competition_Dataset

aiconnect_fake_or_real

mncai/Fake_or_Real_Competition_Dataset