15 datasets found

d
Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event...
datarade.ai
.csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Factori, Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-web-data-machine-learning-d-factori
Explore at:
.csvAvailable download formats
Dataset authored and provided by
Factori
Area covered
United Kingdom
Description
Factori's AI & ML training data is thoroughly tested and reviewed to ensure that what you receive on your end is of the best quality.

Integrate the comprehensive AI & ML training data provided by Grepsr and develop a superior AI & ML model.

Whether you're training algorithms for natural language processing, sentiment analysis, or any other AI application, we can deliver comprehensive datasets tailored to fuel your machine learning initiatives.

Enhanced Data Quality: We have rigorous data validation processes and also conduct quality assurance checks to guarantee the integrity and reliability of the training data for you to develop the AI & ML models.

Gain a competitive edge, drive innovation, and unlock new opportunities by leveraging the power of tailored Artificial Intelligence and Machine Learning training data with Factori.

We offer web activity data of users that are browsing popular websites around the world. This data can be used to analyze web behavior across the web and build highly accurate audience segments based on web activity for targeting ads based on interest categories and search/browsing intent.

Web Data Reach: Our reach data represents the total number of data counts available within various categories and comprises attributes such as Country, Anonymous ID, IP addresses, Search Query, and so on.

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).

Data Attributes: Anonymous_id IDType Timestamp Estid Ip userAgent browserFamily deviceType Os Url_metadata_canonical_url Url_metadata_raw_query_params refDomain mappedEvent Channel searchQuery Ttd_id Adnxs_id Keywords Categories Entities Concepts
R
Car Highway Dataset
universe.roboflow.com
zip
Updated Sep 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sallar (2023). Car Highway Dataset [Dataset]. https://universe.roboflow.com/sallar/car-highway/dataset/1
Explore at:
zipAvailable download formats
Dataset updated
Sep 13, 2023
Dataset authored and provided by
Sallar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Vehicles Bounding Boxes
Description
Car-Highway Data Annotation Project

Introduction

In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.

Project Goals

Collect a diverse dataset of car images from highway scenes.

Annotate the dataset to identify and label cars within each image.

Organize and format the annotated data for machine learning model training.

Tools and Technologies

For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.

Annotation Process

Upload the raw car images to the Roboflow platform.

Use the annotation tools in Roboflow to draw bounding boxes around each car in the images.

Label each bounding box with the corresponding class (e.g., car).

Review and validate the annotations for accuracy.

Data Augmentation

Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.

Data Export

Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.

Milestones

Data Collection and Preprocessing

Annotation of Car Images

Data Augmentation

Data Export

Model Training

Conclusion

By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.
d
Factori AI & ML Training Data | Consumer Data | USA | Machine Learning Data
datarade.ai
.json, .csv
Updated Jul 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Factori (2022). Factori AI & ML Training Data | Consumer Data | USA | Machine Learning Data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-consumer-data-usa-machine-factori
Explore at:
.json, .csvAvailable download formats
Dataset updated
Jul 23, 2022
Dataset authored and provided by
Factori
Area covered
United States
Description
Our consumer data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.

Geography - City, State, ZIP, County, CBSA, Census Tract, etc.

Demographics - Gender, Age Group, Marital Status, Language etc.

Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc

Persona - Consumer type, Communication preferences, Family type, etc

Interests - Content, Brands, Shopping, Hobbies, Lifestyle etc.

Household - Number of Children, Number of Adults, IP Address, etc.

Behaviours - Brand Affinity, App Usage, Web Browsing etc.

Firmographics - Industry, Company, Occupation, Revenue, etc

Retail Purchase - Store, Category, Brand, SKU, Quantity, Price etc.

Auto - Car Make, Model, Type, Year, etc.

Housing - Home type, Home value, Renter/Owner, Year Built etc.

Consumer Graph Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

Consumer Graph Use Cases: 360-Degree Customer View: Get a comprehensive image of customers by the means of internal and external data aggregation. Data Enrichment: Leverage Online to offline consumer profiles to build holistic audience segments to improve campaign targeting using user data enrichment Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity. Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

Here's the schema of Consumer Data: person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new
credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type
mortgage_loan2_type mortgage_lender_code
mortgage_loan2_render_code
mortgage_lender mortgage_loan2_lender
mortgage_loan2_ratetype mortgage_rate
mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code
office_census_block_group
office_census_tract office_county_code
company_phone
company_credit_score
company_csa_code
company_dpbc
company_franchiseflag
company_facebookurl company_linkedinurl company_twitterurl
company_website company_fortune_rank
company_government_type company_headquarters_branch company_home_business
company_industry
company_num_pcs_used
company_num_employees
company_firm_individual company_msa company_msa_name
company_naics_code
company_naics_description
company_naics_code2 company_naics_description2
company_sic_code2
company_sic_code2_description
company_sic_code4 company_sic_code4_description
company_sic_code6
company_sic_code6_description
company_sic_code8
company_sic_code8_description company_parent_company
company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range
company_revenue company_sales_volume
company_small_business company_stock_ticker company_year_founded company_minorityowned
company_female_owned_or_operated company_franchise_code company_dma company_dma_name
company_hq_address
company_hq_city company_hq_duns company_hq_state
company_hq_zip5 company_hq_zip4 c...
Zenodo Open Metadata snapshot - Training dataset for records and communities...
zenodo.org
data.niaid.nih.gov
application/gzip, bin
Updated Dec 15, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo team; Zenodo team (2022). Zenodo Open Metadata snapshot - Training dataset for records and communities classifier building [Dataset]. http://doi.org/10.5281/zenodo.7438358
Explore at:
bin, application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7438358
Dataset updated
Dec 15, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Zenodo team; Zenodo team
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains Zenodo's published open access records and communities metadata, including entries marked by the Zenodo staff as spam and deleted.

The datasets are gzipped compressed JSON-lines files, where each line is a JSON object representation of a Zenodo record or community.

Records dataset

Filename: zenodo_open_metadata_{ date of export }.jsonl.gz

Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.

In addition, some terms have been altered:

The term files contains a list of dictionaries containing filetype, size, and filename only.

The term license contains a short Zenodo ID of the license (e.g. "cc-by").

Communities dataset

Filename: zenodo_community_metadata_{ date of export }.jsonl.gz

Each object contains the terms: id, title, description, curation_policy, page

which correspond to the fields with the same name available in Zenodo's community creation form.

Notes for all datasets

For each object the term spam contains a boolean value, determining whether a given record/community was marked as spam content by Zenodo staff.

Some values for the top-level terms, which were missing in the metadata may contain a null value.

A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 21, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fabian Plum; Fabian Plum; René Bulla; Hendrik Beck; Hendrik Beck; Natalie Imirzian; Natalie Imirzian; David Labonte; David Labonte; René Bulla (2023). replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks [Dataset]. http://doi.org/10.5281/zenodo.7849417
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7849417
Dataset updated
Apr 21, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Fabian Plum; Fabian Plum; René Bulla; Hendrik Beck; Hendrik Beck; Natalie Imirzian; Natalie Imirzian; David Labonte; David Labonte; René Bulla
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains all recorded and hand-annotated as well as all synthetically generated data as well as representative trained networks used for detection and tracking experiments in the replicAnt - generating annotated images of animals in complex environments using Unreal Engine manuscript. Unless stated otherwise, all 3D animal models used in the synthetically generated data have been generated with the open-source photgrammetry platform scAnt peerj.com/articles/11155/. All synthetic data has been generated with the associated replicAnt project available from https://github.com/evo-biomech/replicAnt.

Abstract:

Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.

Benchmark data

Two video datasets were curated to quantify detection performance; one in laboratory and one in field conditions. The laboratory dataset consists of top-down recordings of foraging trails of Atta vollenweideri (Forel 1893) leaf-cutter ants. The colony was collected in Uruguay in 2014, and housed in a climate chamber at 25°C and 60% humidity. A recording box was built from clear acrylic, and placed between the colony nest and a box external to the climate chamber, which functioned as feeding site. Bramble leaves were placed in the feeding area prior to each recording session, and ants had access to the recording area at will. The recorded area was 104 mm wide and 200 mm long. An OAK-D camera (OpenCV AI Kit: OAK-D, Luxonis Holding Corporation) was positioned centrally 195 mm above the ground. While keeping the camera position constant, lighting, exposure, and background conditions were varied to create recordings with variable appearance: The “base” case is an evenly lit and well exposed scene with scattered leaf fragments on an otherwise plain white backdrop. A “bright” and “dark” case are characterised by systematic over- or underexposure, respectively, which introduces motion blur, colour-clipped appendages, and extensive flickering and compression artefacts. In a separate well exposed recording, the clear acrylic backdrop was substituted with a printout of a highly textured forest ground to create a “noisy” case. Last, we decreased the camera distance to 100 mm at constant focal distance, effectively doubling the magnification, and yielding a “close” case, distinguished by out-of-focus workers. All recordings were captured at 25 frames per second (fps).

The field datasets consists of video recordings of Gnathamitermes sp. desert termites, filmed close to the nest entrance in the desert of Maricopa County, Arizona, using a Nikon D850 and a Nikkor 18-105 mm lens on a tripod at camera distances between 20 cm to 40 cm. All video recordings were well exposed, and captured at 23.976 fps.

Each video was trimmed to the first 1000 frames, and contains between 36 and 103 individuals. In total, 5000 and 1000 frames were hand-annotated for the laboratory- and field-dataset, respectively: each visible individual was assigned a constant size bounding box, with a centre coinciding approximately with the geometric centre of the thorax in top-down view. The size of the bounding boxes was chosen such that they were large enough to completely enclose the largest individuals, and was automatically adjusted near the image borders. A custom-written Blender Add-on aided hand-annotation: the Add-on is a semi-automated multi animal tracker, which leverages blender’s internal contrast-based motion tracker, but also include track refinement options, and CSV export functionality. Comprehensive documentation of this tool and Jupyter notebooks for track visualisation and benchmarking is provided on the replicAnt and BlenderMotionExport GitHub repositories.

Synthetic data generation

Two synthetic datasets, each with a population size of 100, were generated from 3D models of \textit{Atta vollenweideri} leaf-cutter ants. All 3D models were created with the scAnt photogrammetry workflow. A “group” population was based on three distinct 3D models of an ant minor (1.1 mg), a media (9.8 mg), and a major (50.1 mg) (see 10.5281/zenodo.7849059)). To approximately simulate the size distribution of A. vollenweideri colonies, these models make up 20%, 60%, and 20% of the simulated population, respectively. A 33% within-class scale variation, with default hue, contrast, and brightness subject material variation, was used. A “single” population was generated using the major model only, with 90% scale variation, but equal material variation settings.

A Gnathamitermes sp. synthetic dataset was generated from two hand-sculpted models; a worker and a soldier made up 80% and 20% of the simulated population of 100 individuals, respectively with default hue, contrast, and brightness subject material variation. Both 3D models were created in Blender v3.1, using reference photographs.

Each of the three synthetic datasets contains 10,000 images, rendered at a resolution of 1024 by 1024 px, using the default generator settings as documented in the Generator_example level file (see documentation on GitHub). To assess how the training dataset size affects performance, we trained networks on 100 (“small”), 1,000 (“medium”), and 10,000 (“large”) subsets of the “group” dataset. Generating 10,000 samples at the specified resolution took approximately 10 hours per dataset on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super).

Additionally, five datasets which contain both real and synthetic images were curated. These “mixed” datasets combine image samples from the synthetic “group” dataset with image samples from the real “base” case. The ratio between real and synthetic images across the five datasets varied between 10/1 to 1/100.

Funding

This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
d
Factori AI & ML Training Data | Mobility Data | Global | Machine Learning...
datarade.ai
.csv
Updated Dec 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Factori (2019). Factori AI & ML Training Data | Mobility Data | Global | Machine Learning Data | Carrier, IP address, Hex8, Hex9 | Historical Location data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-mobility-data-global-mac-factori
Explore at:
.csvAvailable download formats
Dataset updated
Dec 12, 2019
Dataset authored and provided by
Factori
Area covered
Italy, Greenland, Fiji, Burundi, Costa Rica, Grenada, Togo, Virgin Islands (U.S.), United States Minor Outlying Islands, Tokelau
Description
Mobility/Location data is gathered from location-aware mobile apps using an SDK-based implementation. All users explicitly consent to allow location data sharing using a clear opt-in process for our use cases and are given clear opt-out options. Factori ingests, cleans, validates, and exports all location data signals to ensure only the highest quality of data is made available for analysis.

Record Count:90 Billion+ Capturing Frequency: Once per Event Delivering Frequency: Once per Day Updated: Daily

Mobility Data Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings.

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited interval (daily/weekly/monthly/quarterly).

Use Cases: Consumer Insight: Gain a comprehensive 360-degree perspective of the customer to spot behavioral changes, analyze trends and predict business outcomes. Market Intelligence: Study various market areas, the proximity of points or interests, and the competitive landscape. Advertising: Create campaigns and customize your messaging depending on your target audience's online and offline activity. Retail Analytics Analyze footfall trends in various locations and gain understanding of customer personas.

Here's the data attributes: maid latitude longtitude horizontal_accuracy timestamp id_type ipv4 ipv6 user_agent country state_hasc city_hasc postcode geohash hex8 hex9 carrier
Mammary epithelial intravital imaging data and MaSCOT-AI Cellpose model for...
zenodo.org
explore.openaire.eu
+1more
Updated Jan 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Caleb A Dawson; Caleb A Dawson (2025). Mammary epithelial intravital imaging data and MaSCOT-AI Cellpose model for analysis of in vivo cell shape dynamics [Dataset]. http://doi.org/10.5281/zenodo.14503476
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.14503476
Dataset updated
Jan 6, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Caleb A Dawson; Caleb A Dawson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data and deep learning segmentation model deposited here are derived from 3D multicoloured intravital microscopy of mammary epithelial cells during development. We aimed to study in vivo cell shape dynamics in real-time in an unbiased way. This robust and deep analysis revealed that hormone-responsive breast cells are unexpectedly elongated and motile at a high frequency during duct growth. The data is associated with our publication Dawson, Milevskiy et al, Cell Reports 2024, Hormone-responsive progenitors have a unique identity and exhibit high motility during mammary morphogenesis. https://doi.org/10.1016/j.celrep.2024.115073

Deposited data
- Single channel intravital movie maximum projections (File:MaSCOT-AI Max projections). These are up to 5 hours long, with timepoints every 10 minutes.
- Extracted 5th time points from each movie that we used for model training (File:MaSCOT-AI t5 training)
- Segmentation files generated by Cellpose 2.2.2 (File: MaSCOT-AI t5 segmentation files)

Analysis scripts:
The Trackmate-Cellpose python script, R data processing scripts and example excel data sheet are on github at https://github.com/cadaws/MaSCOT-AI

Example analysis and data export:
A small set of example data and resulting trackmate-Cellpose output will be uploaded at a later date.

Methods
27 4D movies were acquired every 10 minutes by multiphoton microscopy of anaesthetised cell-type-specific confetti mice at different stages of development. 350 single channel, single-cell thick layers (10-30 µm sections) were isolated by 3D cropping, then flattened by max projection. The 5th time point from all movies was taken for model training in Cellpose 2.2.2, which was achieved after manual correction of segmentation for 150 images (MaSCOT-AI model).

The MaSCOT-AI model was used in a high throughput Trackmate-Cellpose script in ImageJ to track mammary cell shape over time.

Software versions:
Cellpose 2.2.2 GUI with GPU was installed according to https://pypi.org/project/cellpose/ (March 2024).
Trackmate v7.11.1

File name structure
Date_mouse-model_developmental-stage_fluorescent-protein_z-span

Mouse models:
K5: K5-rtTA/tetoCre/Confetti
Elf5: Elf5-rtTA/tetoCre/Confetti
Pr: PR-Cre/Confetti

Developmental stage:
no label = Terminal end bud at 5 weeks
duct/notpreg = duct at 6 or 9 weeks
6dPreg/6dplug = 6 days pregnancy
6d MPA = 6 days MPA treatment
MPAveh = 6 days MPA vehicle treatment
f
Popnet: computer vision based bespoken deep learning model for forecasting...
figshare.com
zip
Updated May 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Byeonghwa Jeong; Bokyeong Lee (2025). Popnet: computer vision based bespoken deep learning model for forecasting gridded population [Dataset]. http://doi.org/10.6084/m9.figshare.25959652.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.25959652.v1
Dataset updated
May 27, 2025
Dataset provided by
figshare
Authors
Byeonghwa Jeong; Bokyeong Lee
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study introduces Popnet, a deep learning model for forecasting 1km-gridded populations, integrating U-Net, ConvLSTM, a Spatial Autocorrelation module and deep ensemble methods. Using spatial variables and population data from 2000 to 2020, Popnet predicts South Korea's population trends by age groups (under 14, 15-64, over 65) up to 2040. In validation, it outperforms traditional machine learning and state-of-the-art computer vision models. The output of this model discovered significant polarization: population growth in urban areas, especially the capital region, and severe depopulation in rural areas. Popnet is a robust tool for offering significant insights to policymakers and related stakeholders about the detailed future population, which allows them to establish detailed, localised planning and resource allocations.*Due to the export restrictions on grid data imposed by the National Geographic Information Institute of Korea, the training data has been replaced with data from Tennessee. However, the Korean version of the future prediction data remains unchanged. Please take this into consideration.
Demo datasets for PhenoLearn
zenodo.org
zip
Updated May 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yichen He; Yichen He (2025). Demo datasets for PhenoLearn [Dataset]. http://doi.org/10.5281/zenodo.8152784
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8152784
Dataset updated
May 6, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Yichen He; Yichen He
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This Zenodo record contains two test datasets (Birds and Littorina) used in the paper:

PhenoLearn: A user-friendly Toolkit for Image Annotation and Deep Learning-Based Phenotyping for Biological Datasets

Authors: Yichen He, Christopher R. Cooney, Steve Maddock, Gavin H. Thomas

PhenoLearn is a graphical and script-based toolkit designed to help biologists annotate and analyse biological images using deep learning. This dataset includes two test cases: one of bird specimen images for semantic segmentation, and another of marine snail (Littorina) images for landmark detection. These datasets are used to demonstrate the PhenoLearn workflow in the accompanying paper.

Dataset Structure

Bird Dataset

train/ — 120 bird specimen images for annotation and model training.

pred/ — 100 images for prediction and testing.

seg_train.csv — Pixel-wise segmentations (CSV format with RLE or polygon masks).

name_file_pred — Filenames corresponding to prediction images.

Littorina Dataset

train/ — 120 snail images for training landmark prediction models.

pred/ — 100 snail images for model testing.

pts_train.csv — Ground-truth landmark coordinates for training images.

name_file_pred — Prediction image filenames for evaluation.

How to Use These Datasets

Workflow Instructions (via PhenoLearn)

Download the dataset folders.

Use PhenoLearn to load seg_train.csv (segmentation) or pts_train.csv (landmark) to view and edit annotations.

Train segmentation or landmark prediction models directly via PhenoLearn's training module, or export data for external tools.

Use name_file_pred to match predictions with ground-truth for evaluation.

See the full tutorial and usage guide in the https://github.com/EchanHe/PhenoLearn.
R
Urcdd Dataset
universe.roboflow.com
zip
Updated Dec 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
xeon (2023). Urcdd Dataset [Dataset]. https://universe.roboflow.com/xeon/urcdd/dataset/4
Explore at:
zipAvailable download formats
Dataset updated
Dec 11, 2023
Dataset authored and provided by
xeon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Roblox Characters Bounding Boxes
Description
URCDD (Unversal Roblox Character Detection Dataset)

Welcome to the Universal Roblox Character Detection Dataset (URCDD). This dataset is a comprehensive collection of images extracted from various games on the Roblox platform. Our primary objective is to offer a diverse and extensive dataset that encompasses the wide array of characters found in Roblox games.

Key Features

Diverse Game Genres: URCDD includes character images from a myriad of game genres on Roblox, ranging from first-person shooters to roleplaying games.

Comprehensive Coverage: Our dataset aims to provide a complete representation of the different character types encountered across the Roblox gaming ecosystem.

Facilitating Model Training: URCDD is designed to simplify the training process for machine learning models focused on character detection within the Roblox environment.

Dataset Details

Size: The dataset comprises a substantial number of images collected from hundreds of Roblox games. The version specific image count can be viewed in the Versions tab.

Format: Images are provided in a standardized format for ease of use in machine learning applications. You can export this dataset in any format you desire, thanks to Roboflow.

Annotations: This dataset currently only has one class that is deticated to identifying a player's character. We are looking to add more classes in the future.

Tags

We have created a unique tag for each game that we have collected data from. Refer to the list below: 1. baseplate - https://www.roblox.com/games/4483381587 2. da-hood - https://www.roblox.com/games/2788229376 3. arsenal - https://www.roblox.com/games/286090429 4. aimblox - https://www.roblox.com/games/6808416928 5. hood-customs - https://www.roblox.com/games/9825515356 6. counter-blox - https://www.roblox.com/games/301549746/ 7. hood-testing - https://www.roblox.com/games/12673840215 8. phantom-forces - https://www.roblox.com/games/292439477 9. entrenched - https://www.roblox.com/games/3678761576
d
Crypto Market Data CSV Export: Trades, Quotes & Order Book Access via S3
datarade.ai
.json, .csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CoinAPI, Crypto Market Data CSV Export: Trades, Quotes & Order Book Access via S3 [Dataset]. https://datarade.ai/data-products/coinapi-comprehensive-crypto-market-data-in-flat-files-tra-coinapi
Explore at:
.json, .csvAvailable download formats
Dataset provided by
Coinapi Ltd
Authors
CoinAPI
Area covered
Solomon Islands, Kyrgyzstan, Norfolk Island, Montserrat, Iraq, Tanzania, Liechtenstein, Qatar, Northern Mariana Islands, Latvia
Description
When you need to analyze crypto market history, batch processing often beats streaming APIs. That's why we built the Flat Files S3 API - giving analysts and researchers direct access to structured historical cryptocurrency data without the integration complexity of traditional APIs.

Pull comprehensive historical data across 800+ cryptocurrencies and their trading pairs, delivered in clean, ready-to-use CSV formats that drop straight into your analysis tools. Whether you're building backtest environments, training machine learning models, or running complex market studies, our flat file approach gives you the flexibility to work with massive datasets efficiently.

Why work with us?

Market Coverage & Data Types: - Comprehensive historical data since 2010 (for chosen assets) - Comprehensive order book snapshots and updates - Trade-by-trade data

Technical Excellence: - 99,9% uptime guarantee - Standardized data format across exchanges - Flexible Integration - Detailed documentation - Scalable Architecture

CoinAPI serves hundreds of institutions worldwide, from trading firms and hedge funds to research organizations and technology providers. Our S3 delivery method easily integrates with your existing workflows, offering familiar access patterns, reliable downloads, and straightforward automation for your data team. Our commitment to data quality and technical excellence, combined with accessible delivery options, makes us the trusted choice for institutions that demand both comprehensive historical data and real-time market intelligence
Thermal Cheetah Object Detection Dataset
public.roboflow.com
zip
Updated Feb 28, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Roboflow (2024). Thermal Cheetah Object Detection Dataset [Dataset]. https://public.roboflow.com/object-detection/thermal-cheetah
Explore at:
zipAvailable download formats
Dataset updated
Feb 28, 2024
Dataset provided by
Roboflow, Inc.
Authors
Roboflow
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Bounding Boxes of cheetah
Description
About this Dataset

This is a collection of images and video frames of cheetahs at the Omaha Henry Doorly Zoo taken in October, 2020. The capture device was a SEEK Thermal Compact XR connected to an iPhone 11 Pro. Video frames were sampled and labeled by hand with bounding boxes for object detection using Robofow.

Using this Dataset

We have provided the dataset for download under a creative commons by-attribution license. You may use this dataset in any project (including for commercial use) but must cite Roboflow as the source.

Example Use Cases

This dataset could be used for conservation of endangered species, cataloging animals with a trail camera, gathering statistics on wildlife behavior, or experimenting with other thermal and infrared imagery.

About Roboflow

Roboflow creates tools that make computer vision easy to use for any developer, even if you're not a machine learning expert. You can use it to organize, label, inspect, convert, and export your image datasets. And even to train and deploy computer vision models with no code required.
f
Annotated dataset to assess the accuracy of the textual description of...
figshare.com
txt
Updated Dec 19, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matteo Lorenzini; Marco Rospocher; Sara Tonelli (2020). Annotated dataset to assess the accuracy of the textual description of cultural heritage records [Dataset]. http://doi.org/10.6084/m9.figshare.13359104.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13359104.v1
Dataset updated
Dec 19, 2020
Dataset provided by
figshare
Authors
Matteo Lorenzini; Marco Rospocher; Sara Tonelli
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains more than 100K textual descriptions of cultural items from Cultura Italia (http://www.culturaitalia.it/opencms/index.jsp?language=en), the Italian National Cultural aggregator. Each of the description is labeled either HIGH or LOW quality, according its adherence to the standard cataloguing guidelines provided by Istituto Centrale per il Catalogo e la Documentazione (ICCD). More precisely, each description is labeled as HIGH quality if the object and subject of the item (for which the description is provided) are both described according to the ICCD guidelines, and as LOW quality in all other cases. Most of the dataset was manually annotated, with ~30K descriptions automatically labeled as LOW quality due to their length (less than 3 tokens) or their provenance from old (pre-2012), not curated, collections. The dataset was developed to support the training and testing of ML text classification approaches for automatically assessing the quality of textual descriptions in digital Cultural Heritage repositories.The dataset is provided as a CSV file, where each row corresponds to an item from Cultura Italia, and contains the textual description of the item, the domain of the item (OpereArteVisiva/RepertoArcheologico/Architettura) and the quality label (Low_Quality/High_Quality).The textual descriptions in the dataset are provided by Cultura Italia with a "Public Domain" license (c.f., http://www.culturaitalia.it/opencms/export/sites/culturaitalia/attachments/linked_open_data/Licenza_CulturaItalia_CC0.pdf). The whole dataset, including the annotation, is openly distributed according to the Creative Commons Attribution-ShareAlike 4.0 Generic (CC BY-SA 4.0) licence.
License Plates Object Detection Dataset - Original License Plates
public.roboflow.com
zip
Updated Oct 15, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Roboflow (2022). License Plates Object Detection Dataset - Original License Plates [Dataset]. https://public.roboflow.com/object-detection/license-plates-us-eu/3
Explore at:
zipAvailable download formats
Dataset updated
Oct 15, 2022
Dataset provided by
Roboflow, Inc.
Authors
Roboflow
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Bounding Boxes of Plates
Description
Overview

The License Plates dataset is a object detection dataset of different vehicles (i.e. cars, vans, etc.) and their respective license plate. Annotations also include examples of "vehicle" and "license-plate". This dataset has a train/validation/test split of 245/70/35 respectively. https://i.imgur.com/JmRgjBq.png" alt="Dataset Example">

Use Cases

This dataset could be used to create a vehicle and license plate detection object detection model. Roboflow provides a great guide on creating a license plate and vehicle object detection model.

Using this Dataset

This dataset is a subset of the Open Images Dataset. The annotations are licensed by Google LLC under CC BY 4.0 license. Some annotations have been combined or removed using Roboflow's annotation management tools to better align the annotations with the purpose of the dataset. The images have a CC BY 2.0 license.

About Roboflow

Roboflow creates tools that make computer vision easy to use for any developer, even if you're not a machine learning expert. You can use it to organize, label, inspect, convert, and export your image datasets. And even to train and deploy computer vision models with no code required. https://i.imgur.com/WHFqYSJ.png" alt="https://roboflow.com">
Dark-field Microscopy Dataset and SVM Code for Agave-Based Mezcal...
zenodo.org
bin, txt, zip
Updated Jul 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Miguel G Ramirez-Elias; Miguel G Ramirez-Elias; Juan Carlos Torres-Galván; Juan Carlos Torres-Galván; Pedro E. Ramirez-Gonzalez; Pedro E. Ramirez-Gonzalez; Luis Adrián Langarica; Edgar Guevara; Edgar Guevara; Luis Adrián Langarica (2025). Dark-field Microscopy Dataset and SVM Code for Agave-Based Mezcal Classification [Dataset]. http://doi.org/10.5281/zenodo.15810264
Explore at:
txt, zip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.15810264
Dataset updated
Jul 5, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Miguel G Ramirez-Elias; Miguel G Ramirez-Elias; Juan Carlos Torres-Galván; Juan Carlos Torres-Galván; Pedro E. Ramirez-Gonzalez; Pedro E. Ramirez-Gonzalez; Luis Adrián Langarica; Edgar Guevara; Edgar Guevara; Luis Adrián Langarica
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 4, 2025
Description
Dataset Title:
Dark-Field Microscopy Images of Evaporated Mezcal Droplets for Agave Species Classification

Description:
This dataset contains dark-field microscopy images of mezcal samples produced from four agave species: Agave salmiana (salmiana), Agave marmorata (tepeztate), Agave rhodacantha (cuishe), and Agave angustifolia (espadin), as well as an aged salmiana. Each 1 μL droplet of diluted mezcal (20% ABV) was deposited on a cleaned glass slide and allowed to evaporate under ambient conditions to form distinct microstructures. The resulting images were acquired at 4× magnification and used to train and validate a Support Vector Machine (SVM) classifier to distinguish between the first two varietals. The dataset supports research in agave-based spirit authentication, chemometric image analysis, and low-cost classification of artisanal products.

Contents:

JPEG or PNG image files organized by class (/salmiana/, /tepeztate/, /espadin/, /cuishe/,and /tepeztate/)

Python scripts and Jupyter Notebooks for training, evaluation, and model export

Pretrained SVM model and label encoder files

Format:
Images (224×224 pixels), Notebooks (.ipynb)

Intended Use:
Research in chemometrics, machine learning, and food authentication. May also serve as a benchmark dataset for image-based classification of fermented or distilled products.

License:
Creative Commons Attribution 4.0 International (CC BY 4.0)
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Factori, Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-web-data-machine-learning-d-factori

Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day

Explore at:

.csvAvailable download formats

Dataset authored and provided by

Factori

Area covered

United Kingdom

Description

Factori's AI & ML training data is thoroughly tested and reviewed to ensure that what you receive on your end is of the best quality.

Integrate the comprehensive AI & ML training data provided by Grepsr and develop a superior AI & ML model.

Whether you're training algorithms for natural language processing, sentiment analysis, or any other AI application, we can deliver comprehensive datasets tailored to fuel your machine learning initiatives.

Enhanced Data Quality: We have rigorous data validation processes and also conduct quality assurance checks to guarantee the integrity and reliability of the training data for you to develop the AI & ML models.

Gain a competitive edge, drive innovation, and unlock new opportunities by leveraging the power of tailored Artificial Intelligence and Machine Learning training data with Factori.

We offer web activity data of users that are browsing popular websites around the world. This data can be used to analyze web behavior across the web and build highly accurate audience segments based on web activity for targeting ads based on interest categories and search/browsing intent.

Web Data Reach: Our reach data represents the total number of data counts available within various categories and comprises attributes such as Country, Anonymous ID, IP addresses, Search Query, and so on.

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).

Data Attributes: Anonymous_id IDType Timestamp Estid Ip userAgent browserFamily deviceType Os Url_metadata_canonical_url Url_metadata_raw_query_params refDomain mappedEvent Channel searchQuery Ttd_id Adnxs_id Keywords Categories Entities Concepts

Clear search

Close search

Google apps

Main menu

Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event...

Car Highway Dataset

Car-Highway Data Annotation Project

Introduction

Project Goals

Tools and Technologies

Annotation Process

Data Augmentation

Data Export

Milestones

Conclusion

Factori AI & ML Training Data | Consumer Data | USA | Machine Learning Data

Zenodo Open Metadata snapshot - Training dataset for records and communities...

replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks

Factori AI & ML Training Data | Mobility Data | Global | Machine Learning...

Mammary epithelial intravital imaging data and MaSCOT-AI Cellpose model for...

Popnet: computer vision based bespoken deep learning model for forecasting...

Demo datasets for PhenoLearn

Dataset Structure

Bird Dataset

Littorina Dataset

How to Use These Datasets

Workflow Instructions (via PhenoLearn)

Urcdd Dataset

URCDD (Unversal Roblox Character Detection Dataset)

Key Features

Dataset Details

Tags

Crypto Market Data CSV Export: Trades, Quotes & Order Book Access via S3

Thermal Cheetah Object Detection Dataset

About this Dataset

Using this Dataset

Example Use Cases

About Roboflow

Annotated dataset to assess the accuracy of the textual description of...

License Plates Object Detection Dataset - Original License Plates

Overview

Use Cases

Using this Dataset

About Roboflow

Dark-field Microscopy Dataset and SVM Code for Agave-Based Mezcal...

Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day