15 datasets found
  1. Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day

    • datarade.ai
    .csv
    Cite
    Factori, Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-web-data-machine-learning-d-factori
    Explore at:
    Available download formats: .csv
    Dataset authored and provided by
    Factori
    Area covered
    United Kingdom
    Description

    Factori's AI & ML training data is thoroughly tested and reviewed to ensure that what you receive on your end is of the best quality.

    Integrate the comprehensive AI & ML training data provided by Grepsr and develop a superior AI & ML model.

    Whether you're training algorithms for natural language processing, sentiment analysis, or any other AI application, we can deliver comprehensive datasets tailored to fuel your machine learning initiatives.

    Enhanced Data Quality: We have rigorous data validation processes and also conduct quality assurance checks to guarantee the integrity and reliability of the training data for you to develop the AI & ML models.

    Gain a competitive edge, drive innovation, and unlock new opportunities by leveraging the power of tailored Artificial Intelligence and Machine Learning training data with Factori.

    We offer web activity data of users that are browsing popular websites around the world. This data can be used to analyze web behavior across the web and build highly accurate audience segments based on web activity for targeting ads based on interest categories and search/browsing intent.

    Web Data Reach: Our reach data represents the total number of data counts available within various categories and comprises attributes such as Country, Anonymous ID, IP addresses, Search Query, and so on.

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).

    Data Attributes: Anonymous_id IDType Timestamp Estid Ip userAgent browserFamily deviceType Os Url_metadata_canonical_url Url_metadata_raw_query_params refDomain mappedEvent Channel searchQuery Ttd_id Adnxs_id Keywords Categories Entities Concepts
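
    For orientation, a minimal sketch of loading such a .csv export with pandas; the filename is hypothetical and the column names are taken verbatim from the attribute list above:

```python
import pandas as pd

# Hypothetical filename; the actual export name depends on your delivery setup.
events = pd.read_csv("factori_web_events.csv")

# Count distinct anonymous users per referring domain, using the
# attribute names listed above.
users_per_domain = events.groupby("refDomain")["Anonymous_id"].nunique()
print(users_per_domain.sort_values(ascending=False).head(10))
```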

  2. Car Highway Dataset

    • universe.roboflow.com
    zip
    Updated Sep 13, 2023
    Cite
    Sallar (2023). Car Highway Dataset [Dataset]. https://universe.roboflow.com/sallar/car-highway/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 13, 2023
    Dataset authored and provided by
    Sallar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vehicles Bounding Boxes
    Description

    Car-Highway Data Annotation Project

    Introduction

    In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.

    Project Goals

    • Collect a diverse dataset of car images from highway scenes.
    • Annotate the dataset to identify and label cars within each image.
    • Organize and format the annotated data for machine learning model training.

    Tools and Technologies

    For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.

    Annotation Process

    1. Upload the raw car images to the Roboflow platform.
    2. Use the annotation tools in Roboflow to draw bounding boxes around each car in the images.
    3. Label each bounding box with the corresponding class (e.g., car).
    4. Review and validate the annotations for accuracy.

    Data Augmentation

    Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.

    Data Export

    Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.
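
    To illustrate what a YOLO-format export contains, here is a minimal sketch that parses one label file into pixel-space boxes; the file name and image size are assumptions:

```python
from pathlib import Path

def load_yolo_labels(label_path, img_w, img_h):
    """Parse one YOLO-format .txt label file into pixel-space boxes."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        cls, cx, cy, w, h = line.split()
        cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
        # YOLO stores centre x/y and width/height normalised to [0, 1].
        boxes.append((
            int(cls),
            (cx - w / 2) * img_w,   # x1
            (cy - h / 2) * img_h,   # y1
            (cx + w / 2) * img_w,   # x2
            (cy + h / 2) * img_h,   # y2
        ))
    return boxes

# Hypothetical file and image size:
# print(load_yolo_labels("car_0001.txt", img_w=1280, img_h=720))
```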

    Milestones

    1. Data Collection and Preprocessing
    2. Annotation of Car Images
    3. Data Augmentation
    4. Data Export
    5. Model Training

    Conclusion

    By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.

  3. Factori AI & ML Training Data | Consumer Data | USA | Machine Learning Data

    • datarade.ai
    .json, .csv
    Updated Jul 23, 2022
    Cite
    Factori (2022). Factori AI & ML Training Data | Consumer Data | USA | Machine Learning Data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-consumer-data-usa-machine-factori
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Jul 23, 2022
    Dataset authored and provided by
    Factori
    Area covered
    United States
    Description

    Our consumer data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

    Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.

    1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
    2. Demographics - Gender, Age Group, Marital Status, Language, etc.
    3. Financial - Income Range, Credit Rating Range, Credit Type, Net Worth Range, etc.
    4. Persona - Consumer Type, Communication Preferences, Family Type, etc.
    5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
    6. Household - Number of Children, Number of Adults, IP Address, etc.
    7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
    8. Firmographics - Industry, Company, Occupation, Revenue, etc.
    9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
    10. Auto - Car Make, Model, Type, Year, etc.
    11. Housing - Home Type, Home Value, Renter/Owner, Year Built, etc.

    Consumer Graph Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings.

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

    Consumer Graph Use Cases:

    • 360-Degree Customer View: Get a comprehensive image of customers by means of internal and external data aggregation.
    • Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting.
    • Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.
    • Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

    Here's the schema of Consumer Data: person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type mortgage_loan2_type mortgage_lender_code mortgage_loan2_render_code mortgage_lender mortgage_loan2_lender mortgage_loan2_ratetype mortgage_rate mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code office_census_block_group office_census_tract office_county_code company_phone company_credit_score company_csa_code company_dpbc company_franchiseflag company_facebookurl company_linkedinurl company_twitterurl company_website company_fortune_rank company_government_type company_headquarters_branch company_home_business company_industry company_num_pcs_used company_num_employees company_firm_individual company_msa company_msa_name company_naics_code company_naics_description company_naics_code2 company_naics_description2 company_sic_code2 company_sic_code2_description company_sic_code4 company_sic_code4_description company_sic_code6 company_sic_code6_description company_sic_code8 company_sic_code8_description company_parent_company company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range company_revenue company_sales_volume company_small_business company_stock_ticker company_year_founded company_minorityowned company_female_owned_or_operated company_franchise_code company_dma company_dma_name company_hq_address company_hq_city company_hq_duns company_hq_state company_hq_zip5 company_hq_zip4 c...
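
    For a first look at these fields, a minimal sketch; the filename is hypothetical and the JSON-lines layout (one consumer object per line) is an assumption:

```python
import json
from collections import Counter

# Hypothetical filename; delivery is .json or .csv. JSON-lines layout
# (one consumer object per line) is an assumption.
with open("factori_consumer_sample.json") as fh:
    people = [json.loads(line) for line in fh]

# Tally records per state as a quick coverage check.
print(Counter(p.get("state") for p in people).most_common(5))
```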

  4. Zenodo Open Metadata snapshot - Training dataset for records and communities classifier building

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, bin
    Updated Dec 15, 2022
    + more versions
    Cite
    Zenodo team (2022). Zenodo Open Metadata snapshot - Training dataset for records and communities classifier building [Dataset]. http://doi.org/10.5281/zenodo.7438358
    Explore at:
    Available download formats: bin, application/gzip
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zenodo team
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains Zenodo's published open access records and communities metadata, including entries marked by the Zenodo staff as spam and deleted.

    The datasets are gzipped compressed JSON-lines files, where each line is a JSON object representation of a Zenodo record or community.

    Records dataset

    Filename: zenodo_open_metadata_{ date of export }.jsonl.gz

    Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

    which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.

    In addition, some terms have been altered:

    • The term files contains a list of dictionaries containing filetype, size, and filename only.
    • The term license contains a short Zenodo ID of the license (e.g. "cc-by").

    Communities dataset

    Filename: zenodo_community_metadata_{ date of export }.jsonl.gz

    Each object contains the terms: id, title, description, curation_policy, page

    which correspond to the fields with the same name available in Zenodo's community creation form.

    Notes for all datasets

    For each object the term spam contains a boolean value, determining whether a given record/community was marked as spam content by Zenodo staff.

    Top-level terms whose values were missing in the metadata may contain a null value.

    A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
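
    A minimal sketch for reading one of the gzipped JSON-lines files and filtering out entries flagged as spam; the export date in the filename is an example:

```python
import gzip
import json

records = []
with gzip.open("zenodo_open_metadata_2022-12-15.jsonl.gz", "rt") as fh:
    for line in fh:
        rec = json.loads(line)
        if not rec.get("spam"):      # spam flag set by Zenodo staff
            records.append(rec)

print(len(records), "non-spam records")
print(records[0]["title"], records[0].get("license"))
```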

  5. replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 21, 2023
    + more versions
    Cite
    Fabian Plum; René Bulla; Hendrik Beck; Natalie Imirzian; David Labonte (2023). replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks [Dataset]. http://doi.org/10.5281/zenodo.7849417
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fabian Plum; René Bulla; Hendrik Beck; Natalie Imirzian; David Labonte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all recorded and hand-annotated data, all synthetically generated data, and representative trained networks used for the detection and tracking experiments in the manuscript "replicAnt - generating annotated images of animals in complex environments using Unreal Engine". Unless stated otherwise, all 3D animal models used in the synthetically generated data were created with the open-source photogrammetry platform scAnt (peerj.com/articles/11155/). All synthetic data was generated with the associated replicAnt project, available from https://github.com/evo-biomech/replicAnt.

    Abstract:

    Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.

    Benchmark data

    Two video datasets were curated to quantify detection performance; one in laboratory and one in field conditions. The laboratory dataset consists of top-down recordings of foraging trails of Atta vollenweideri (Forel 1893) leaf-cutter ants. The colony was collected in Uruguay in 2014, and housed in a climate chamber at 25°C and 60% humidity. A recording box was built from clear acrylic, and placed between the colony nest and a box external to the climate chamber, which functioned as feeding site. Bramble leaves were placed in the feeding area prior to each recording session, and ants had access to the recording area at will. The recorded area was 104 mm wide and 200 mm long. An OAK-D camera (OpenCV AI Kit: OAK-D, Luxonis Holding Corporation) was positioned centrally 195 mm above the ground. While keeping the camera position constant, lighting, exposure, and background conditions were varied to create recordings with variable appearance: The “base” case is an evenly lit and well exposed scene with scattered leaf fragments on an otherwise plain white backdrop. A “bright” and “dark” case are characterised by systematic over- or underexposure, respectively, which introduces motion blur, colour-clipped appendages, and extensive flickering and compression artefacts. In a separate well exposed recording, the clear acrylic backdrop was substituted with a printout of a highly textured forest ground to create a “noisy” case. Last, we decreased the camera distance to 100 mm at constant focal distance, effectively doubling the magnification, and yielding a “close” case, distinguished by out-of-focus workers. All recordings were captured at 25 frames per second (fps).

    The field dataset consists of video recordings of Gnathamitermes sp. desert termites, filmed close to the nest entrance in the desert of Maricopa County, Arizona, using a Nikon D850 and a Nikkor 18-105 mm lens on a tripod at camera distances between 20 cm and 40 cm. All video recordings were well exposed, and captured at 23.976 fps.

    Each video was trimmed to the first 1000 frames, and contains between 36 and 103 individuals. In total, 5000 and 1000 frames were hand-annotated for the laboratory and field dataset, respectively: each visible individual was assigned a constant-size bounding box, with a centre coinciding approximately with the geometric centre of the thorax in top-down view. The size of the bounding boxes was chosen such that they were large enough to completely enclose the largest individuals, and was automatically adjusted near the image borders. A custom-written Blender Add-on aided hand-annotation: the Add-on is a semi-automated multi-animal tracker, which leverages Blender's internal contrast-based motion tracker, but also includes track refinement options and CSV export functionality. Comprehensive documentation of this tool, and Jupyter notebooks for track visualisation and benchmarking, are provided on the replicAnt and BlenderMotionExport GitHub repositories.

    Synthetic data generation

    Two synthetic datasets, each with a population size of 100, were generated from 3D models of Atta vollenweideri leaf-cutter ants. All 3D models were created with the scAnt photogrammetry workflow. A “group” population was based on three distinct 3D models of an ant minor (1.1 mg), a media (9.8 mg), and a major (50.1 mg) (see 10.5281/zenodo.7849059). To approximately simulate the size distribution of A. vollenweideri colonies, these models make up 20%, 60%, and 20% of the simulated population, respectively. A 33% within-class scale variation, with default hue, contrast, and brightness subject material variation, was used. A “single” population was generated using the major model only, with 90% scale variation, but equal material variation settings.

    A Gnathamitermes sp. synthetic dataset was generated from two hand-sculpted models; a worker and a soldier made up 80% and 20% of the simulated population of 100 individuals, respectively, with default hue, contrast, and brightness subject material variation. Both 3D models were created in Blender v3.1, using reference photographs.

    Each of the three synthetic datasets contains 10,000 images, rendered at a resolution of 1024 by 1024 px, using the default generator settings as documented in the Generator_example level file (see documentation on GitHub). To assess how the training dataset size affects performance, we trained networks on 100 (“small”), 1,000 (“medium”), and 10,000 (“large”) subsets of the “group” dataset. Generating 10,000 samples at the specified resolution took approximately 10 hours per dataset on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super).


    Additionally, five datasets which contain both real and synthetic images were curated. These “mixed” datasets combine image samples from the synthetic “group” dataset with image samples from the real “base” case. The ratio between real and synthetic images across the five datasets varied between 10/1 to 1/100.
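
    As a sketch of how such fixed-ratio mixes can be assembled from file lists (directory names and file extensions are assumptions, not part of the deposit):

```python
import random
from pathlib import Path

def mixed_file_list(real_dir, synth_dir, n_real, n_synth, seed=0):
    """Sample real and synthetic frames at a fixed real:synthetic ratio."""
    rng = random.Random(seed)
    real = sorted(Path(real_dir).glob("*.jpg"))     # extensions assumed
    synth = sorted(Path(synth_dir).glob("*.png"))
    # Scale the ratio up to the largest size both pools can support.
    k = min(len(real) // n_real, len(synth) // n_synth)
    return rng.sample(real, k * n_real) + rng.sample(synth, k * n_synth)

# e.g. a 1:10 real-to-synthetic mix:
# files = mixed_file_list("base_frames", "group_renders", 1, 10)
```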

    Funding

    This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  6. Factori AI & ML Training Data | Mobility Data | Global | Machine Learning Data | Carrier, IP address, Hex8, Hex9 | Historical Location data

    • datarade.ai
    .csv
    Updated Dec 12, 2019
    Cite
    Factori (2019). Factori AI & ML Training Data | Mobility Data | Global | Machine Learning Data | Carrier, IP address, Hex8, Hex9 | Historical Location data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-mobility-data-global-mac-factori
    Explore at:
    Available download formats: .csv
    Dataset updated
    Dec 12, 2019
    Dataset authored and provided by
    Factori
    Area covered
    Italy, Greenland, Fiji, Burundi, Costa Rica, Grenada, Togo, Virgin Islands (U.S.), United States Minor Outlying Islands, Tokelau
    Description

    Mobility/Location data is gathered from location-aware mobile apps using an SDK-based implementation. All users explicitly consent to allow location data sharing using a clear opt-in process for our use cases and are given clear opt-out options. Factori ingests, cleans, validates, and exports all location data signals to ensure only the highest quality of data is made available for analysis.

    Record Count: 90 Billion+
    Capturing Frequency: Once per Event
    Delivery Frequency: Once per Day
    Updated: Daily

    Mobility Data Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings.

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited interval (daily/weekly/monthly/quarterly).

    Use Cases:

    • Consumer Insight: Gain a comprehensive 360-degree perspective of the customer to spot behavioral changes, analyze trends, and predict business outcomes.
    • Market Intelligence: Study various market areas, the proximity of points of interest, and the competitive landscape.
    • Advertising: Create campaigns and customize your messaging depending on your target audience's online and offline activity.
    • Retail Analytics: Analyze footfall trends in various locations and gain an understanding of customer personas.

    Here are the data attributes: maid latitude longtitude horizontal_accuracy timestamp id_type ipv4 ipv6 user_agent country state_hasc city_hasc postcode geohash hex8 hex9 carrier
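
    For orientation, a minimal sketch of a reach check on such an export; the filename is hypothetical, the column names follow the attribute list above (including the provider's spelling "longtitude"), and Unix-second timestamps are an assumption:

```python
import pandas as pd

cols = ["maid", "latitude", "longtitude", "horizontal_accuracy",
        "timestamp", "country", "geohash", "hex8", "hex9", "carrier"]

# Hypothetical filename; Unix-second timestamps are an assumption.
pings = pd.read_csv("factori_mobility_sample.csv", usecols=cols)
pings["day"] = pd.to_datetime(pings["timestamp"], unit="s").dt.date

# Daily active devices per country as a quick reach check.
dau = pings.groupby(["country", "day"])["maid"].nunique()
print(dau.head())
```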

  7. Mammary epithelial intravital imaging data and MaSCOT-AI Cellpose model for analysis of in vivo cell shape dynamics

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    Updated Jan 6, 2025
    Cite
    Caleb A Dawson (2025). Mammary epithelial intravital imaging data and MaSCOT-AI Cellpose model for analysis of in vivo cell shape dynamics [Dataset]. http://doi.org/10.5281/zenodo.14503476
    Explore at:
    Dataset updated
    Jan 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Caleb A Dawson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data and deep learning segmentation model deposited here are derived from 3D multicoloured intravital microscopy of mammary epithelial cells during development. We aimed to study in vivo cell shape dynamics in real time in an unbiased way. This robust and deep analysis revealed that hormone-responsive breast cells are unexpectedly elongated and motile at a high frequency during duct growth. The data is associated with our publication: Dawson, Milevskiy et al., Cell Reports 2024, "Hormone-responsive progenitors have a unique identity and exhibit high motility during mammary morphogenesis". https://doi.org/10.1016/j.celrep.2024.115073

    Deposited data
    - Single channel intravital movie maximum projections (File: MaSCOT-AI Max projections). These are up to 5 hours long, with timepoints every 10 minutes.
    - Extracted 5th time points from each movie that we used for model training (File: MaSCOT-AI t5 training)
    - Segmentation files generated by Cellpose 2.2.2 (File: MaSCOT-AI t5 segmentation files)

    Analysis scripts:
    The TrackMate-Cellpose Python script, R data processing scripts, and example Excel data sheet are on GitHub at https://github.com/cadaws/MaSCOT-AI

    Example analysis and data export:
    A small set of example data and resulting TrackMate-Cellpose output will be uploaded at a later date.

    Methods
    27 4D movies were acquired every 10 minutes by multiphoton microscopy of anaesthetised cell-type-specific confetti mice at different stages of development. 350 single-channel, single-cell-thick layers (10-30 µm sections) were isolated by 3D cropping, then flattened by max projection. The 5th time point from all movies was taken for model training in Cellpose 2.2.2, which was achieved after manual correction of segmentation for 150 images (MaSCOT-AI model).

    The MaSCOT-AI model was used in a high throughput Trackmate-Cellpose script in ImageJ to track mammary cell shape over time.
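
    For orientation, a minimal sketch of applying a custom Cellpose 2.x model such as the deposited MaSCOT-AI weights to a single frame; the file paths are placeholders:

```python
from cellpose import models
import tifffile

# Paths are placeholders; point them at the deposited model weights and
# an extracted max-projection frame.
model = models.CellposeModel(gpu=True, pretrained_model="MaSCOT-AI")
img = tifffile.imread("t5_example.tif")      # single-channel image

masks, flows, styles = model.eval(img, channels=[0, 0], diameter=None)
print("cells segmented:", masks.max())       # labels are 1..N
```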

    Software versions:
    Cellpose 2.2.2 GUI with GPU was installed according to https://pypi.org/project/cellpose/ (March 2024).
    TrackMate v7.11.1

    File name structure
    Date_mouse-model_developmental-stage_fluorescent-protein_z-span

    Mouse models:
    K5: K5-rtTA/tetoCre/Confetti
    Elf5: Elf5-rtTA/tetoCre/Confetti
    Pr: PR-Cre/Confetti

    Developmental stage:
    no label = Terminal end bud at 5 weeks
    duct/notpreg = duct at 6 or 9 weeks
    6dPreg/6dplug = 6 days pregnancy
    6d MPA = 6 days MPA treatment
    MPAveh = 6 days MPA vehicle treatment

  8. Popnet: computer vision based bespoken deep learning model for forecasting gridded population

    • figshare.com
    zip
    Updated May 27, 2025
    Cite
    Byeonghwa Jeong; Bokyeong Lee (2025). Popnet: computer vision based bespoken deep learning model for forecasting gridded population [Dataset]. http://doi.org/10.6084/m9.figshare.25959652.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 27, 2025
    Dataset provided by
    figshare
    Authors
    Byeonghwa Jeong; Bokyeong Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study introduces Popnet, a deep learning model for forecasting 1 km-gridded populations, integrating U-Net, ConvLSTM, a Spatial Autocorrelation module, and deep ensemble methods. Using spatial variables and population data from 2000 to 2020, Popnet predicts South Korea's population trends by age group (under 14, 15-64, over 65) up to 2040. In validation, it outperforms traditional machine learning and state-of-the-art computer vision models. The model's output reveals significant polarization: population growth in urban areas, especially the capital region, and severe depopulation in rural areas. Popnet is a robust tool that offers policymakers and related stakeholders detailed insights into the future population, allowing them to establish detailed, localised planning and resource allocation.

    Note: Due to export restrictions on grid data imposed by the National Geographic Information Institute of Korea, the training data has been replaced with data from Tennessee. However, the Korean version of the future prediction data remains unchanged. Please take this into consideration.
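
    The Popnet code itself is not deposited here; as a generic illustration of the ConvLSTM building block named in the description, a minimal PyTorch cell (channel counts and grid sizes are illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A single ConvLSTM cell: an LSTM whose gates are 2D convolutions,
    so the hidden state keeps its spatial layout (useful for gridded data)."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        # One convolution produces all four gate pre-activations at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel,
                               padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# One step on a batch of gridded inputs (sizes are illustrative):
x = torch.randn(2, 4, 64, 64)              # population + covariate channels
h = c = torch.zeros(2, 16, 64, 64)
cell = ConvLSTMCell(in_ch=4, hid_ch=16)
h, c = cell(x, (h, c))
```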

  9. Demo datasets for PhenoLearn

    • zenodo.org
    zip
    Updated May 6, 2025
    Cite
    Yichen He (2025). Demo datasets for PhenoLearn [Dataset]. http://doi.org/10.5281/zenodo.8152784
    Explore at:
    Available download formats: zip
    Dataset updated
    May 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yichen He
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This Zenodo record contains two test datasets (Birds and Littorina) used in the paper:

    PhenoLearn: A user-friendly Toolkit for Image Annotation and Deep Learning-Based Phenotyping for Biological Datasets

    Authors: Yichen He, Christopher R. Cooney, Steve Maddock, Gavin H. Thomas

    PhenoLearn is a graphical and script-based toolkit designed to help biologists annotate and analyse biological images using deep learning. This dataset includes two test cases: one of bird specimen images for semantic segmentation, and another of marine snail (Littorina) images for landmark detection. These datasets are used to demonstrate the PhenoLearn workflow in the accompanying paper.

    Dataset Structure

    Bird Dataset

    • train/ — 120 bird specimen images for annotation and model training.
    • pred/ — 100 images for prediction and testing.
    • seg_train.csv — Pixel-wise segmentations (CSV format with RLE or polygon masks).
    • name_file_pred — Filenames corresponding to prediction images.

    Littorina Dataset

    • train/ — 120 snail images for training landmark prediction models.
    • pred/ — 100 snail images for model testing.
    • pts_train.csv — Ground-truth landmark coordinates for training images.
    • name_file_pred — Prediction image filenames for evaluation.

    How to Use These Datasets

    Workflow Instructions (via PhenoLearn)

    1. Download the dataset folders.

    2. Use PhenoLearn to load seg_train.csv (segmentation) or pts_train.csv (landmark) to view and edit annotations.

    3. Train segmentation or landmark prediction models directly via PhenoLearn's training module, or export data for external tools.

    4. Use name_file_pred to match predictions with ground-truth for evaluation.

    See the full tutorial and usage guide at https://github.com/EchanHe/PhenoLearn.
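
    Before training, it can help to eyeball the annotations. A minimal sketch, assuming pts_train.csv stores an image-name column followed by alternating x/y landmark columns (verify against the actual header):

```python
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

# Column layout assumed: image name, then alternating x/y coordinates.
pts = pd.read_csv("Littorina/pts_train.csv")
row = pts.iloc[0]

img = Image.open(f"Littorina/train/{row.iloc[0]}")
xs = row.iloc[1::2].astype(float)
ys = row.iloc[2::2].astype(float)

plt.imshow(img)
plt.scatter(xs, ys, c="red", s=12)   # overlay ground-truth landmarks
plt.title(str(row.iloc[0]))
plt.show()
```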

  10. Urcdd Dataset

    • universe.roboflow.com
    zip
    Updated Dec 11, 2023
    Cite
    xeon (2023). Urcdd Dataset [Dataset]. https://universe.roboflow.com/xeon/urcdd/dataset/4
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 11, 2023
    Dataset authored and provided by
    xeon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Roblox Characters Bounding Boxes
    Description

    URCDD (Universal Roblox Character Detection Dataset)

    Welcome to the Universal Roblox Character Detection Dataset (URCDD). This dataset is a comprehensive collection of images extracted from various games on the Roblox platform. Our primary objective is to offer a diverse and extensive dataset that encompasses the wide array of characters found in Roblox games.

    Key Features

    1. Diverse Game Genres: URCDD includes character images from a myriad of game genres on Roblox, ranging from first-person shooters to roleplaying games.
    2. Comprehensive Coverage: Our dataset aims to provide a complete representation of the different character types encountered across the Roblox gaming ecosystem.
    3. Facilitating Model Training: URCDD is designed to simplify the training process for machine learning models focused on character detection within the Roblox environment.

    Dataset Details

    • Size: The dataset comprises a substantial number of images collected from hundreds of Roblox games. The version-specific image count can be viewed in the Versions tab.
    • Format: Images are provided in a standardized format for ease of use in machine learning applications. You can export this dataset in any format you desire, thanks to Roboflow.
    • Annotations: This dataset currently has only one class, dedicated to identifying a player's character. We are looking to add more classes in the future.

    Tags

    We have created a unique tag for each game that we have collected data from. Refer to the list below:

    1. baseplate - https://www.roblox.com/games/4483381587
    2. da-hood - https://www.roblox.com/games/2788229376
    3. arsenal - https://www.roblox.com/games/286090429
    4. aimblox - https://www.roblox.com/games/6808416928
    5. hood-customs - https://www.roblox.com/games/9825515356
    6. counter-blox - https://www.roblox.com/games/301549746/
    7. hood-testing - https://www.roblox.com/games/12673840215
    8. phantom-forces - https://www.roblox.com/games/292439477
    9. entrenched - https://www.roblox.com/games/3678761576

  11. Crypto Market Data CSV Export: Trades, Quotes & Order Book Access via S3

    • datarade.ai
    .json, .csv
    Cite
    CoinAPI, Crypto Market Data CSV Export: Trades, Quotes & Order Book Access via S3 [Dataset]. https://datarade.ai/data-products/coinapi-comprehensive-crypto-market-data-in-flat-files-tra-coinapi
    Explore at:
    Available download formats: .json, .csv
    Dataset provided by
    Coinapi Ltd
    Authors
    CoinAPI
    Area covered
    Solomon Islands, Kyrgyzstan, Norfolk Island, Montserrat, Iraq, Tanzania, Liechtenstein, Qatar, Northern Mariana Islands, Latvia
    Description

    When you need to analyze crypto market history, batch processing often beats streaming APIs. That's why we built the Flat Files S3 API - giving analysts and researchers direct access to structured historical cryptocurrency data without the integration complexity of traditional APIs.

    Pull comprehensive historical data across 800+ cryptocurrencies and their trading pairs, delivered in clean, ready-to-use CSV formats that drop straight into your analysis tools. Whether you're building backtest environments, training machine learning models, or running complex market studies, our flat file approach gives you the flexibility to work with massive datasets efficiently.
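
    A minimal sketch of that S3 workflow, assuming configured boto3 credentials; the bucket and key names are placeholders, not real paths (those come with the subscription documentation):

```python
import boto3
import pandas as pd

s3 = boto3.client("s3")

# Placeholder bucket/key; the real values come with the subscription docs.
s3.download_file(
    "coinapi-flat-files-example",
    "trades/BINANCE/2024-01-01/BTC_USDT.csv.gz",
    "btc_usdt_trades.csv.gz",
)

trades = pd.read_csv("btc_usdt_trades.csv.gz")   # pandas reads gzip natively
print(trades.head())
```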

    Why work with us?

    Market Coverage & Data Types:
    • Comprehensive historical data since 2010 (for chosen assets)
    • Comprehensive order book snapshots and updates
    • Trade-by-trade data

    Technical Excellence:
    • 99.9% uptime guarantee
    • Standardized data format across exchanges
    • Flexible integration
    • Detailed documentation
    • Scalable architecture

    CoinAPI serves hundreds of institutions worldwide, from trading firms and hedge funds to research organizations and technology providers. Our S3 delivery method easily integrates with your existing workflows, offering familiar access patterns, reliable downloads, and straightforward automation for your data team. Our commitment to data quality and technical excellence, combined with accessible delivery options, makes us the trusted choice for institutions that demand both comprehensive historical data and real-time market intelligence.

  12. Thermal Cheetah Object Detection Dataset

    • public.roboflow.com
    zip
    Updated Feb 28, 2024
    + more versions
    Cite
    Roboflow (2024). Thermal Cheetah Object Detection Dataset [Dataset]. https://public.roboflow.com/object-detection/thermal-cheetah
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Roboflow, Inc.
    Authors
    Roboflow
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of cheetah
    Description

    About this Dataset

    This is a collection of images and video frames of cheetahs at the Omaha Henry Doorly Zoo taken in October 2020. The capture device was a SEEK Thermal Compact XR connected to an iPhone 11 Pro. Video frames were sampled and labeled by hand with bounding boxes for object detection using Roboflow.

    Using this Dataset

    We have provided the dataset for download under a creative commons by-attribution license. You may use this dataset in any project (including for commercial use) but must cite Roboflow as the source.

    Example Use Cases

    This dataset could be used for conservation of endangered species, cataloging animals with a trail camera, gathering statistics on wildlife behavior, or experimenting with other thermal and infrared imagery.

    About Roboflow

    Roboflow creates tools that make computer vision easy to use for any developer, even if you're not a machine learning expert. You can use it to organize, label, inspect, convert, and export your image datasets. And even to train and deploy computer vision models with no code required.

  13. Annotated dataset to assess the accuracy of the textual description of cultural heritage records

    • figshare.com
    txt
    Updated Dec 19, 2020
    + more versions
    Cite
    Matteo Lorenzini; Marco Rospocher; Sara Tonelli (2020). Annotated dataset to assess the accuracy of the textual description of cultural heritage records [Dataset]. http://doi.org/10.6084/m9.figshare.13359104.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 19, 2020
    Dataset provided by
    figshare
    Authors
    Matteo Lorenzini; Marco Rospocher; Sara Tonelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains more than 100K textual descriptions of cultural items from Cultura Italia (http://www.culturaitalia.it/opencms/index.jsp?language=en), the Italian national cultural aggregator. Each description is labeled either HIGH or LOW quality, according to its adherence to the standard cataloguing guidelines provided by the Istituto Centrale per il Catalogo e la Documentazione (ICCD). More precisely, a description is labeled HIGH quality if the object and subject of the item (for which the description is provided) are both described according to the ICCD guidelines, and LOW quality in all other cases. Most of the dataset was manually annotated, with ~30K descriptions automatically labeled as LOW quality due to their length (less than 3 tokens) or their provenance from old (pre-2012), uncurated collections. The dataset was developed to support the training and testing of ML text classification approaches for automatically assessing the quality of textual descriptions in digital Cultural Heritage repositories.

    The dataset is provided as a CSV file, where each row corresponds to an item from Cultura Italia and contains the textual description of the item, the domain of the item (OpereArteVisiva/RepertoArcheologico/Architettura), and the quality label (Low_Quality/High_Quality).

    The textual descriptions in the dataset are provided by Cultura Italia under a "Public Domain" license (cf. http://www.culturaitalia.it/opencms/export/sites/culturaitalia/attachments/linked_open_data/Licenza_CulturaItalia_CC0.pdf). The whole dataset, including the annotations, is openly distributed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
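
    As an example of the kind of ML text classification the dataset was built to support, a minimal TF-IDF plus logistic regression baseline; the column names are assumptions to check against the CSV header:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical filename and column names; inspect the CSV header first.
df = pd.read_csv("cultura_italia_descriptions.csv")
X_tr, X_te, y_tr, y_te = train_test_split(
    df["description"], df["quality"], test_size=0.2, random_state=0)

clf = make_pipeline(TfidfVectorizer(max_features=50_000),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```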

  14. License Plates Object Detection Dataset - Original License Plates

    • public.roboflow.com
    zip
    Updated Oct 15, 2022
    + more versions
    Cite
    Roboflow (2022). License Plates Object Detection Dataset - Original License Plates [Dataset]. https://public.roboflow.com/object-detection/license-plates-us-eu/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 15, 2022
    Dataset provided by
    Roboflow, Inc.
    Authors
    Roboflow
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of Plates
    Description

    Overview

    The License Plates dataset is an object detection dataset of different vehicles (e.g. cars, vans) and their respective license plates. Annotations include examples of the "vehicle" and "license-plate" classes. This dataset has a train/validation/test split of 245/70/35 images, respectively.

    Use Cases

    This dataset could be used to create a vehicle and license plate detection object detection model. Roboflow provides a great guide on creating a license plate and vehicle object detection model.

    Using this Dataset

    This dataset is a subset of the Open Images Dataset. The annotations are licensed by Google LLC under CC BY 4.0 license. Some annotations have been combined or removed using Roboflow's annotation management tools to better align the annotations with the purpose of the dataset. The images have a CC BY 2.0 license.

    About Roboflow

    Roboflow creates tools that make computer vision easy to use for any developer, even if you're not a machine learning expert. You can use it to organize, label, inspect, convert, and export your image datasets. And even to train and deploy computer vision models with no code required.

  15. Dark-field Microscopy Dataset and SVM Code for Agave-Based Mezcal Classification

    • zenodo.org
    bin, txt, zip
    Updated Jul 5, 2025
    Cite
    Miguel G Ramirez-Elias; Juan Carlos Torres-Galván; Pedro E. Ramirez-Gonzalez; Luis Adrián Langarica; Edgar Guevara (2025). Dark-field Microscopy Dataset and SVM Code for Agave-Based Mezcal Classification [Dataset]. http://doi.org/10.5281/zenodo.15810264
    Explore at:
    Available download formats: txt, zip, bin
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Miguel G Ramirez-Elias; Juan Carlos Torres-Galván; Pedro E. Ramirez-Gonzalez; Luis Adrián Langarica; Edgar Guevara
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 4, 2025
    Description

    Dataset Title:
    Dark-Field Microscopy Images of Evaporated Mezcal Droplets for Agave Species Classification

    Description:
    This dataset contains dark-field microscopy images of mezcal samples produced from four agave species: Agave salmiana (salmiana), Agave marmorata (tepeztate), Agave rhodacantha (cuishe), and Agave angustifolia (espadin), as well as an aged salmiana. Each 1 μL droplet of diluted mezcal (20% ABV) was deposited on a cleaned glass slide and allowed to evaporate under ambient conditions to form distinct microstructures. The resulting images were acquired at 4× magnification and used to train and validate a Support Vector Machine (SVM) classifier to distinguish between the first two varietals. The dataset supports research in agave-based spirit authentication, chemometric image analysis, and low-cost classification of artisanal products.

    Contents:

    • JPEG or PNG image files organized by class (/salmiana/, /tepeztate/, /espadin/, /cuishe/, and /tepeztate/)
    • Python scripts and Jupyter Notebooks for training, evaluation, and model export
    • Pretrained SVM model and label encoder files

    Format:
    Images (224×224 pixels), Notebooks (.ipynb)

    Intended Use:
    Research in chemometrics, machine learning, and food authentication. May also serve as a benchmark dataset for image-based classification of fermented or distilled products.
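
    A minimal sketch of the classification step on the two varietals the SVM was trained to distinguish; the folder layout follows the Contents section, and the flattened-pixel features are a naive stand-in for the deposited pipeline:

```python
import numpy as np
from pathlib import Path
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Folder names follow the Contents section; 224x224 matches the stated
# image format. Flattened grayscale pixels are a naive feature choice.
X, y = [], []
for label in ("salmiana", "tepeztate"):
    for f in Path(label).glob("*.jpg"):
        img = Image.open(f).convert("L").resize((224, 224))
        X.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
        y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(
    np.array(X), y, test_size=0.2, random_state=0)
svm = SVC(kernel="rbf").fit(X_tr, y_tr)
print("accuracy:", svm.score(X_te, y_te))
```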

    License:
    Creative Commons Attribution 4.0 International (CC BY 4.0)

