12 datasets found
  1. Factori AI & ML Training Data | People Data | USA | Machine Learning Data

    • datarade.ai
    .json, .csv
    Updated Jul 23, 2022
    Cite
    Factori (2022). Factori AI & ML Training Data | People Data | USA | Machine Learning Data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-consumer-data-usa-machine-factori
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Jul 23, 2022
    Dataset authored and provided by
    Factori
    Area covered
    United States of America
    Description

    Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

    Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.

    1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
    2. Demographics - Gender, Age Group, Marital Status, Language, etc.
    3. Financial - Income Range, Credit Rating Range, Credit Type, Net Worth Range, etc.
    4. Persona - Consumer Type, Communication Preferences, Family Type, etc.
    5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
    6. Household - Number of Children, Number of Adults, IP Address, etc.
    7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
    8. Firmographics - Industry, Company, Occupation, Revenue, etc.
    9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
    10. Auto - Car Make, Model, Type, Year, etc.
    11. Housing - Home Type, Home Value, Renter/Owner, Year Built, etc.

    People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings.

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

    People Data Use Cases:

    • 360-Degree Customer View: Get a comprehensive image of customers by means of internal and external data aggregation.
    • Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting.
    • Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.
    • Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

    Here's the schema of People Data:

    person_id first_name last_name age gender linkedin_url twitter_url facebook_url
    city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev
    census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month
    household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history
    credit_lines credit_card_user newly_issued_credit_card_user credit_range_new credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type mortgage_loan2_type mortgage_lender_code mortgage_loan2_render_code mortgage_lender mortgage_loan2_lender mortgage_loan2_ratetype mortgage_rate mortgage_loan2_rate
    donor investor interest buyer hobby personal_email work_email devices phone
    employee_title employee_department employee_job_function skills recent_job_change
    company_id company_name company_description technologies_used
    office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code office_census_block_group office_census_tract office_county_code
    company_phone company_credit_score company_csa_code company_dpbc company_franchiseflag company_facebookurl company_linkedinurl company_twitterurl company_website company_fortune_rank company_government_type company_headquarters_branch company_home_business company_industry company_num_pcs_used company_num_employees company_firm_individual company_msa company_msa_name
    company_naics_code company_naics_description company_naics_code2 company_naics_description2 company_sic_code2 company_sic_code2_description company_sic_code4 company_sic_code4_description company_sic_code6 company_sic_code6_description company_sic_code8 company_sic_code8_description
    company_parent_company company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range company_revenue company_sales_volume company_small_business company_stock_ticker company_year_founded company_minorityowned company_female_owned_or_operated company_franchise_code company_dma company_dma_name
    company_hq_address company_hq_city company_hq_duns company_hq_state company_hq_zip5 company_hq_zip4 company_se...
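
    As a rough illustration of how such an export might be consumed, here is a minimal sketch that loads a CSV delivery with pandas and joins a few of the fields listed above onto an internal customer table; the file names, the CRM columns, and the choice of join key are assumptions, not part of the product documentation.

    ```python
    import pandas as pd

    # Minimal sketch, assuming a CSV delivery using the field names listed above.
    # File names and the internal "crm_customers.csv" table (with an "email"
    # column) are hypothetical.
    people = pd.read_csv("factori_people_data_usa.csv", dtype=str)
    customers = pd.read_csv("crm_customers.csv", dtype=str)

    # Enrich CRM records with a handful of demographic and firmographic fields,
    # matching on personal email address.
    enriched = customers.merge(
        people[["personal_email", "age", "gender", "net_worth", "occupation", "company_name"]],
        how="left",
        left_on="email",
        right_on="personal_email",
    )
    print(enriched.head())
    ```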

  2. Factori AI & ML Training Data | Mobility Data | Global | Machine Learning...

    • datarade.ai
    .csv
    Updated Dec 12, 2019
    Cite
    Factori (2019). Factori AI & ML Training Data | Mobility Data | Global | Machine Learning Data | Carrier, IP address, Hex8, Hex9 | Historical Location data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-mobility-data-global-mac-factori
    Explore at:
    Available download formats: .csv
    Dataset updated
    Dec 12, 2019
    Dataset authored and provided by
    Factori
    Area covered
    Costa Rica, Greenland, United States Minor Outlying Islands, Togo, Italy, Tokelau, Grenada, Fiji, Burundi, Virgin Islands (U.S.)
    Description

    Mobility/Location data is gathered from location-aware mobile apps using an SDK-based implementation. All users explicitly consent to allow location data sharing using a clear opt-in process for our use cases and are given clear opt-out options. Factori ingests, cleans, validates, and exports all location data signals to ensure only the highest quality of data is made available for analysis.

    Record Count: 90 Billion+ | Capturing Frequency: Once per Event | Delivering Frequency: Once per Day | Updated: Daily

    Mobility Data Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings.

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited interval (daily/weekly/monthly/quarterly).

    Use Cases:

    • Consumer Insight: Gain a comprehensive 360-degree perspective of the customer to spot behavioral changes, analyze trends, and predict business outcomes.
    • Market Intelligence: Study various market areas, the proximity of points of interest, and the competitive landscape.
    • Advertising: Create campaigns and customize your messaging depending on your target audience's online and offline activity.
    • Retail Analytics: Analyze footfall trends in various locations and gain an understanding of customer personas.

    Here are the data attributes: maid latitude longtitude horizontal_accuracy timestamp id_type ipv4 ipv6 user_agent country state_hasc city_hasc postcode geohash hex8 hex9 carrier
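
    As a rough sketch of working with such an export, the snippet below loads one daily CSV with pandas and computes simple device counts per country and per hex8 cell; the file name and the epoch-seconds timestamp format are assumptions.

    ```python
    import pandas as pd

    # Minimal sketch, assuming a daily CSV export with the attribute names listed
    # above (note the schema spells the longitude field "longtitude"); the file
    # name and epoch-seconds timestamps are assumptions.
    pings = pd.read_csv(
        "mobility_export_daily.csv",
        usecols=["maid", "latitude", "longtitude", "timestamp", "country", "hex8", "carrier"],
    )
    pings["timestamp"] = pd.to_datetime(pings["timestamp"], unit="s", errors="coerce")

    # Daily active devices per country, and ping volume per hex8 grid cell.
    dau_by_country = pings.groupby("country")["maid"].nunique()
    pings_by_hex8 = pings.groupby("hex8").size().sort_values(ascending=False)
    print(dau_by_country.head())
    print(pings_by_hex8.head())
    ```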

  3. Car Highway Dataset

    • universe.roboflow.com
    zip
    Updated Sep 13, 2023
    Cite
    Sallar (2023). Car Highway Dataset [Dataset]. https://universe.roboflow.com/sallar/car-highway/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 13, 2023
    Dataset authored and provided by
    Sallar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vehicles Bounding Boxes
    Description

    Car-Highway Data Annotation Project

    Introduction

    In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.

    Project Goals

    • Collect a diverse dataset of car images from highway scenes.
    • Annotate the dataset to identify and label cars within each image.
    • Organize and format the annotated data for machine learning model training.

    Tools and Technologies

    For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.

    Annotation Process

    1. Upload the raw car images to the Roboflow platform.
    2. Use the annotation tools in Roboflow to draw bounding boxes around each car in the images.
    3. Label each bounding box with the corresponding class (e.g., car).
    4. Review and validate the annotations for accuracy.

    Data Augmentation

    Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.

    Data Export

    Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.
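
    As a rough illustration (not part of the project itself), the snippet below shows how an exported version of this dataset could be pulled programmatically with the roboflow Python package; the API key is a placeholder, and the export format string should be one the platform actually offers.

    ```python
    from roboflow import Roboflow

    # Minimal sketch, assuming the roboflow Python package; the API key is a
    # placeholder, and workspace/project/version follow the dataset URL
    # (universe.roboflow.com/sallar/car-highway/dataset/1).
    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("sallar").project("car-highway")
    dataset = project.version(1).download("coco")  # e.g. "coco", "yolov8", "tfrecord"

    print(dataset.location)  # local directory containing the images and annotations
    ```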

    Milestones

    1. Data Collection and Preprocessing
    2. Annotation of Car Images
    3. Data Augmentation
    4. Data Export
    5. Model Training

    Conclusion

    By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.

  4. Zenodo Open Metadata snapshot - Training dataset for records and communities...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, bin
    Updated Dec 15, 2022
    + more versions
    Cite
    Zenodo team; Zenodo team (2022). Zenodo Open Metadata snapshot - Training dataset for records and communities classifier building [Dataset]. http://doi.org/10.5281/zenodo.7438358
    Explore at:
    Available download formats: bin, application/gzip
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zenodo team; Zenodo team
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains Zenodo's published open access records and communities metadata, including entries marked by the Zenodo staff as spam and deleted.

    The datasets are gzipped compressed JSON-lines files, where each line is a JSON object representation of a Zenodo record or community.

    Records dataset

    Filename: zenodo_open_metadata_{ date of export }.jsonl.gz

    Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

    which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.

    In addition, some terms have been altered:

    • The term files contains a list of dictionaries containing filetype, size, and filename only.
    • The term license contains a short Zenodo ID of the license (e.g. "cc-by").

    Communities dataset

    Filename: zenodo_community_metadata_{ date of export }.jsonl.gz

    Each object contains the terms: id, title, description, curation_policy, page

    which correspond to the fields with the same name available in Zenodo's community creation form.

    Notes for all datasets

    For each object the term spam contains a boolean value, determining whether a given record/community was marked as spam content by Zenodo staff.

    Values for some top-level terms that were missing in the metadata may be null.

    A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
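
    For orientation, here is a minimal sketch of streaming one of the gzipped JSON-lines dumps and tallying the spam flag described above; the export date in the file name is illustrative.

    ```python
    import gzip
    import json
    from collections import Counter

    # Minimal sketch: stream the records dump line by line (each line is one JSON
    # object) and count entries by their "spam" flag. The export date in the file
    # name is illustrative.
    counts = Counter()
    with gzip.open("zenodo_open_metadata_2022-12-15.jsonl.gz", "rt", encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            counts["spam" if record.get("spam") else "not_spam"] += 1

    print(counts)
    ```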

  5. Popnet: computer vision based bespoken deep learning model for forecasting...

    • figshare.com
    zip
    Updated May 27, 2025
    Cite
    Byeonghwa Jeong; Bokyeong Lee (2025). Popnet: computer vision based bespoken deep learning model for forecasting gridded population [Dataset]. http://doi.org/10.6084/m9.figshare.25959652.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 27, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Byeonghwa Jeong; Bokyeong Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study introduces Popnet, a deep learning model for forecasting 1km-gridded populations, integrating U-Net, ConvLSTM, a Spatial Autocorrelation module, and deep ensemble methods. Using spatial variables and population data from 2000 to 2020, Popnet predicts South Korea's population trends by age group (under 14, 15-64, over 65) up to 2040. In validation, it outperforms traditional machine learning and state-of-the-art computer vision models. The model's output reveals significant polarization: population growth in urban areas, especially the capital region, and severe depopulation in rural areas. Popnet is a robust tool for offering significant insights to policymakers and related stakeholders about the detailed future population, allowing them to establish detailed, localised planning and resource allocations.

    *Due to the export restrictions on grid data imposed by the National Geographic Information Institute of Korea, the training data has been replaced with data from Tennessee. However, the Korean version of the future prediction data remains unchanged. Please take this into consideration.

  6. EEG Muse2 Motor imagery brain electrical activity

    • kaggle.com
    zip
    Updated Jul 2, 2024
    Cite
    ScorpioDagger (2024). EEG Muse2 Motor imagery brain electrical activity [Dataset]. https://www.kaggle.com/datasets/muhammadatefelkaffas/eeg-muse2-motor-imagery-brain-electrical-activity/code
    Explore at:
    Available download formats: zip (177883265 bytes)
    Dataset updated
    Jul 2, 2024
    Authors
    ScorpioDagger
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset was collected by me as a means of training machine/deep learning models for EEG motor imagery classification. This was one of my roles as a machine learning engineer for the graduation project at the Faculty of Artificial Intelligence, KafrElsheikh University. I am profoundly grateful for all the technical support and advice provided by the project's supervisor, Dr. Mona AlNaggar.

    The goal is to perform motor imagery classification (left, right, relaxed) and translate JUST thoughts into action.

    I used a Muse2 headband with 4 electrodes and the Muse Monitor Android app to run the recording sessions and export the CSVs.

    This high-dimensional, temporal data was collected in both subject-independent and subject-dependent contexts with the help of 19 healthy subjects (12 males, 7 females) in different states, aged between 19 and 68, to train various deterministic and non-deterministic machine learning models for the motor imagery classification task. 20 columns, covering 5 power bands (alpha, beta, theta, delta, gamma) for each of the 4 sensor electrodes, were of significance to the motor imagery classification. I didn't use the raw data; however, the raw data is also exported via Muse Monitor, so you can use it, as more insights can be extracted from it, in which case you would use only 4 columns (AF7, AF8, TP9, TP10). Features like the gyroscope and accelerometer weren't of interest for this EEG brain analysis or motor imagery classification. Feature engineering techniques like PCA and ICA can be beneficial, especially for the raw-data scenario.

    Motor imagery is one class of event-related potentials. It is the imagination of motion, without any actual movement.

    For the elements column, there are instances of blink and Marker 1, 2, 3. An important convention to note about the data markers is:

    Marker 1 -> left motor imagery

    Marker 2 -> right motor imagery

    Marker 3 -> Relaxed state (which is an intermediate phase between right and left in the conducted motor imagery experiments for 19 subjects)
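
    As a rough sketch of turning a recorded session into a labelled feature table under the marker convention above, the snippet below selects the 20 power-band columns and forward-fills the marker labels; the exact column headers, the marker string format in the elements column, and the file name are assumptions and should be checked against the exported CSV.

    ```python
    import pandas as pd

    # Minimal sketch, assuming Muse Monitor-style headers such as "Alpha_TP9" ...
    # "Gamma_AF8" and an "Elements" column carrying blink/marker events; the file
    # name, header names, and marker string format are assumptions.
    df = pd.read_csv("recording_session.csv")

    bands = ["Delta", "Theta", "Alpha", "Beta", "Gamma"]
    electrodes = ["TP9", "AF7", "AF8", "TP10"]
    feature_cols = [f"{band}_{ch}" for band in bands for ch in electrodes]  # 20 columns

    def marker_to_label(element):
        """Map the marker convention above onto class labels."""
        text = str(element)
        if "Marker" not in text:
            return None
        if "1" in text:
            return "left"
        if "2" in text:
            return "right"
        if "3" in text:
            return "relaxed"
        return None

    # Markers appear on single rows; carry each label forward until the next marker.
    df["label"] = df["Elements"].map(marker_to_label).ffill()
    labelled = df.dropna(subset=["label"])[feature_cols + ["label"]]
    print(labelled["label"].value_counts())
    ```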

    Data was later split into training, testing, and validation portions. The experiment involves several steps: fitting the Muse2 headband to the subject's head, the experiment conductor carefully watching the Mind Monitor Android app and starting a recording session to collect the data, and placing objects to the right and left of the experiment subject. Usually, a relaxed state is maintained first, then left imagery, then relaxation, then right imagery, and so on.

    Before heading into the experiment, subjects were trained with the official Muse app's meditative sessions, so that we could be sure of their ability to maintain focus, especially in the intermediate step between left and right imagery.

    Experiment setup (in comfortable, stably lit places): The volunteer begins by sitting down on a comfortable chair, their arms parallel and resting on a table. Prior to this, they are trained in meditation using the Muse app to help them achieve a relaxed state. Two cups of water are placed on either side of the participant, each 5 cm away from a hand and within their line of sight. The volunteer then sits comfortably, raises their chin slightly, and keeps their head steady to avoid noise in the EEG data. Their eyes rotate to the left or right, looking over the cup without moving their head.

    The experiment conductor then instructs the participant to engage in motor imagery, imagining picking up the cup on their left with their left hand and drinking, but without any actual hand movement. This state of focus is maintained for a duration of 0.5, 1, 2, or 3 minutes, depending on the subject’s attention span. The same process is repeated for the cup on the right with the right hand.

    If the experiment conductor notices a decline in the concentration level, indicating a lower attention span, they ask the volunteer to concentrate more on the motor imagery and apply a visual stimulus to the cup. These steps are repeated several times, ranging from 2 to 3, to capture clean and accurate EEG data while the volunteer is engaged in motor imagery tasks. It’s a careful balance of physical stillness and mental activity.

    Potential applications for the motor imagery classification: Translating thoughts into actions for controlling a game, helping the handicapped control their surroundings especially in a smart home environment.

    Obviously, the performance metric for such a motor imagery task would be the F1 score, which is the harmonic mean of precision and recall.

  7. Demo datasets for PhenoLearn

    • zenodo.org
    zip
    Updated May 6, 2025
    Cite
    Yichen He; Yichen He (2025). Demo datasets for PhenoLearn [Dataset]. http://doi.org/10.5281/zenodo.8152784
    Explore at:
    Available download formats: zip
    Dataset updated
    May 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yichen He; Yichen He
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This Zenodo record contains two test datasets (Birds and Littorina) used in the paper:

    PhenoLearn: A user-friendly Toolkit for Image Annotation and Deep Learning-Based Phenotyping for Biological Datasets

    Authors: Yichen He, Christopher R. Cooney, Steve Maddock, Gavin H. Thomas

    PhenoLearn is a graphical and script-based toolkit designed to help biologists annotate and analyse biological images using deep learning. This dataset includes two test cases: one of bird specimen images for semantic segmentation, and another of marine snail (Littorina) images for landmark detection. These datasets are used to demonstrate the PhenoLearn workflow in the accompanying paper.

    Dataset Structure

    Bird Dataset

    • train/ — 120 bird specimen images for annotation and model training.
    • pred/ — 100 images for prediction and testing.
    • seg_train.csv — Pixel-wise segmentations (CSV format with RLE or polygon masks).
    • name_file_pred — Filenames corresponding to prediction images.

    Littorina Dataset

    • train/ — 120 snail images for training landmark prediction models.
    • pred/ — 100 snail images for model testing.
    • pts_train.csv — Ground-truth landmark coordinates for training images.
    • name_file_pred — Prediction image filenames for evaluation.

    How to Use These Datasets

    Workflow Instructions (via PhenoLearn)

    1. Download the dataset folders.

    2. Use PhenoLearn to load seg_train.csv (segmentation) or pts_train.csv (landmark) to view and edit annotations.

    3. Train segmentation or landmark prediction models directly via PhenoLearn's training module, or export data for external tools.

    4. Use name_file_pred to match predictions with ground-truth for evaluation.

    See the full tutorial and usage guide at https://github.com/EchanHe/PhenoLearn.
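
    For a first look at the annotation files before loading them into PhenoLearn, a minimal sketch like the one below can be used; the exact column layout of pts_train.csv is an assumption, so the header names should be confirmed against the file.

    ```python
    import pandas as pd
    from pathlib import Path

    # Minimal sketch, assuming pts_train.csv has one row per image with a file-name
    # column followed by landmark x/y coordinate columns; header names are
    # assumptions, so inspect them first.
    pts = pd.read_csv("Littorina/pts_train.csv")
    print(pts.columns.tolist())
    print(pts.head())

    # Cross-check that every annotated image is present in the train/ folder
    # (assumes the first column holds the image file name).
    train_dir = Path("Littorina/train")
    missing = [name for name in pts.iloc[:, 0].astype(str) if not (train_dir / name).exists()]
    print(f"{len(missing)} annotated images missing from {train_dir}/")
    ```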

  8. replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Apr 21, 2023
    Cite
    Plum, Fabian; Bulla, René; Beck, Hendrik; Imirzian, Natalie; Labonte, David (2023). replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7849416
    Explore at:
    Dataset updated
    Apr 21, 2023
    Dataset provided by
    Imperial College London
    The Pocket Dimension, Munich
    Authors
    Plum, Fabian; Bulla, René; Beck, Hendrik; Imirzian, Natalie; Labonte, David
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all recorded and hand-annotated data, all synthetically generated data, and representative trained networks used for the detection and tracking experiments in the manuscript "replicAnt - generating annotated images of animals in complex environments using Unreal Engine". Unless stated otherwise, all 3D animal models used in the synthetically generated data have been generated with the open-source photogrammetry platform scAnt (peerj.com/articles/11155/). All synthetic data has been generated with the associated replicAnt project available from https://github.com/evo-biomech/replicAnt.

    Abstract:

    Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.

    Benchmark data

    Two video datasets were curated to quantify detection performance; one in laboratory and one in field conditions. The laboratory dataset consists of top-down recordings of foraging trails of Atta vollenweideri (Forel 1893) leaf-cutter ants. The colony was collected in Uruguay in 2014, and housed in a climate chamber at 25°C and 60% humidity. A recording box was built from clear acrylic, and placed between the colony nest and a box external to the climate chamber, which functioned as feeding site. Bramble leaves were placed in the feeding area prior to each recording session, and ants had access to the recording area at will. The recorded area was 104 mm wide and 200 mm long. An OAK-D camera (OpenCV AI Kit: OAK-D, Luxonis Holding Corporation) was positioned centrally 195 mm above the ground. While keeping the camera position constant, lighting, exposure, and background conditions were varied to create recordings with variable appearance: The “base” case is an evenly lit and well exposed scene with scattered leaf fragments on an otherwise plain white backdrop. A “bright” and “dark” case are characterised by systematic over- or underexposure, respectively, which introduces motion blur, colour-clipped appendages, and extensive flickering and compression artefacts. In a separate well exposed recording, the clear acrylic backdrop was substituted with a printout of a highly textured forest ground to create a “noisy” case. Last, we decreased the camera distance to 100 mm at constant focal distance, effectively doubling the magnification, and yielding a “close” case, distinguished by out-of-focus workers. All recordings were captured at 25 frames per second (fps).

    The field dataset consists of video recordings of Gnathamitermes sp. desert termites, filmed close to the nest entrance in the desert of Maricopa County, Arizona, using a Nikon D850 and a Nikkor 18-105 mm lens on a tripod at camera distances between 20 cm and 40 cm. All video recordings were well exposed, and captured at 23.976 fps.

    Each video was trimmed to the first 1000 frames, and contains between 36 and 103 individuals. In total, 5000 and 1000 frames were hand-annotated for the laboratory and field datasets, respectively: each visible individual was assigned a constant-size bounding box, with a centre coinciding approximately with the geometric centre of the thorax in top-down view. The size of the bounding boxes was chosen such that they were large enough to completely enclose the largest individuals, and was automatically adjusted near the image borders. A custom-written Blender Add-on aided hand-annotation: the Add-on is a semi-automated multi-animal tracker, which leverages Blender's internal contrast-based motion tracker, but also includes track refinement options and CSV export functionality. Comprehensive documentation of this tool and Jupyter notebooks for track visualisation and benchmarking are provided on the replicAnt and BlenderMotionExport GitHub repositories.

    Synthetic data generation

    Two synthetic datasets, each with a population size of 100, were generated from 3D models of Atta vollenweideri leaf-cutter ants. All 3D models were created with the scAnt photogrammetry workflow. A “group” population was based on three distinct 3D models of an ant minor (1.1 mg), a media (9.8 mg), and a major (50.1 mg) (see 10.5281/zenodo.7849059). To approximately simulate the size distribution of A. vollenweideri colonies, these models make up 20%, 60%, and 20% of the simulated population, respectively. A 33% within-class scale variation, with default hue, contrast, and brightness subject material variation, was used. A “single” population was generated using the major model only, with 90% scale variation, but equal material variation settings.

    A Gnathamitermes sp. synthetic dataset was generated from two hand-sculpted models: a worker and a soldier made up 80% and 20% of the simulated population of 100 individuals, respectively, with default hue, contrast, and brightness subject material variation. Both 3D models were created in Blender v3.1, using reference photographs.

    Each of the three synthetic datasets contains 10,000 images, rendered at a resolution of 1024 by 1024 px, using the default generator settings as documented in the Generator_example level file (see documentation on GitHub). To assess how the training dataset size affects performance, we trained networks on 100 (“small”), 1,000 (“medium”), and 10,000 (“large”) subsets of the “group” dataset. Generating 10,000 samples at the specified resolution took approximately 10 hours per dataset on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super).

    Additionally, five datasets which contain both real and synthetic images were curated. These “mixed” datasets combine image samples from the synthetic “group” dataset with image samples from the real “base” case. The ratio between real and synthetic images across the five datasets varied between 10/1 to 1/100.

    Funding

    This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  9. Urcdd Dataset

    • universe.roboflow.com
    zip
    Updated Dec 11, 2023
    Cite
    xeon (2023). Urcdd Dataset [Dataset]. https://universe.roboflow.com/xeon/urcdd/model/4
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 11, 2023
    Dataset authored and provided by
    xeon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Roblox Characters Bounding Boxes
    Description

    URCDD (Universal Roblox Character Detection Dataset)

    Welcome to the Universal Roblox Character Detection Dataset (URCDD). This dataset is a comprehensive collection of images extracted from various games on the Roblox platform. Our primary objective is to offer a diverse and extensive dataset that encompasses the wide array of characters found in Roblox games.

    Key Features

    1. Diverse Game Genres: URCDD includes character images from a myriad of game genres on Roblox, ranging from first-person shooters to roleplaying games.
    2. Comprehensive Coverage: Our dataset aims to provide a complete representation of the different character types encountered across the Roblox gaming ecosystem.
    3. Facilitating Model Training: URCDD is designed to simplify the training process for machine learning models focused on character detection within the Roblox environment.

    Dataset Details

    • Size: The dataset comprises a substantial number of images collected from hundreds of Roblox games. The version specific image count can be viewed in the Versions tab.
    • Format: Images are provided in a standardized format for ease of use in machine learning applications. You can export this dataset in any format you desire, thanks to Roboflow.
    • Annotations: This dataset currently has only one class, dedicated to identifying a player's character. We are looking to add more classes in the future.

    Tags

    We have created a unique tag for each game that we have collected data from. Refer to the list below:

    1. baseplate - https://www.roblox.com/games/4483381587
    2. da-hood - https://www.roblox.com/games/2788229376
    3. arsenal - https://www.roblox.com/games/286090429
    4. aimblox - https://www.roblox.com/games/6808416928
    5. hood-customs - https://www.roblox.com/games/9825515356
    6. counter-blox - https://www.roblox.com/games/301549746/
    7. hood-testing - https://www.roblox.com/games/12673840215
    8. phantom-forces - https://www.roblox.com/games/292439477
    9. entrenched - https://www.roblox.com/games/3678761576

  10. License Plates Object Detection Dataset - Original License Plates

    • public.roboflow.com
    zip
    Updated Oct 15, 2022
    + more versions
    Cite
    Roboflow (2022). License Plates Object Detection Dataset - Original License Plates [Dataset]. https://public.roboflow.com/object-detection/license-plates-us-eu/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 15, 2022
    Dataset authored and provided by
    Roboflow (https://roboflow.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of Plates
    Description

    Overview

    The License Plates dataset is an object detection dataset of different vehicles (e.g. cars, vans) and their respective license plates. Annotations also include examples of "vehicle" and "license-plate". This dataset has a train/validation/test split of 245/70/35 images, respectively. Dataset example image: https://i.imgur.com/JmRgjBq.png

    Use Cases

    This dataset could be used to create a vehicle and license plate detection object detection model. Roboflow provides a great guide on creating a license plate and vehicle object detection model.

    Using this Dataset

    This dataset is a subset of the Open Images Dataset. The annotations are licensed by Google LLC under CC BY 4.0 license. Some annotations have been combined or removed using Roboflow's annotation management tools to better align the annotations with the purpose of the dataset. The images have a CC BY 2.0 license.

    About Roboflow

    Roboflow creates tools that make computer vision easy to use for any developer, even if you're not a machine learning expert. You can use it to organize, label, inspect, convert, and export your image datasets, and even to train and deploy computer vision models with no code required. (https://roboflow.com)

  11. QuasarNet

    • kaggle.com
    zip
    Updated Mar 24, 2021
    Cite
    sunnytang (2021). QuasarNet [Dataset]. https://www.kaggle.com/datasets/hisunnytang/halo-sims/discussion?sort=undefined
    Explore at:
    Available download formats: zip (3042045597 bytes)
    Dataset updated
    Mar 24, 2021
    Authors
    sunnytang
    Description

    Abstract

    We present QuasarNet, a novel research platform that enables deployment of data-driven modeling techniques for the investigation of the properties of super-massive black holes. Black hole data sets — observations and simulations — have grown rapidly in the last decade in both complexity and abundance. However, our computational environments and tool sets have not matured commensurately to exhaust opportunities for discovery with these observational and simulated data. Our pilot study presented here is motivated by one of the fundamental open questions, namely the nature of the quasar and host halo/galaxy connection. To explore this, we co-locate large, multi-wavelength observational data sets of the high-redshift luminous population of accreting black holes at z > 3 alongside simulated data spanning the same cosmic epochs in QuasarNet. We demonstrate that the properties of observed quasars as well as their putative dark matter host halos can be extracted for studying their association and correspondence. In this paper, we describe the design, implementation, and operation of the publicly queryable QuasarNet database and provide examples of query types and visualizations that can be used to explore the data. Starting with data collated in QuasarNet, which will serve as training sets, we plan to utilize machine learning algorithms to predict properties of the as yet undetected, less luminous quasar population. Toward that ultimate goal, here we present newly developed tools that permit extracting relevant quantities for future analysis. All codes and the data itself are available for download from this site.

    Content

    Database overview figure: https://drive.google.com/uc?export=view&id=1F3PfdOueRqS3_ARt0RuAkIq4jUOLKyYE

    The database primarily contains simulation datasets and observational datasets.

    Acknowledgements

    PN gratefully acknowledges the invitation to Google's Science Festival SciFoo in 2018, where she first hatched this idea and thanks Sanjay Sarma and Brian Subirana at MIT for early discussions. She acknowledges Alphabet-X for technical support and computational resources for this project. KST thanks Frank Wang at Google for his help with the Google Cloud Platform, and Rick Ebert at the Infra-Red Processing and Analysis Center (IPAC) at the California Institute of Technology for his help with accessing the NED database. SK acknowledges use of the ARCHER UK National Super-computing Service (http://www.archer.ac.uk) for running the LEGACY simulation. BN acknowledges support from the Fermi National Accelerator Laboratory, managed and operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes. SS acknowledges the Aspen Center for Physics where parts of this work were done, which is supported by National Science Foundation grant PHY-1607611.


  12. Annotated dataset to assess the accuracy of the textual description of...

    • figshare.com
    txt
    Updated Dec 19, 2020
    + more versions
    Cite
    Matteo Lorenzini; Marco Rospocher; Sara Tonelli (2020). Annotated dataset to assess the accuracy of the textual description of cultural heritage records [Dataset]. http://doi.org/10.6084/m9.figshare.13359104.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 19, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Matteo Lorenzini; Marco Rospocher; Sara Tonelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains more than 100K textual descriptions of cultural items from Cultura Italia (http://www.culturaitalia.it/opencms/index.jsp?language=en), the Italian national cultural aggregator. Each description is labeled either HIGH or LOW quality, according to its adherence to the standard cataloguing guidelines provided by the Istituto Centrale per il Catalogo e la Documentazione (ICCD). More precisely, each description is labeled as HIGH quality if the object and subject of the item (for which the description is provided) are both described according to the ICCD guidelines, and as LOW quality in all other cases. Most of the dataset was manually annotated, with ~30K descriptions automatically labeled as LOW quality due to their length (fewer than 3 tokens) or their provenance from old (pre-2012), uncurated collections.

    The dataset was developed to support the training and testing of ML text classification approaches for automatically assessing the quality of textual descriptions in digital Cultural Heritage repositories. The dataset is provided as a CSV file, where each row corresponds to an item from Cultura Italia and contains the textual description of the item, the domain of the item (OpereArteVisiva/RepertoArcheologico/Architettura), and the quality label (Low_Quality/High_Quality).

    The textual descriptions in the dataset are provided by Cultura Italia with a "Public Domain" license (cf. http://www.culturaitalia.it/opencms/export/sites/culturaitalia/attachments/linked_open_data/Licenza_CulturaItalia_CC0.pdf). The whole dataset, including the annotation, is openly distributed under the Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) licence.
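
    As a rough sketch of the kind of classifier this dataset is meant to support, the snippet below trains a TF-IDF plus logistic regression baseline with scikit-learn; the file name and the column names ("description", "quality") are assumptions, so adjust them to the actual CSV headers.

    ```python
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Minimal baseline sketch; the file name and the "description"/"quality"
    # column names are assumptions to be checked against the actual CSV.
    df = pd.read_csv("cultura_italia_descriptions.csv")

    X_train, X_test, y_train, y_test = train_test_split(
        df["description"], df["quality"],
        test_size=0.2, random_state=42, stratify=df["quality"],
    )

    clf = make_pipeline(
        TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))
    ```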

