100+ datasets found
  1. example-space-to-dataset-json

    • huggingface.co
    + more versions
    Cite
    Lucain Pouget, example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json
    Explore at:
    Authors
    Lucain Pouget
    Description
  2. example-space-to-dataset-json

    • huggingface.co
    Updated May 26, 2025
    + more versions
    Cite
    m (2025). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/mmwmm/example-space-to-dataset-json
    Explore at:
    Dataset updated
    May 26, 2025
    Authors
    m
    Description

    The mmwmm/example-space-to-dataset-json dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  3. json_large_sample

    • kaggle.com
    zip
    Updated Dec 1, 2023
    + more versions
    Cite
    Noura Aly (2023). json_large_sample [Dataset]. https://www.kaggle.com/datasets/nouraaly/json-large-sample
    Explore at:
    zip (55508 bytes)
    Available download formats
    Dataset updated
    Dec 1, 2023
    Authors
    Noura Aly
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Noura Aly

    Released under Apache 2.0

    Contents

  4. Store Sales json

    • kaggle.com
    zip
    Updated Jun 1, 2024
    Cite
    Indi Ella (2024). Store Sales json [Dataset]. https://www.kaggle.com/datasets/indiella/store-sales-json
    Explore at:
    zip (5397153 bytes)
    Available download formats
    Dataset updated
    Jun 1, 2024
    Authors
    Indi Ella
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset contains more than 50,000 records of sales and order data from an online store.

  5. #PraCegoVer dataset

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jan 19, 2023
    Cite
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
    Explore at:
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Institute of Computing, University of Campinas
    Authors
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila
    Description

    Automatically describing images using natural sentences is an essential task for the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions in other languages are scarce.

    #PraCegoVer arose as a movement on the Internet, stimulating social media users to publish images, tag them with #PraCegoVer, and add a short description of their content. Inspired by this movement, we proposed #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    #PraCegoVer contains 533,523 image-caption pairs described in Portuguese, collected from more than 14 thousand different profiles. The average caption length is 39.3 words, with a standard deviation of 29.7.

    Dataset Structure

    The #PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the following attributes:

    user: anonymized user that made the post;

    filename: image file name;

    raw_caption: raw caption;

    caption: clean caption;

    date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory, whose file name is given by the attribute filename. We also provide a sample with five instances, so users can download the sample to get an overview of the dataset before downloading it completely.
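    To make the schema above concrete, a record from dataset.json can be loaded and accessed as in the following minimal sketch; the record values below are invented placeholders, not actual dataset entries:

```python
import json

# A miniature stand-in for dataset.json: a list of JSON objects with the
# attributes documented above. All values are invented placeholders.
sample = """[
  {"user": "anon_001",
   "filename": "img_0001.jpg",
   "raw_caption": "Foto de um parque. #PraCegoVer",
   "caption": "Foto de um parque.",
   "date": "2020-01-15"}
]"""

records = json.loads(sample)
for rec in records:
    # Each instance points to exactly one image via its "filename" attribute.
    print(rec["filename"], "->", rec["caption"])
```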

    Download Instructions

    If you just want an overview of the dataset structure, you can download sample.tar.gz. If you want to use the full dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to join and uncompress them:

    cat images.tar.gz.part* > images.tar.gz
    tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the Python script download_dataset.py available in the #PraCegoVer repository. In this case, first download the script and create an access token here. Then run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=

  6. sample.json

    • kaggle.com
    zip
    Updated Aug 14, 2024
    Cite
    hung hoang 31 (2024). sample.json [Dataset]. https://www.kaggle.com/datasets/hunghoang31/sample-json/code
    Explore at:
    zip (2442 bytes)
    Available download formats
    Dataset updated
    Aug 14, 2024
    Authors
    hung hoang 31
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by hung hoang 31

    Released under MIT

    Contents

  7. Valencia Portcalls 07/2018 to 12/2018

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Jan 24, 2020
    Cite
    Eneko Olivares Gorriti (2020). Valencia Portcalls 07/2018 to 12/2018 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3257156
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    UPV
    Authors
    Eneko Olivares Gorriti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    JSON file with a list of portcalls from vessels arriving at Valencia ports. The data was used inside the INTER-IoT project as an example dataset provided by a legacy IoT platform.

    *NOTE: Due to a bug in the system, it is not possible to upload files with a .json extension, so the file is uploaded with a ._json extension instead. Please rename it after download.

  8. json with issue examples

    • kaggle.com
    zip
    Updated Aug 24, 2025
    Cite
    Paul Adversarial (2025). json with issue examples [Dataset]. https://www.kaggle.com/datasets/pashadude/issue-examples/code
    Explore at:
    zip (3533 bytes)
    Available download formats
    Dataset updated
    Aug 24, 2025
    Authors
    Paul Adversarial
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Paul Adversarial

    Released under CC0: Public Domain

    Contents

  9. Stackoverflow post sample data. JSON format

    • kaggle.com
    zip
    Updated Apr 16, 2021
    Cite
    Jeong Hoon Lee (2021). Stackoverflow post sample data. JSON format [Dataset]. https://www.kaggle.com/jeonghoonlee0ljh/stackoverflow-post-sample-data-json-format
    Explore at:
    zip (28017615 bytes)
    Available download formats
    Dataset updated
    Apr 16, 2021
    Authors
    Jeong Hoon Lee
    Description

    Dataset

    This dataset was created by Jeong Hoon Lee

    Contents

  10. Sample JSON

    • kaggle.com
    zip
    Updated Jun 5, 2023
    Cite
    Neal Magee (2023). Sample JSON [Dataset]. https://www.kaggle.com/datasets/nealmagee/sample-json
    Explore at:
    zip (844 bytes)
    Available download formats
    Dataset updated
    Jun 5, 2023
    Authors
    Neal Magee
    Description

    Dataset

    This dataset was created by Neal Magee

    Contents

  11. Json >txt Dataset

    • universe.roboflow.com
    zip
    Updated Sep 26, 2022
    + more versions
    Cite
    slab (2022). Json >txt Dataset [Dataset]. https://universe.roboflow.com/slab/json-txt/dataset/1
    Explore at:
    zip
    Available download formats
    Dataset updated
    Sep 26, 2022
    Dataset authored and provided by
    slab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Example Bounding Boxes
    Description

    Json >txt

    ## Overview
    
    Json >txt is a dataset for object detection tasks - it contains Example annotations for 296 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
    
  12. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    .json, .xml, .csv, .xls
    Available download formats
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Bangladesh, Canada, Isle of Man, Northern Mariana Islands, Tunisia, British Indian Ocean Territory, Nepal, Andorra, Moldova (Republic of), Taiwan
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  13. groups

    • stanford.redivis.com
    • redivis.com
    Updated Sep 17, 2025
    + more versions
    Cite
    LEVANTE (2025). groups [Dataset]. https://stanford.redivis.com/datasets/bm7r-cg5vx85fd
    Explore at:
    Dataset updated
    Sep 17, 2025
    Dataset provided by
    Levante UD (http://www.levanteud.com/)
    Authors
    LEVANTE
    Time period covered
    Sep 10, 2024 - Sep 25, 2025
    Description

    This upload is from levante-example-dataset/groups.json

  14. example-space-to-dataset-json

    • huggingface.co
    Updated May 14, 2024
    Cite
    Alexander Polok (2024). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Lakoc/example-space-to-dataset-json
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 14, 2024
    Authors
    Alexander Polok
    Description

    The Lakoc/example-space-to-dataset-json dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  15. Main Data and Code

    • figshare.com
    zip
    Updated Oct 5, 2025
    Cite
    Momo (2025). Main Data and Code [Dataset]. http://doi.org/10.6084/m9.figshare.29929412.v1
    Explore at:
    zip
    Available download formats
    Dataset updated
    Oct 5, 2025
    Dataset provided by
    figshare
    Authors
    Momo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Important Notice: Ethical Use Only

    This repository provides code and datasets for academic research on misinformation. Please note that the datasets include rumor-related texts. These materials are supplied solely for scholarly analysis and research aimed at understanding and combating misinformation.

    Prohibited Use: Do not use this repository, including its code or data, to create or spread false information in any real-world context. Any misuse of these resources for malicious purposes is strictly forbidden.

    Disclaimer: The authors bear no responsibility for any unethical or unlawful use of the provided resources. By accessing or using this repository, you acknowledge and agree to comply with these ethical guidelines.

    Project Structure

    The project is organized into three main directories, each corresponding to a major section of the paper's experiments:

    main_data_and_code/
    ├── rumor_generation/
    ├── rumor_detection/
    └── rumor_debunking/

    How to Get Started

    Prerequisites: To successfully run the code and reproduce the results, you will need to:

    • Obtain and configure your own API key for the large language models (LLMs) used in the experiments. Please replace the placeholder API key in the code with your own.
    • For the rumor detection experiments, download the public datasets (Twitter15, Twitter16, FakeNewsNet) from their respective sources. The preprocessing scripts in the rumor detection folder must be run first to prepare the public datasets.
    • Note that many scripts are provided as examples using the Twitter15 dataset. To run experiments on other datasets such as Twitter16 or FakeNewsNet, you will need to modify these scripts or create copies and update the corresponding file paths.

    Detailed Directory Breakdown

    1. rumor_generation/ — all code and data related to the rumor generation experiments:

    • rumor_generation_zeroshot.py: zero-shot rumor generation experiment.
    • rumor_generation_fewshot.py: few-shot rumor generation experiment.
    • rumor_generation_cot.py: chain-of-thought (CoT) rumor generation experiment.
    • token_distribution.py: analyzes token distribution in the generated text.
    • label_rumors.py: labels LLM-generated texts based on whether they contain rumor-related content.
    • extract_reasons.py: extracts reasons for rumor generation and rejection.
    • visualization.py: utility script for generating figures.
    • LDA.py: LDA topic modeling on the generated data.
    • rumor_generation_responses.json: the complete output dataset from the rumor generation experiments.
    • generation_reasons_extracted.json: extracted reasons for generated rumors.
    • rejection_reasons_extracted.json: extracted reasons for rejected rumor generation requests.

    2. rumor_detection/ — code and data used for the rumor detection experiments:

    • nonreasoning_zeroshot_twitter15.py: non-reasoning, zero-shot detection on the Twitter15 dataset. To run on Twitter16 or FakeNewsNet, update the file paths within the script; the similar experiment scripts below follow the same principle.
    • nonreasoning_fewshot_twitter15.py: non-reasoning, few-shot detection on Twitter15.
    • nonreasoning_cot_twitter15.py: non-reasoning, CoT detection on Twitter15.
    • reasoning_zeroshot_twitter15.py: reasoning LLMs, zero-shot detection on Twitter15.
    • reasoning_fewshot_twitter15.py: reasoning LLMs, few-shot detection on Twitter15.
    • reasoning_cot_twitter15.py: reasoning LLMs, CoT detection on Twitter15.
    • traditional_model.py: traditional models used as baselines.
    • preprocess_twitter15_and_twitter16.py: preprocessing for the Twitter15 and Twitter16 datasets.
    • preprocess_fakenews.py: preprocessing for the FakeNewsNet dataset.
    • generate_summary_table.py: calculates all classification metrics and generates the final summary table for the rumor detection experiments.
    • select_few_shot_example_15.py: pre-selects few-shot examples, using Twitter15 as an example; update the file paths to generate examples for Twitter16 or FakeNewsNet.
    • twitter15_few_shot_examples.json / twitter16_few_shot_examples.json / fakenewsnet_few_shot_examples.json: pre-selected few-shot examples for each dataset.
    • twitter15_llm_results.json / twitter16_llm_results.json / fakenewsnet_llm_results.json: LLM prediction results on each dataset.
    • visualization.py: utility script for generating figures.

    3. rumor_debunking/ — all code and data for the rumor debunking experiments:

    • analyze_sentiment.py: analyzes the sentiment of the debunking texts.
    • calculate_readability.py: calculates the readability score of the debunking texts.
    • plot_readability.py: generates figures related to readability.
    • fact_checking_with_nli.py: NLI-based fact-checking experiment.
    • debunking_results.json: the debunking results for this experimental section.
    • debunking_results_with_readability.json: the debunking results along with readability scores.
    • sentiment_analysis/: directory containing the sentiment analysis result file, debunking_results_with_sentiment.json (debunking results along with sentiment analysis).

    Please contact the repository owner if you encounter any problems or have questions about the code or data.

  16. TerriaJS Map Catalog in JSON Format

    • ihp-wins.unesco.org
    • data.dev-wins.com
    json
    Updated Dec 2, 2025
    Cite
    Pablo Rojas (2025). TerriaJS Map Catalog in JSON Format [Dataset]. https://ihp-wins.unesco.org/dataset/terriajs-map-catalog-in-json-format
    Explore at:
    json
    Available download formats
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    Pablo Rojas
    Description

    This dataset contains a collection of JSON files used to configure map catalogs in TerriaJS, an interactive geospatial data visualization platform. The files include detailed configurations for services such as WMS, WFS, and other geospatial resources, enabling the integration and visualization of diverse datasets in a user-friendly web interface. This resource is ideal for developers, researchers, and professionals who wish to customize or implement interactive map catalogs in their own applications using TerriaJS.
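    To give a flavor of the kind of configuration involved, a minimal catalog fragment in the spirit of a TerriaJS init file can be assembled as below; the structure is a simplified sketch, and the service name, URL, and layer identifier are hypothetical:

```python
import json

# Simplified, hypothetical catalog fragment in the spirit of a TerriaJS
# init file: a "catalog" array of items describing geospatial services.
catalog = {
    "catalog": [
        {
            "name": "Example Precipitation Layer",       # hypothetical name
            "type": "wms",                               # a Web Map Service item
            "url": "https://example.org/geoserver/wms",  # hypothetical endpoint
            "layers": "precipitation",                   # hypothetical layer id
        }
    ]
}

# Serialize to the JSON text a map client would consume.
print(json.dumps(catalog, indent=2))
```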

  17. Data from: Food Recipes dataset

    • kaggle.com
    zip
    Updated Aug 31, 2021
    Cite
    samsatp (2021). Food Recipes dataset [Dataset]. https://www.kaggle.com/datasets/sathianpong/foodrecipe
    Explore at:
    zip (181170342 bytes)
    Available download formats
    Dataset updated
    Aug 31, 2021
    Authors
    samsatp
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by samsatp

    Released under CC0: Public Domain

    Contents

  18. Medium articles dataset

    • crawlfeeds.com
    • kaggle.com
    json, zip
    Updated Aug 26, 2025
    Cite
    Crawl Feeds (2025). Medium articles dataset [Dataset]. https://crawlfeeds.com/datasets/medium-articles-dataset
    Explore at:
    json, zip
    Available download formats
    Dataset updated
    Aug 26, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Buy Medium Articles Dataset – 500K+ Published Articles in JSON Format

    Get access to a premium Medium articles dataset containing 500,000+ curated articles with metadata including author profiles, publication dates, reading time, tags, claps, and more. Ideal for natural language processing (NLP), machine learning, content trend analysis, and AI model training.

    Request the large Medium datasets here.

    Check out a sample dataset in CSV.

    Use Cases:

    • Training language models (LLMs)

    • Analyzing content trends and engagement

    • Sentiment and text classification

    • SEO research and author profiling

    • Academic or commercial research

    Why Choose This Dataset?

    • High-volume, cleanly structured JSON

    • Ideal for developers, researchers, and data scientists

    • Easy integration with Python, R, SQL, and other data pipelines

    • Affordable and ready-to-use

  19. Val_json >txt Dataset

    • universe.roboflow.com
    zip
    Updated Sep 28, 2022
    + more versions
    Cite
    slab (2022). Val_json >txt Dataset [Dataset]. https://universe.roboflow.com/slab/val_json-txt
    Explore at:
    zip
    Available download formats
    Dataset updated
    Sep 28, 2022
    Dataset authored and provided by
    slab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Example Bounding Boxes
    Description

    Val_json >txt

    ## Overview
    
    Val_json >txt is a dataset for object detection tasks - it contains Example annotations for 296 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. Data from: 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Jul 16, 2024
    Cite
    Plachetka, Christopher; Sertolli, Benjamin; Fricke, Jenny; Klingner, Marvin; Fingscheidt, Tim (2024). 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7085089
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    TU Braunschweig
    Volkswagen AG
    Authors
    Plachetka, Christopher; Sertolli, Benjamin; Fricke, Jenny; Klingner, Marvin; Fingscheidt, Tim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections in the inner city of Hamburg, Germany, including 467 km of individual lanes. In total, our map comprises 266,762 individual items.

    Our corresponding paper (published at ITSC 2022) is available here. Further, we have applied 3DHD CityScenes to map deviation detection here.

    Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:

    Python tools to read, generate, and visualize the dataset,

    3DHDNet deep learning pipeline (training, inference, evaluation) for map deviation detection and 3D object detection.

    The DevKit is available here:

    https://github.com/volkswagen/3DHD_devkit.

    The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.

    When using our dataset, you are welcome to cite:

    @INPROCEEDINGS{9921866,
      author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and Fingscheidt, Tim},
      booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)},
      title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds},
      year={2022},
      pages={627-634}}

    Acknowledgements

    We thank the following interns for their exceptional contributions to our work.

    Benjamin Sertolli: Major contributions to our DevKit during his master thesis

    Niels Maier: Measurement campaign for data collection and data preparation

    The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.

    The Dataset

    After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.

    1. Dataset

    This directory contains the training, validation, and test set definitions (train.json, val.json, test.json) used in our publications. The respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.

    During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.

    To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snippet as an example.

    import json

    json_path = r"E:\3DHD_CityScenes\Dataset\train.json"
    with open(json_path) as jf:
        data = json.load(jf)
    print(data)

    2. HD_Map

    Map items are stored as lists of items in JSON format. In particular, we provide:

    traffic signs,

    traffic lights,

    pole-like objects,

    construction site locations,

    construction site obstacles (point-like such as cones, and line-like such as fences),

    line-shaped markings (solid, dashed, etc.),

    polygon-shaped markings (arrows, stop lines, symbols, etc.),

    lanes (ordinary and temporary),

    relations between elements (only for construction sites, e.g., sign to lane association).

    3. HD_Map_MetaData

    Our high-density point cloud, used as the basis for annotating the HD map, is split into 648 tiles. This directory contains the geolocation of each tile as a polygon on the map. You can view the respective tile definitions using QGIS. Alternatively, we also provide the respective polygons as lists of UTM coordinates in JSON.

    Files with the ending .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information provided in a different format used in our Python API.

    4. HD_PointCloud_Tiles

    The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.

    x-coordinates: 4 byte integer

    y-coordinates: 4 byte integer

    z-coordinates: 4 byte integer

    intensity of reflected beams: 2 byte unsigned integer

    ground classification flag: 1 byte unsigned integer

    After reading, the respective values have to be unnormalized. As an example, you can use the following code snippet to read the point cloud data. For visualization, you can use the pptk package, for instance.

    import numpy as np
    import pptk

    file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin"
    pc_dict = {}
    key_list = ['x', 'y', 'z', 'intensity', 'is_ground']
    type_list = ['
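    Since the snippet above is cut off, here is a self-contained sketch of a reader for the byte layout described above (a 4-byte integer point count, then contiguous arrays of int32 x, y, and z values, uint16 intensities, and uint8 ground flags). This is a reconstruction from the stated layout, not the official DevKit code; it assumes native little-endian byte order and omits the unnormalization step, whose factors are not given here:

```python
import numpy as np

def read_tile(file_path):
    """Read one point-cloud tile: a 4-byte int point count header,
    followed by contiguous arrays of x, y, z (int32), intensity
    (uint16), and ground-classification flag (uint8)."""
    with open(file_path, "rb") as f:
        num_points = int(np.fromfile(f, dtype=np.int32, count=1)[0])
        pc = {}
        for key, dtype in [("x", np.int32), ("y", np.int32), ("z", np.int32),
                           ("intensity", np.uint16), ("is_ground", np.uint8)]:
            pc[key] = np.fromfile(f, dtype=dtype, count=num_points)
    return pc
```

    Values read this way still need to be unnormalized as described above before use.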
