100+ datasets found

h
example-space-to-dataset-json
huggingface.co
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lucain Pouget, example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json
Explore at:
Authors
Lucain Pouget
Description
Demo to save data from a Space to a Dataset. Goal is to provide reusable snippets of code.

Documentation: https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#scheduled-uploads Space: https://huggingface.co/spaces/Wauplin/space_to_dataset_saver/ JSON dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json Image dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image Image (zipped) dataset:… See the full description on the dataset page: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json.
h
example-space-to-dataset-json
huggingface.co
Updated May 26, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
m (2025). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/mmwmm/example-space-to-dataset-json
Explore at:
Dataset updated
May 26, 2025
Authors
m
Description
mmwmm/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community
json_large_sample
kaggle.com
zip
Updated Dec 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Noura Aly (2023). json_large_sample [Dataset]. https://www.kaggle.com/datasets/nouraaly/json-large-sample
Explore at:
zip(55508 bytes)Available download formats
Dataset updated
Dec 1, 2023
Authors
Noura Aly
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Noura Aly

Released under Apache 2.0

Contents
Store Sales json
kaggle.com
zip
Updated Jun 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Indi Ella (2024). Store Sales json [Dataset]. https://www.kaggle.com/datasets/indiella/store-sales-json
Explore at:
zip(5397153 bytes)Available download formats
Dataset updated
Jun 1, 2024
Authors
Indi Ella
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset contains more than 50000 records of Sales and order data related to an online store.
Z
#PraCegoVer dataset
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jan 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
Explore at:
Dataset updated
Jan 19, 2023
Dataset provided by
Institute of Computing, University of Campinas
Authors
Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila
Description
Automatically describing images using natural sentences is an essential task to visually impaired people's inclusion on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

Dataset Structure

PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX

containing the images. The file dataset.json comprehends a list of json objects with the attributes:

user: anonymized user that made the post;

filename: image file name;

raw_caption: raw caption;

caption: clean caption;

date: post date.

Each instance in dataset.json is associated with exactly one image in the images directory whose filename is pointed by the attribute filename. Also, we provide a sample with five instances, so the users can download the sample to get an overview of the dataset before downloading it completely.

Download Instructions

If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

cat images.tar.gz.part* > images.tar.gz tar -xzvf images.tar.gz

Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

python download_dataset.py --access_token=
sample.json
kaggle.com
zip
Updated Aug 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
hung hoang 31 (2024). sample.json [Dataset]. https://www.kaggle.com/datasets/hunghoang31/sample-json/code
Explore at:
zip(2442 bytes)Available download formats
Dataset updated
Aug 14, 2024
Authors
hung hoang 31
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by hung hoang 31

Released under MIT

Contents
Z
Valencia Portcalls 07/2018 to 12/2018
data.niaid.nih.gov
data.europa.eu
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eneko Olivares Gorriti (2020). Valencia Portcalls 07/2018 to 12/2018 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3257156
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
UPV
Authors
Eneko Olivares Gorriti
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Json file with a list of portcalls from vessels arriving to Valencia ports. Data was used inside the INTER-IoT project as an example dataset that a legacy IoT platform was providing.

*NOTE: Due to a bug in the system it is not possible to upload files with a .json extension. It is uploaded to ._json extension instead. Please rename it after download.
json with issue examples
kaggle.com
zip
Updated Aug 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Paul Adversarial (2025). json with issue examples [Dataset]. https://www.kaggle.com/datasets/pashadude/issue-examples/code
Explore at:
zip(3533 bytes)Available download formats
Dataset updated
Aug 24, 2025
Authors
Paul Adversarial
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Paul Adversarial

Released under CC0: Public Domain

Contents
Stackoverflow post sample data. JSON format
kaggle.com
zip
Updated Apr 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeong Hoon Lee (2021). Stackoverflow post sample data. JSON format [Dataset]. https://www.kaggle.com/jeonghoonlee0ljh/stackoverflow-post-sample-data-json-format
Explore at:
zip(28017615 bytes)Available download formats
Dataset updated
Apr 16, 2021
Authors
Jeong Hoon Lee
Description
Dataset

This dataset was created by Jeong Hoon Lee

Contents
Sample JSON
kaggle.com
zip
Updated Jun 5, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neal Magee (2023). Sample JSON [Dataset]. https://www.kaggle.com/datasets/nealmagee/sample-json
Explore at:
zip(844 bytes)Available download formats
Dataset updated
Jun 5, 2023
Authors
Neal Magee
Description
Dataset

This dataset was created by Neal Magee

Contents
R
Json >txt Dataset
universe.roboflow.com
zip
Updated Sep 26, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
slab (2022). Json >txt Dataset [Dataset]. https://universe.roboflow.com/slab/json-txt/dataset/1
Explore at:
zipAvailable download formats
Dataset updated
Sep 26, 2022
Dataset authored and provided by
slab
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Variables measured
Example Bounding Boxes
Description
Json >txt

## Overview Json >txt is a dataset for object detection tasks - it contains Example annotations for 296 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Explore at:
.json, .xml, .csv, .xlsAvailable download formats
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
Bangladesh, Canada, Isle of Man, Northern Mariana Islands, Tunisia, British Indian Ocean Territory, Nepal, Andorra, Moldova (Republic of), Taiwan
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

Owler: Gain valuable business insights and competitive intelligence. -AngelList: Receive fresh startup data transformed into actionable insights. -CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies. -Craft.co: Make data-informed business decisions with Craft.co's company datasets. -Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
groups
stanford.redivis.com
redivis.com
Updated Sep 17, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
LEVANTE (2025). groups [Dataset]. https://stanford.redivis.com/datasets/bm7r-cg5vx85fd
Explore at:
Dataset updated
Sep 17, 2025
Dataset provided by
Levante UDhttp://www.levanteud.com/
Authors
LEVANTE
Time period covered
Sep 10, 2024 - Sep 25, 2025
Description
This upload is from levante-example-dataset/groups.json
h
example-space-to-dataset-json
huggingface.co
Updated May 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alexander Polok (2024). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Lakoc/example-space-to-dataset-json
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 14, 2024
Authors
Alexander Polok
Description
Lakoc/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community
f
Main Data and Code
figshare.com
zip
Updated Oct 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Momo (2025). Main Data and Code [Dataset]. http://doi.org/10.6084/m9.figshare.29929412.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.29929412.v1
Dataset updated
Oct 5, 2025
Dataset provided by
figshare
Authors
Momo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Important Notice: Ethical Use OnlyThis repository provides code and datasets for academic research on misinformation.Please note that the datasets include rumor-related texts. These materials are supplied solely for scholarly analysis and research aimed at understanding and combating misinformation.Prohibited UseDo not use this repository, including its code or data, to create or spread false information in any real-world context.Any misuse of these resources for malicious purposes is strictly forbidden.DisclaimerThe authors bear no responsibility for any unethical or unlawful use of the provided resources.By accessing or using this repository, you acknowledge and agree to comply with these ethical guidelines.Project StructureThe project is organized into three main directories, each corresponding to a major section of the paper's experiments:main_data_and_code/├── rumor_generation/├── rumor_detection/└── rumor_debunking/How to Get StartedPrerequisitesTo successfully run the code and reproduce the results, you will need to:Obtain and configure your own API key for the large language models (LLMs) used in the experiments. Please replace the placeholder API key in the code with your own.For the rumor detection experiments, download the public datasets (Twitter15, Twitter16, FakeNewsNet) from their respective sources. The pre-process scripts in the rumor detection folder must be run first to prepare the public datasets.Please note that many scripts are provided as examples using the Twitter15 dataset. To run experiments on other datasets like Twitter16 or FakeNewsNet, you will need to modify these scripts or create copies and update the corresponding file paths.Detailed Directory Breakdown1. rumor_generation/This directory contains all the code and data related to the rumor generation experiments.rumor_generation_zeroshot.py: Code for the zero-shot rumor generation experiment.rumor_generation_fewshot.py: Code for the few-shot rumor generation experiment.rumor_generation_cot.py: Code for the chain-of-thought (CoT) rumor generation experiment.token_distribution.py: Script to analyze token distribution in the generated text.label_rumors.py：Script to label LLM-generated texts based on whether they contain rumor-related content.extract_reasons.py: Script to extract reasons for rumor generation and rejection.visualization.py: Utility script for generating figures.LDA.py: Code for performing LDA topic modeling on the generated data.rumor_generation_responses.json: The complete output dataset from the rumor generation experiments.generation_reasons_extracted.json: The extracted reasons for generated rumors.rejection_reasons_extracted.json: The extracted reasons for rejected rumor generation requests.2. rumor_detection/This directory contains the code and data used for the rumor detection experiments.nonreasoning_zeroshot_twitter15.py: Code for the non-reasoning, zero-shot detection on the Twitter15 dataset. To run on Twitter16 or FakeNewsNet, update the file paths within the script. Similar experiment scripts below follow the same principle and are not described repeatedly.nonreasoning_fewshot_twitter15.py: Code for the non-reasoning, few-shot detection on the Twitter15 dataset.nonreasoning_cot_twitter15.py: Code for the non-reasoning, CoT detection on the Twitter15 dataset.reasoning_zeroshot_twitter15.py: Code for the Reasoning LLMs, zero-shot detection on the Twitter15 dataset.reasoning_fewshot_twitter15.py: Code for the Reasoning LLMs, few-shot detection on the Twitter15 dataset.reasoning_cot_twitter15.py: Code for the Reasoning LLMs, CoT detection on the Twitter15 dataset.traditional_model.py: Code for the traditional models used as baselines.preprocess_twitter15_and_twitter16.py: Script for preprocessing the Twitter15 and Twitter16 datasets.preprocess_fakenews.py: Script for preprocessing the FakeNewsNet dataset.generate_summary_table.py: Calculates all classification metrics and generates the final summary table for the rumor detection experiments.select_few_shot_example_15.py: Script to pre-select few-shot examples, using the Twitter15 dataset as an example. To generate examples for Twitter16 or FakeNewsNet, update the file paths within the script.twitter15_few_shot_examples.json: Pre-selected few-shot examples for the Twitter15 dataset.twitter16_few_shot_examples.json: Pre-selected few-shot examples for the Twitter16 dataset.fakenewsnet_few_shot_examples.json: Pre-selected few-shot examples for the FakeNewsNet dataset.twitter15_llm_results.json: LLM prediction results on the Twitter15 dataset.twitter16_llm_results.json: LLM prediction results on the Twitter16 dataset.fakenewsnet_llm_results.json: LLM prediction results on the FakeNewsNet dataset.visualization.py: Utility script for generating figures.3. rumor_debunking/This directory contains all the code and data for the rumor debunking experiments.analyze_sentiment.py: Script for analyzing the sentiment of the debunking texts.calculate_readability.py: Script for calculating the readability score of the debunking texts.plot_readability.py: Utility script for generating figures related to readability.fact_checking_with_nli.py: Code for the NLI-based fact-checking experiment.debunking_results.json: The dataset containing the debunking results for this experimental section.debunking_results_with_readability.json: The dataset containing the debunking results along with readability scores.sentiment_analysis/: This directory contains the result file from the sentiment analysis.debunking_results_with_sentiment.json: The dataset containing the debunking results along with sentiment analysis.Please contact the repository owner if you encounter any problems or have questions about the code or data.
I
TerriaJS Map Catalog in JSON Format
ihp-wins.unesco.org
data.dev-wins.com
json
Updated Dec 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pablo Rojas (2025). TerriaJS Map Catalog in JSON Format [Dataset]. https://ihp-wins.unesco.org/dataset/terriajs-map-catalog-in-json-format
Explore at:
jsonAvailable download formats
Dataset updated
Dec 2, 2025
Dataset provided by
Pablo Rojas
Description
This dataset contains a collection of JSON files used to configure map catalogs in TerriaJS, an interactive geospatial data visualization platform. The files include detailed configurations for services such as WMS, WFS, and other geospatial resources, enabling the integration and visualization of diverse datasets in a user-friendly web interface. This resource is ideal for developers, researchers, and professionals who wish to customize or implement interactive map catalogs in their own applications using TerriaJS.
Data from: Food Recipes dataset
kaggle.com
zip
Updated Aug 31, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
samsatp (2021). Food Recipes dataset [Dataset]. https://www.kaggle.com/datasets/sathianpong/foodrecipe
Explore at:
zip(181170342 bytes)Available download formats
Dataset updated
Aug 31, 2021
Authors
samsatp
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by samsatp

Released under CC0: Public Domain

Contents
c
Medium articles dataset
crawlfeeds.com
kaggle.com
json, zip
Updated Aug 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). Medium articles dataset [Dataset]. https://crawlfeeds.com/datasets/medium-articles-dataset
Explore at:
json, zipAvailable download formats
Dataset updated
Aug 26, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
Buy Medium Articles Dataset – 500K+ Published Articles in JSON Format

Get access to a premium Medium articles dataset containing 500,000+ curated articles with metadata including author profiles, publication dates, reading time, tags, claps, and more. Ideal for natural language processing (NLP), machine learning, content trend analysis, and AI model training.

Request here for the large dataset Medium datasets

Checkout sample dataset in CSV

Use Cases:

Training language models (LLMs)

Analyzing content trends and engagement

Sentiment and text classification

SEO research and author profiling

Academic or commercial research

Why Choose This Dataset?

High-volume, cleanly structured JSON

Ideal for developers, researchers, and data scientists

Easy integration with Python, R, SQL, and other data pipelines

Affordable and ready-to-use
R
Val_json >txt Dataset
universe.roboflow.com
zip
Updated Sep 28, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
slab (2022). Val_json >txt Dataset [Dataset]. https://universe.roboflow.com/slab/val_json-txt
Explore at:
zipAvailable download formats
Dataset updated
Sep 28, 2022
Dataset authored and provided by
slab
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Example Bounding Boxes
Description
Val_json >txt

## Overview Val_json >txt is a dataset for object detection tasks - it contains Example annotations for 296 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Z
Data from: 3DHD CityScenes: High-Definition Maps in High-Density Point...
data.niaid.nih.gov
zenodo.org
+1more
Updated Jul 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Plachetka, Christopher; Sertolli, Benjamin; Fricke, Jenny; Klingner, Marvin; Fingscheidt, Tim (2024). 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7085089
Explore at:
Dataset updated
Jul 16, 2024
Dataset provided by
TU Braunschweig
Volkswagen AG
Authors
Plachetka, Christopher; Sertolli, Benjamin; Fricke, Jenny; Klingner, Marvin; Fingscheidt, Tim
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany including 467 km of individual lanes. In total, our map comprises 266,762 individual items.

Our corresponding paper (published at ITSC 2022) is available here. Further, we have applied 3DHD CityScenes to map deviation detection here.

Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:

Python tools to read, generate, and visualize the dataset,

3DHDNet deep learning pipeline (training, inference, evaluation) for map deviation detection and 3D object detection.

The DevKit is available here:

https://github.com/volkswagen/3DHD_devkit.

The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.

When using our dataset, you are welcome to cite:

@INPROCEEDINGS{9921866, author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and Fingscheidt, Tim}, booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)}, title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds}, year={2022}, pages={627-634}}

Acknowledgements

We thank the following interns for their exceptional contributions to our work.

Benjamin Sertolli: Major contributions to our DevKit during his master thesis

Niels Maier: Measurement campaign for data collection and data preparation

The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.

The Dataset

After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.

Dataset

This directory contains the training, validation, and test set definition (train.json, val.json, test.json) used in our publications. Respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.

During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.

To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snipped as an example.

import json

json_path = r"E:\3DHD_CityScenes\Dataset\train.json" with open(json_path) as jf: data = json.load(jf) print(data)

HD_Map

Map items are stored as lists of items in JSON format. In particular, we provide:

traffic signs,

traffic lights,

pole-like objects,

construction site locations,

construction site obstacles (point-like such as cones, and line-like such as fences),

line-shaped markings (solid, dashed, etc.),

polygon-shaped markings (arrows, stop lines, symbols, etc.),

lanes (ordinary and temporary),

relations between elements (only for construction sites, e.g., sign to lane association).

HD_Map_MetaData

Our high-density point cloud used as basis for annotating the HD map is split in 648 tiles. This directory contains the geolocation for each tile as polygon on the map. You can view the respective tile definition using QGIS. Alternatively, we also provide respective polygons as lists of UTM coordinates in JSON.

Files with the ending .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information provided in a different format used in our Python API.

HD_PointCloud_Tiles

The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.

x-coordinates: 4 byte integer

y-coordinates: 4 byte integer

z-coordinates: 4 byte integer

intensity of reflected beams: 2 byte unsigned integer

ground classification flag: 1 byte unsigned integer

After reading, respective values have to be unnormalized. As an example, you can use the following code snipped to read the point cloud data. For visualization, you can use the pptk package, for instance.

import numpy as np import pptk

file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin" pc_dict = {} key_list = ['x', 'y', 'z', 'intensity', 'is_ground'] type_list = ['

Facebook

Twitter

Click to copy link

Link copied

Cite

Lucain Pouget, example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json

example-space-to-dataset-json

Wauplin/example-space-to-dataset-json

Explore at:

Authors

Lucain Pouget

Description

Demo to save data from a Space to a Dataset. Goal is to provide reusable snippets of code.

Documentation: https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#scheduled-uploads Space: https://huggingface.co/spaces/Wauplin/space_to_dataset_saver/ JSON dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json Image dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image Image (zipped) dataset:… See the full description on the dataset page: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json.

Clear search

Close search

Google apps

Main menu

example-space-to-dataset-json

example-space-to-dataset-json

json_large_sample

Dataset

Contents

Store Sales json

#PraCegoVer dataset

PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX

sample.json

Dataset

Contents

Valencia Portcalls 07/2018 to 12/2018

json with issue examples

Dataset

Contents

Stackoverflow post sample data. JSON format

Dataset

Contents

Sample JSON

Dataset

Contents

Json >txt Dataset

Json >txt

Company Datasets for Business Profiling

groups

example-space-to-dataset-json

Main Data and Code

TerriaJS Map Catalog in JSON Format

Data from: Food Recipes dataset

Dataset

Contents

Medium articles dataset

Buy Medium Articles Dataset – 500K+ Published Articles in JSON Format

Use Cases:

Why Choose This Dataset?

Val_json >txt Dataset

Val_json >txt

Data from: 3DHD CityScenes: High-Definition Maps in High-Density Point...

example-space-to-dataset-jsonSee More Versions

Wauplin/example-space-to-dataset-json

example-space-to-dataset-json