100+ datasets found
  1. example-space-to-dataset-json

    • huggingface.co
    + more versions
    Cite
    Lucain Pouget, example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json
    Explore at:
    Authors
    Lucain Pouget
    Description
  2. example-space-to-dataset-json

    • huggingface.co
    Updated Jun 8, 2024
    + more versions
    Cite
    t (2024). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/taichi256/example-space-to-dataset-json
    Explore at:
    Dataset updated
    Jun 8, 2024
    Authors
    t
    Description

    taichi256/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. Sample of Drugs from QHP drug.json files

    • healthdata.demo.socrata.com
    csv, xlsx, xml
    Updated Apr 16, 2016
    Cite
    (2016). Sample of Drugs from QHP drug.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Drugs-from-QHP-drug-json-files/jaa8-k3k2
    Explore at:
    Available download formats: csv, xlsx, xml
    Dataset updated
    Apr 16, 2016
    Description
  4. json_data_extraction

    • huggingface.co
    Updated Feb 1, 2024
    Cite
    paraloq analytics (2024). json_data_extraction [Dataset]. https://huggingface.co/datasets/paraloq/json_data_extraction
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 1, 2024
    Dataset authored and provided by
    paraloq analytics
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Diverse Restricted JSON Data Extraction

    Curated by: The paraloq analytics team.

      Uses
    

    • Benchmark restricted JSON data extraction (text + JSON schema -> JSON instance)
    • Fine-tune a data extraction model (text + JSON schema -> JSON instance)
    • Fine-tune a JSON schema retrieval model (text -> retriever -> most adequate JSON schema)

      Out-of-Scope Use
    

    Intended for research purposes only.

      Dataset Structure
    

    The data comes with the following fields:

    title: The… See the full description on the dataset page: https://huggingface.co/datasets/paraloq/json_data_extraction.
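
    Purely as an illustration of the use cases listed above, the following sketch loads the dataset with the Hugging Face datasets library and validates an extracted JSON instance against its schema using jsonschema. The column names "schema" and "json" and the split name are assumptions, since the field list above is truncated; check the dataset card for the real names.

    import json

    from datasets import load_dataset
    from jsonschema import validate

    ds = load_dataset("paraloq/json_data_extraction", split="train")  # split name assumed

    example = ds[0]
    schema = json.loads(example["schema"])      # hypothetical column name
    instance = json.loads(example["json"])      # hypothetical column name

    # Raises jsonschema.exceptions.ValidationError if the instance violates the schema.
    validate(instance=instance, schema=schema)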

  5. Sample of Providers from QHP provider.json files

    • healthdata.demo.socrata.com
    csv, xlsx, xml
    Updated Apr 16, 2016
    Cite
    (2016). Sample of Providers from QHP provider.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Providers-from-QHP-provider-json-files/axbq-xnwy
    Explore at:
    Available download formats: xlsx, xml, csv
    Dataset updated
    Apr 16, 2016
    Description
  6. F# Data: Making structured data first-class

    • figshare.com
    bin
    Updated Jan 19, 2016
    Cite
    Tomas Petricek (2016). F# Data: Making structured data first-class [Dataset]. http://doi.org/10.6084/m9.figshare.1169941.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Tomas Petricek
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accessing data in structured formats such as XML, CSV and JSON in statically typed languages is difficult, because the languages do not understand the structure of the data. Dynamically typed languages make this syntactically easier, but lead to error-prone code. Despite numerous efforts, most of the data available on the web do not come with a schema. The only information available to developers is a set of examples, such as typical server responses. We describe an inference algorithm that infers a type of structured formats including CSV, XML and JSON. The algorithm is based on finding a common supertype of types representing individual samples (or values in collections). We use the algorithm as a basis for an F# type provider that integrates the inference into the F# type system. As a result, users can access CSV, XML and JSON data in a statically-typed fashion just by specifying a representative sample document.
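
    To make the idea concrete, here is a toy Python sketch (not the paper's algorithm and not the F# Data API) that infers a rough shape for a collection of JSON samples by collecting, per field, the set of types observed across samples and marking fields missing from some samples as optional.

    from typing import Any

    def infer_shape(samples: list[dict[str, Any]]) -> dict[str, set[type]]:
        # Collect every field name seen in any sample.
        all_keys = {key for sample in samples for key in sample}
        shape: dict[str, set[type]] = {}
        for key in all_keys:
            observed = set()
            for sample in samples:
                # A missing field contributes NoneType, i.e. the field is optional.
                observed.add(type(sample[key]) if key in sample else type(None))
            shape[key] = observed
        return shape

    samples = [{"id": 1, "name": "a"}, {"id": 2.5}]
    print(infer_shape(samples))  # e.g. {'id': {int, float}, 'name': {str, NoneType}}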

  7. Json file from Twitter API used for benchmarking Jsonpath

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 19, 2022
    Cite
    Paperman, Charles (2022). Json file from Twitter API used for benchmarking Jsonpath [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7225576
    Explore at:
    Dataset updated
    Oct 19, 2022
    Dataset authored and provided by
    Paperman, Charles
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A JSON file used as an example to illustrate queries and to benchmark some tools.

  8. Data from: Food Recipes dataset

    • kaggle.com
    Updated Aug 31, 2021
    Cite
    samsatp (2021). Food Recipes dataset [Dataset]. https://www.kaggle.com/datasets/sathianpong/foodrecipe
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2021
    Dataset provided by
    Kaggle
    Authors
    samsatp
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by samsatp

    Released under CC0: Public Domain

    Contents

  9. DataCite Public Data

    • redivis.com
    application/jsonl +7
    Updated Dec 12, 2024
    + more versions
    Cite
    Redivis Demo Organization (2024). DataCite Public Data [Dataset]. https://redivis.com/datasets/7wec-6vgw8qaaq
    Explore at:
    Available download formats: application/jsonl, arrow, spss, csv, stata, sas, avro, parquet
    Dataset updated
    Dec 12, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Description

    Abstract

    The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.

    This dataset represents a processed version of the Public Data File, where the data have been extracted and loaded into a Redivis dataset.

    Methodology

    The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.

    Records have descriptive metadata for research outputs and resources structured according to the DataCite Metadata Schema and include links to other persistent identifiers (PIDs) for works (DOIs), people (ORCID iDs), and organizations (ROR IDs).

    Use of the DataCite Public Data File is subject to the DataCite Data File Use Policy.

    Usage

    This dataset is a processed version of the DataCite public data file, where the original file (a 23GB .tar.gz) has been extracted into 55,239 JSONL files, which were then concatenated into a single JSONL file.

    This JSONL file has been imported into a Redivis table to facilitate further exploration and analysis.

    A sample project demonstrating how to query the DataCite data file can be found here: https://redivis.com/projects/hx1e-a6w8vmwsx
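
    As a rough sketch of working with the underlying JSONL outside of Redivis (assuming you have a local slice of the concatenated file, here called datacite_sample.jsonl, and that records follow the DataCite JSON layout with an "attributes" object), you could stream it line by line:

    import json

    with open("datacite_sample.jsonl", encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)                 # one DataCite metadata record per line
            attrs = record.get("attributes", {})      # layout assumption, see lead-in above
            print(attrs.get("doi"), attrs.get("publicationYear"))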

  10. Example outputs.

    • plos.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Matthew Z. DeMaere; Aaron E. Darling (2023). Example outputs. [Dataset]. http://doi.org/10.1371/journal.pcbi.1008839.s005
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Matthew Z. DeMaere; Aaron E. Darling
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Aside from reporting quality results to the user via the console, an analysis run produces a quality report written to disk in both HTML and JSON formats. The creation of either output format can be disabled. The JSON format files can be imported by MultiQC. This zip archive includes example results of both BAM and KMER modes, as well as the resulting MultiQC report. (ZIP)

  11. Text content of the Frequently Asked Questions “business info COVID19”

    • data.europa.eu
    json
    Updated Sep 1, 2024
    Cite
    Direction Générale des Entreprises (2024). Text content of the Frequently Asked Questions “business info COVID19” [Dataset]. https://data.europa.eu/88u/dataset/5ec3a046c9e9abed50d770a9
    Explore at:
    Available download formats: json (366118)
    Dataset updated
    Sep 1, 2024
    Dataset authored and provided by
    Direction Générale des Entreprises
    License

    https://www.etalab.gouv.fr/licence-ouverte-open-licence

    Description

    Frequently Asked Questions for Business in the COVID-19 Context

    This dataset contains the articles from the COVID-19 FAQ for companies, published by the Directorate-General for Enterprises at https://info-entreprises-covid19.economie.fr

    The data are presented in JSON format as follows:

    [
      {
        "title": "Example article for documentation",
        "content": [
          "this is the first page of the article.",
          "here the second,",
          "<div>these articles incorporate some HTML formatting</div>"
        ],
        "path": ["File to visit in the FAQ", "to join the article"]
      },
      ...
    ]

    The update is done every day at 6:00 UTC. This data is extracted directly from the site; the source code of the script used to extract the data is available here: https://github.com/chrnin/docCovidDGE
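
    A minimal sketch for reading the file, assuming it has been downloaded locally as faq.json and follows the structure above:

    import json

    with open("faq.json", encoding="utf-8") as fh:
        articles = json.load(fh)

    for article in articles:
        # "path" holds the folder trail in the FAQ, "title" the article title.
        print(" / ".join(article["path"]), "->", article["title"])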

  12. Clinicalcodes.org example JSON research object

    • figshare.com
    txt
    Updated Jan 18, 2016
    Cite
    David Springate; Evangelos Kontopantelis; Darren M Ashcroft; Iván Olier; Rosa Parisi; Edmore Chamapiwa; David Reeves (2016). Clinicalcodes.org example JSON research object [Dataset]. http://doi.org/10.6084/m9.figshare.1008900.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    David Springate; Evangelos Kontopantelis; Darren M Ashcroft; Iván Olier; Rosa Parisi; Edmore Chamapiwa; David Reeves
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example JSON research object output from www.clinicalcodes.org for the clinical codes of a research article. See https://github.com/rOpenHealth/ClinicalCodes/tree/master/paper

  13. Data from: JSON Dataset of Simulated Building Heat Control for System of...

    • gimi9.com
    • researchdata.se
    Cite
    JSON Dataset of Simulated Building Heat Control for System of Systems Interoperability [Dataset]. https://gimi9.com/dataset/eu_https-doi-org-10-5878-1tv7-9x76/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system and the messages sent within control systems-of-systems. For more information, see the attached data documentation.

    The data comes in two semicolon-separated (;) csv files, training.csv and test.csv. The train/test split is not random; training data comes from the first 80% of simulated timesteps, and the test data is the last 20%. There is no specific validation dataset; validation data should instead be randomly selected from the training data.

    The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data only samples once every hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consists of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON messages is stored as a single row in the tables. This means that much data is duplicated, a choice made to make it easier to use the data.

    The simulation data is not meant to be opened and analyzed in spreadsheet software; it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/.
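
    Following the recommendation above, a minimal pandas sketch (assuming training.csv and test.csv are in the working directory) might look like this:

    import pandas as pd

    train = pd.read_csv("training.csv", sep=";")   # files are semicolon-separated
    test = pd.read_csv("test.csv", sep=";")

    # Hold out a random validation split from the training data, as suggested above.
    val = train.sample(frac=0.1, random_state=0)
    train = train.drop(val.index)

    print(train.shape, val.shape, test.shape)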

  14. Up-to-date mapping of COVID-19 treatment and vaccine development...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png
    Updated Jul 19, 2024
    Cite
    Tomáš Wagner; Ivana Mišová; Ivana Mišová; Ján Frankovský; Ján Frankovský; Tomáš Wagner (2024). Up-to-date mapping of COVID-19 treatment and vaccine development (covid19-help.org data dump) [Dataset]. http://doi.org/10.5281/zenodo.4601446
    Explore at:
    Available download formats: csv, png, bin
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Tomáš Wagner; Ivana Mišová; Ivana Mišová; Ján Frankovský; Ján Frankovský; Tomáš Wagner
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The free database mapping COVID-19 treatment and vaccine development based on the global scientific research is available at https://covid19-help.org/.

    Files provided here are curated partial data exports in the form of .csv files, or a full data export as a .sql script generated with pg_dump from our PostgreSQL 12 database. You can also find a .png file with our ER diagram of the tables in the .sql file in this repository.

    Structure of CSV files

    *On our site, compounds are referred to as substances

    compounds.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Name - Name of the Substance/Compound (string)

    3. Marketed name - The marketed name of the Substance/Compound (string)

    4. Synonyms - Known synonyms (string)

    5. Description - Description (HTML code)

    6. Dietary sources - Dietary sources where the Substance/Compound can be found (string)

    7. Dietary sources URL - Dietary sources URL (string)

    8. Formula - Compound formula (HTML code)

    9. Structure image URL - Url to our website with the structure image (string)

    10. Status - Status of approval (string)

    11. Therapeutic approach - Approach in which Substance/Compound works (string)

    12. Drug status - Availability of Substance/Compound (string)

    13. Additional data - Additional data in stringified JSON format with data as prescribing information and note (string)

    14. General information - General information about Substance/Compound (HTML code)

    references.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Impact factor - Impact factor of the scientific article (string)

    3. Source title - Title of the scientific article (string)

    4. Source URL - URL link of the scientific article (string)

    5. Tested on species - What testing model was used for the study (string)

    6. Published at - Date of publication of the scientific article (Date in ISO 8601 format)

    clinical-trials.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Title - Title of the clinical trial study (string)

    3. Acronym title - Acronym of title of the clinical trial study (string)

    4. Source id - Unique identifier in the source database

    5. Source id optional - Optional identifier in other databases (string)

    6. Interventions - Description of interventions (string)

    7. Study type - Type of the conducted study (string)

    8. Study results - Has results? (string)

    9. Phase - Current phase of the clinical trial (string)

    10. Url - URL to clinical trial study page on clinicaltrials.gov (string)

    11. Status - Status in which study currently is (string)

    12. Start date - Date at which study was started (Date in ISO 8601 format)

    13. Completion date - Date at which study was completed (Date in ISO 8601 format)

    14. Additional data - Additional data in the form of stringified JSON with data as locations of study, study design, enrollment, age, outcome measures (string)

    compound-reference-relations.csv

    1. Reference id - Id of a reference in our DB (unsigned integer)

    2. Compound id - Id of a substance in our DB (unsigned integer)

    3. Note - Id of a substance in our DB (unsigned integer)

    4. Is supporting - Is evidence supporting or contradictory (Boolean, true if supporting)

    compound-clinical-trial.csv

    1. Clinical trial id - Id of a clinical trial in our DB (unsigned integer)

    2. Compound id - Id of a Substance/Compound in our DB (unsigned integer)

    tags.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Name - Name of the tag (string)

    tags-entities.csv

    1. Tag id - Id of a tag in our DB (unsigned integer)

    2. Reference id - Id of a reference in our DB (unsigned integer)

    API Specification

    Our project also has an Open API that gives you access to our data in a format suitable for processing, particularly in JSON format.

    https://covid19-help.org/api-specification

    Services are split into five endpoints:

    • Substances - /api/substances

    • References - /api/references

    • Substance-reference relations - /api/substance-reference-relations

    • Clinical trials - /api/clinical-trials

    • Clinical trials-substances relations - /api/clinical-trials-substances

    Method of providing data

    • All dates are text strings formatted in compliance with ISO 8601 as YYYY-MM-DD

    • If the request syntax is incorrect (missing or incorrectly formatted parameters), an HTTP 400 Bad Request response will be returned. The body of the response may include an explanation.

    • The updated_at field (used for querying changed-from) refers only to a particular entity and not to its logical relations. Example: if a new substance-reference relation is added but the substance detail has not changed, this is reflected in the substance-reference-relations endpoint, where a new entity with an id and current dates in the created_at and updated_at fields is added, while nothing changes in the substances or references endpoints.

    The recommended way of sequential download

    • During the first download, it is possible to obtain all data by entering an old enough date in the changed-from parameter, for example: changed-from=2020-01-01. It is important to write down the date on which receiving the data was initiated, let’s say 2020-10-20.

    • For repeated data downloads, it is sufficient to receive only the records in which something has changed. It can therefore be requested with the parameter changed-from=2020-10-20 (example from the previous bullet). Again, it is important to write down the date when the updates were downloaded (e.g. 2020-10-20). This date will be used in the next update (refresh) of the data.

    Services for entities

    List of endpoint URLs:

    Format of the request

    All endpoints have these parameters in common:

    • changed-from - a parameter to return only the entities that have been modified on a given date or later.

    • continue-after-id - a parameter to return only the entities that have a larger ID than specified in the parameter.

    • limit - a parameter to return only the number of records specified (up to 1000). The preset number is 100.

    Request example:

    /api/references?changed-from=2020-01-01&continue-after-id=1&limit=100

    Format of the response

    The response format is the same for all endpoints.

    • number_of_remaining_ids - the number of remaining entities that meet the specified criteria but are not displayed on the page. An integer of virtually unlimited size.

    • entities - an array of entity details in JSON format.

    Response example:

    {
      "number_of_remaining_ids": 100,
      "entities": [
        {
          "id": 3,
          "url": "https://www.ncbi.nlm.nih.gov/pubmed/32147628",
          "title": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
          "impact_factor": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
          "tested_on_species": "in silico",
          "publication_date": "2020-22-02",
          "created_at": "2020-30-03",
          "updated_at": "2020-31-03",
          "deleted_at": null
        },
        {
          "id": 4,
          "url": "https://www.ncbi.nlm.nih.gov/pubmed/32157862",
          "title": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
          "impact_factor": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
          "tested_on_species": "Patient",
          "publication_date": "2020-06-03",
          "created_at": "2020-30-03",
          "updated_at": "2020-30-03",
          "deleted_at": null
        }
      ]
    }
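
    A minimal sketch (not official client code) of the sequential download pattern described above, using the documented changed-from, continue-after-id, and limit parameters against one of the listed endpoints:

    import requests

    BASE_URL = "https://covid19-help.org/api"

    def fetch_all(endpoint, changed_from="2020-01-01"):
        # Page through an endpoint until no entities remain.
        last_id = 0
        while True:
            response = requests.get(
                f"{BASE_URL}/{endpoint}",
                params={
                    "changed-from": changed_from,
                    "continue-after-id": last_id,
                    "limit": 1000,
                },
                timeout=30,
            )
            response.raise_for_status()
            payload = response.json()
            entities = payload["entities"]
            if not entities:
                break
            yield from entities
            last_id = entities[-1]["id"]
            if payload["number_of_remaining_ids"] == 0:
                break

    references = list(fetch_all("references"))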

    Endpoint details

    Substances

    URL: /api/substances

    Substances

  15. Complete News Data Extracted from CNBC in JSON Format: Covering Business,...

    • crawlfeeds.com
    json, zip
    Updated Jul 6, 2025
    Cite
    Crawl Feeds (2025). Complete News Data Extracted from CNBC in JSON Format: Covering Business, Finance, Technology, and Global Trends for Europe, US, and UK Audiences [Dataset]. https://crawlfeeds.com/datasets/complete-news-data-extracted-from-cnbc-in-json-format-covering-business-finance-technology-and-global-trends-for-europe-us-and-uk-audiences
    Explore at:
    Available download formats: zip, json
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Area covered
    United States, United Kingdom
    Description

    We have successfully extracted a comprehensive news dataset from CNBC, covering not only financial updates but also an extensive range of news categories relevant to diverse audiences in Europe, the US, and the UK. This dataset includes over 500,000 records, meticulously structured in JSON format for seamless integration and analysis.

    Diverse News Segments for In-Depth Analysis

    This extensive extraction spans multiple segments, such as:

    • Business and Market Analysis: Stay updated on major companies, mergers, and acquisitions.
    • Technology and Innovation: Explore developments in AI, cybersecurity, and digital transformation.
    • Economic Forecasts: Access insights into GDP, employment rates, inflation, and other economic indicators.
    • Geopolitical Developments: Understand the impact of political events and global trade dynamics on markets.
    • Personal Finance: Learn about saving strategies, investment tips, and real estate trends.

    Each record in the dataset is enriched with metadata tags, enabling precise filtering by region, sector, topic, and publication date.

    Why Choose This Dataset?

    The comprehensive news dataset provides real-time insights into global developments, corporate strategies, leadership changes, and sector-specific trends. Designed for media analysts, research firms, and businesses, it empowers users to perform:

    • Trend Analysis
    • Sentiment Analysis
    • Predictive Modeling

    Additionally, the JSON format ensures easy integration with analytics platforms for advanced processing.

    Access More News Datasets

    Looking for a rich repository of structured news data? Visit our news dataset collection to explore additional offerings tailored to your analysis needs.

    Sample Dataset Available

    To get a preview, check out the CSV sample of the CNBC economy articles dataset.

  16. Data from: 3DHD CityScenes: High-Definition Maps in High-Density Point...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 16, 2024
    Cite
    Plachetka, Christopher (2024). 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7085089
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Plachetka, Christopher
    Klingner, Marvin
    Sertolli, Benjamin
    Fingscheidt, Tim
    Fricke, Jenny
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany including 467 km of individual lanes. In total, our map comprises 266,762 individual items.

    Our corresponding paper (published at ITSC 2022) is available here. Further, we have applied 3DHD CityScenes to map deviation detection here.

    Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:

    Python tools to read, generate, and visualize the dataset,

    3DHDNet deep learning pipeline (training, inference, evaluation) for map deviation detection and 3D object detection.

    The DevKit is available here:

    https://github.com/volkswagen/3DHD_devkit.

    The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.

    When using our dataset, you are welcome to cite:

    @INPROCEEDINGS{9921866,
      author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and Fingscheidt, Tim},
      booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)},
      title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds},
      year={2022},
      pages={627-634}}

    Acknowledgements

    We thank the following interns for their exceptional contributions to our work.

    Benjamin Sertolli: Major contributions to our DevKit during his master thesis

    Niels Maier: Measurement campaign for data collection and data preparation

    The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.

    The Dataset

    After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.

    1. Dataset

    This directory contains the training, validation, and test set definition (train.json, val.json, test.json) used in our publications. Respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.

    During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.

    To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snippet as an example.

    import json

    # Load one of the sample definition files (train.json, val.json, test.json).
    json_path = r"E:\3DHD_CityScenes\Dataset\train.json"
    with open(json_path) as jf:
        data = json.load(jf)
    print(data)

    2. HD_Map

    Map items are stored as lists of items in JSON format. In particular, we provide:

    traffic signs,

    traffic lights,

    pole-like objects,

    construction site locations,

    construction site obstacles (point-like such as cones, and line-like such as fences),

    line-shaped markings (solid, dashed, etc.),

    polygon-shaped markings (arrows, stop lines, symbols, etc.),

    lanes (ordinary and temporary),

    relations between elements (only for construction sites, e.g., sign to lane association).

    3. HD_Map_MetaData

    Our high-density point cloud used as the basis for annotating the HD map is split into 648 tiles. This directory contains the geolocation for each tile as a polygon on the map. You can view the respective tile definition using QGIS. Alternatively, we also provide the respective polygons as lists of UTM coordinates in JSON.

    Files with the ending .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information provided in a different format used in our Python API.

    4. HD_PointCloud_Tiles

    The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.

    x-coordinates: 4 byte integer

    y-coordinates: 4 byte integer

    z-coordinates: 4 byte integer

    intensity of reflected beams: 2 byte unsigned integer

    ground classification flag: 1 byte unsigned integer

    After reading, respective values have to be unnormalized. As an example, you can use the following code snippet to read the point cloud data. For visualization, you can use the pptk package, for instance.

    import numpy as np
    import pptk

    file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin"
    pc_dict = {}
    key_list = ['x', 'y', 'z', 'intensity', 'is_ground']
    type_list = ['
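
    Based only on the binary layout described above (a 4-byte integer point count followed by contiguous arrays of x, y and z as 4-byte integers, intensity as 2-byte unsigned integers, and a 1-byte ground flag), a possible reader could look like the following sketch. It is not the official DevKit code, assumes little-endian byte order, and omits the unnormalization step.

    import numpy as np

    def read_tile(file_path):
        with open(file_path, "rb") as f:
            num_points = int(np.fromfile(f, dtype="<i4", count=1)[0])
            x = np.fromfile(f, dtype="<i4", count=num_points)
            y = np.fromfile(f, dtype="<i4", count=num_points)
            z = np.fromfile(f, dtype="<i4", count=num_points)
            intensity = np.fromfile(f, dtype="<u2", count=num_points)
            is_ground = np.fromfile(f, dtype="<u1", count=num_points)
        # Values still need to be unnormalized as described in the dataset documentation.
        return {"x": x, "y": y, "z": z, "intensity": intensity, "is_ground": is_ground}

    tile = read_tile(r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin")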

  17. Example Microscopy Metadata JSON files produced using Micro-Meta App to...

    • zenodo.org
    • data.niaid.nih.gov
    json, tiff
    Updated Jul 19, 2024
    Cite
    Karl Bellve; Alessandro Rigano; Kevin Fogarty; Kevin Fogarty; Caterina Strambio-De-Castillia; Caterina Strambio-De-Castillia; Karl Bellve; Alessandro Rigano (2024). Example Microscopy Metadata JSON files produced using Micro-Meta App to document the acquisition of example images using a custom-built TIRF Epifluorescence Structured Illumination Microscope [Dataset]. http://doi.org/10.5281/zenodo.4891883
    Explore at:
    Available download formats: json, tiff
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Karl Bellve; Alessandro Rigano; Kevin Fogarty; Kevin Fogarty; Caterina Strambio-De-Castillia; Caterina Strambio-De-Castillia; Karl Bellve; Alessandro Rigano
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example Microscopy Metadata JSON files produced using the Micro-Meta App documenting an example raw-image file acquired using the custom-built TIRF Epifluorescence Structured Illumination Microscope.

    For this use case, which is presented in Figure 5 of Rigano et al., 2021, Micro-Meta App was utilized to document:

    1) The Hardware Specifications of the custom-built TIRF Epifluorescence Structured light Microscope (TESM; Navaroli et al., 2010), developed and built on the basis of the Olympus IX71 microscope stand and owned by the Biomedical Imaging Group (http://big.umassmed.edu/) at the Program in Molecular Medicine of the University of Massachusetts Medical School. Because TESM was custom-built, the most appropriate documentation level is Tier 3 (Manufacturing/Technical Development/Full Documentation) as specified by the 4DN-BINA-OME Microscopy Metadata model (Hammer et al., 2021).

    The TESM Hardware Specifications are stored in: Rigano et al._Figure 5_UseCase_Biomedical Imaging Group_TESM.JSON

    2) The Image Acquisition Settings that were applied to the TESM microscope for the acquisition of an example image (FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif) obtained by Nicholas Vecchietti and Caterina Strambio-De-Castillia. For this image, TZM-bl human cells were infected with HIV-1 retroviral three-part vector (FSWT+PAX2+pMD2.G). Six hours post-infection cells were fixed for 10 min with 1% formaldehyde in PBS, and permeabilized. Cells were stained with mouse anti-p24 primary antibody followed by DyLight488-anti-Mouse secondary antibody, to detect HIV-1 viral Capsid. In addition, cells were counterstained using rabbit anti-Lamin B1 primary antibody followed by DyLight649-anti-Rabbit secondary antibody, to visualize the nuclear envelope and with DAPI to visualize the nuclear chromosomal DNA.

    The Image Acquisition Settings used to acquire the FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif image are stored in: Rigano et al._Figure 5_UseCase_AS_fswt-6hvirus-10minfix-stk_4-epi.tif.JSON

    Instructional video tutorials on how to use these example data files:
    Use these videos to get started with using Micro-Meta App after downloading the example data files available here.

  18. Technographic Data | B2B Data | 22M Records | Refreshed 2x/Mo | Delivery...

    • datarade.ai
    .json, .csv, .sql
    Updated Sep 30, 2024
    + more versions
    Cite
    Forager.ai (2024). Technographic Data | B2B Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via CSV/JSON/PostgreSQL DB Delivery [Dataset]. https://datarade.ai/data-products/technographic-data-b2b-data-22m-records-refreshed-2x-mo-forager-ai
    Explore at:
    Available download formats: .json, .csv, .sql
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    Forager.ai
    Area covered
    Uzbekistan, United Kingdom, Barbados, Brazil, Anguilla, Congo, Czech Republic, Uganda, Singapore, Denmark
    Description

    The Forager.ai Global Dataset is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

    | Volume and Stats |

    • Over 22M total records, the highest volume in the industry today.
    • Every company record refreshed twice a month, offering an unparalleled update frequency.
    • Delivery is made every hour, ensuring you have the latest data at your fingertips.
    • Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

    | Use Cases |

    Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

    Example applications include:

    1. Uncover trending technologies or tools gaining popularity.

    2. Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

    3. Study a company's tech stacks to understand the technical capability and skills available within that company.

    B2B Tech Companies:

    • Enrich leads that sign-up through the Company Search API (available separately).
    • Identify and map every company that fits your core personas and ICP.
    • Build audiences to target, using key fields like location, company size, industry, and description.

    Venture Capital and Private Equity:

    • Discover new investment opportunities using company descriptions and industry-level data.
    • Review the growth of private companies and benchmark their strength against competitors.
    • Create high-level views of companies competing in popular verticals for investment.

    | Delivery Options |

    • Flat files via S3 or GCP
    • PostgreSQL Shared Database
    • PostgreSQL Managed Database
    • API
    • Other options available upon request, depending on the scale required

    Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

    Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.

  19. Sec Financial Statement Data in Json

    • kaggle.com
    Updated Apr 13, 2025
    Cite
    Angular2guy (2025). Sec Financial Statement Data in Json [Dataset]. https://www.kaggle.com/datasets/wbqrmgmcia7lhhq/sec-financial-statement-data-in-json/versions/13
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 13, 2025
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Angular2guy
    License

    https://www.usa.gov/government-works/

    Description

    Data from 2010 Q1 to 2025 Q1

    The data is created with this Jupyter Notebook:

    The data format is documented in the Readme. The Sec data documentation can be found here.

    Json structure:

    {
      "quarter": "Q1",
      "country": "Italy",
      "data": {
        "cf": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}],
        "bs": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}],
        "ic": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}]
      },
      "year": 0,
      "name": "B",
      "startDate": "2009-12-31",
      "endDate": "2010-12-30",
      "symbol": "GM",
      "city": "York"
    }

    An example Json: {"year": 2023, "data": {"cf": [{"value": -1834000000, "concept": "NetCashProvidedByUsedInFinancingActivities", "unit": "USD", "label": "Amount of cash inflow (outflow) from financing … Amount of cash inflow (outflow) from financing …", "info": "Net cash used in financing activities"}], "ic":[{"value": 1000000, "concept": "IncreaseDecreaseInDueFromRelatedParties", "unit": "USD", "label": "The increase (decrease) during the reporting pe… The increase (decrease) during the reporting pe…", "info": "Receivables from related parties"}], "bs": [{"value": 2779000000, "concept": "AccountsPayableCurrent", "unit": "USD", "label": "Carrying value as of the balance sheet date of … Carrying value as of the balance sheet date of …", "info": "Accounts payable"}]}, "quarter": "Q2", "city": "SANTA CLARA", "startDate": "2023-06-30", "name": "ADVANCED MICRO DEVICES INC", "endDate": "2023-09-29", "country": "US", "symbol": "AMD"}
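
    A minimal sketch (assuming the records are stored one per line in a file named sec_data.jsonl; the actual file layout on Kaggle may differ) for pulling a single concept out of the structure documented above:

    import json

    def get_concept(record, statement, concept):
        # statement is one of "cf", "bs" or "ic"; returns the value if the concept is present.
        for item in record["data"][statement]:
            if item["concept"] == concept:
                return item["value"]
        return None

    with open("sec_data.jsonl", encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            value = get_concept(record, "bs", "AccountsPayableCurrent")
            if value is not None:
                print(record["symbol"], record["year"], record["quarter"], value)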

  20. Dataset containing Features from DNS Tunneling Samples stored in JSON files

    • researchdata.se
    Updated May 10, 2017
    Cite
    Irvin Homem; Panagiotis Papapetrou (2017). Dataset containing Features from DNS Tunneling Samples stored in JSON files [Dataset]. http://doi.org/10.17045/STHLMUNI.4229399
    Explore at:
    Dataset updated
    May 10, 2017
    Dataset provided by
    Stockholm University
    Authors
    Irvin Homem; Panagiotis Papapetrou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data set containing features extracted from 211 DNS Tunneling packet captures. The packet capture samples are classified by the protocols tunneled within the DNS tunnel. The features are stored in json files for each packet capture. The features in each file include the IP Packet Length, the DNS Query Name Length and the DNS Query Name entropy. In this "slightly unclean" version of the feature set the DNS Query Name field values are also present, but are not actually necessary.

    This feature set may be used to perform machine learning techniques on DNS Tunneling traffic to discover new insights without necessarily having to reconstruct and analyze the equivalent full packet captures.
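
    A rough sketch for assembling the per-capture JSON files into one feature table (the folder name and the field names ip_packet_length, query_name_length and query_name_entropy are assumptions; the actual keys in the files may differ):

    import json
    from pathlib import Path

    import pandas as pd

    rows = []
    for path in Path("dns_tunneling_features").glob("*.json"):   # hypothetical folder name
        with open(path, encoding="utf-8") as fh:
            records = json.load(fh)                              # assumed: list of per-packet records
        for record in records:
            rows.append({
                "capture": path.stem,
                "ip_packet_length": record.get("ip_packet_length"),
                "query_name_length": record.get("query_name_length"),
                "query_name_entropy": record.get("query_name_entropy"),
            })

    df = pd.DataFrame(rows)
    print(df.describe())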
