100+ datasets found
  1. h

    example-space-to-dataset-image-zip

    • huggingface.co
    Updated Jun 16, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lucain Pouget (2023). example-space-to-dataset-image-zip [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image-zip
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 16, 2023
    Authors
    Lucain Pouget
    Description
  2. h

    example-space-to-dataset-json

    • huggingface.co
    Updated Jun 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    t (2024). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/taichi256/example-space-to-dataset-json
    Explore at:
    Dataset updated
    Jun 8, 2024
    Authors
    t
    Description

    taichi256/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. O

    Sample of Drugs from QHP drug.json files

    • healthdata.demo.socrata.com
    csv, xlsx, xml
    Updated Apr 16, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2016). Sample of Drugs from QHP drug.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Drugs-from-QHP-drug-json-files/jaa8-k3k2
    Explore at:
    csv, xlsx, xmlAvailable download formats
    Dataset updated
    Apr 16, 2016
    Description
  4. JSON Repository

    • data.amerigeoss.org
    • cloud.csiss.gmu.edu
    • +2more
    csv, geojson, json +1
    Updated Jun 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UN Humanitarian Data Exchange (2025). JSON Repository [Dataset]. https://data.amerigeoss.org/dataset/json-repository
    Explore at:
    geojson(135805), geojson(886086), csv(457), csv(242), geojson(222216), csv(845984), json(3401512), csv(9901), json(2064743), geojson(709673), csv(779), geojson(9124), json(327649), json(640845), csv(462610), geojson(162605), csv(358964), csv(4907), csv(6789), geojson(219728), json(1975854), csv(177), json(632081), geojson(1324722), geojson(543777), csv(536), topojson(2728099), csv(177073), geojson(953043), json(3478518), json(3411081), json(876253), geojson(2396630), geojson(366788), geojson(545299), csv(669568), geojson(178718), json(461423), json(457832), geojson(54889), csv(85982), json(1132925), csv(9980), json(707249), geojson(74470), geojson(365288), json(520472), json(559095), geojson(164379)Available download formats
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    United Nationshttp://un.org/
    Description

    This dataset contains resources transformed from other datasets on HDX. They exist here only in a format modified to support visualization on HDX and may not be as up to date as the source datasets from which they are derived.

    Source datasets: https://data.hdx.rwlabs.org/dataset/idps-data-by-region-in-mali

  5. h

    example-space-to-dataset-json

    • huggingface.co
    Updated May 26, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    m (2025). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/mmwmm/example-space-to-dataset-json
    Explore at:
    Dataset updated
    May 26, 2025
    Authors
    m
    Description

    mmwmm/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. json_large_sample

    • kaggle.com
    Updated Dec 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Noura Aly (2023). json_large_sample [Dataset]. https://www.kaggle.com/datasets/nouraaly/json-large-sample
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 1, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Noura Aly
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Noura Aly

    Released under Apache 2.0

    Contents

  7. O

    Sample of Providers from QHP provider.json files

    • healthdata.demo.socrata.com
    csv, xlsx, xml
    Updated Apr 16, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2016). Sample of Providers from QHP provider.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Providers-from-QHP-provider-json-files/axbq-xnwy
    Explore at:
    xlsx, xml, csvAvailable download formats
    Dataset updated
    Apr 16, 2016
    Description
  8. c

    Complete News Data Extracted from CNBC in JSON Format: Covering Business,...

    • crawlfeeds.com
    json, zip
    Updated Jul 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Complete News Data Extracted from CNBC in JSON Format: Covering Business, Finance, Technology, and Global Trends for Europe, US, and UK Audiences [Dataset]. https://crawlfeeds.com/datasets/complete-news-data-extracted-from-cnbc-in-json-format-covering-business-finance-technology-and-global-trends-for-europe-us-and-uk-audiences
    Explore at:
    zip, jsonAvailable download formats
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Area covered
    United States, United Kingdom
    Description

    We have successfully extracted a comprehensive news dataset from CNBC, covering not only financial updates but also an extensive range of news categories relevant to diverse audiences in Europe, the US, and the UK. This dataset includes over 500,000 records, meticulously structured in JSON format for seamless integration and analysis.

    Diverse News Segments for In-Depth Analysis

    This extensive extraction spans multiple segments, such as:

    • Business and Market Analysis: Stay updated on major companies, mergers, and acquisitions.
    • Technology and Innovation: Explore developments in AI, cybersecurity, and digital transformation.
    • Economic Forecasts: Access insights into GDP, employment rates, inflation, and other economic indicators.
    • Geopolitical Developments: Understand the impact of political events and global trade dynamics on markets.
    • Personal Finance: Learn about saving strategies, investment tips, and real estate trends.

    Each record in the dataset is enriched with metadata tags, enabling precise filtering by region, sector, topic, and publication date.

    Why Choose This Dataset?

    The comprehensive news dataset provides real-time insights into global developments, corporate strategies, leadership changes, and sector-specific trends. Designed for media analysts, research firms, and businesses, it empowers users to perform:

    • Trend Analysis
    • Sentiment Analysis
    • Predictive Modeling

    Additionally, the JSON format ensures easy integration with analytics platforms for advanced processing.

    Access More News Datasets

    Looking for a rich repository of structured news data? Visit our news dataset collection to explore additional offerings tailored to your analysis needs.

    Sample Dataset Available

    To get a preview, check out the CSV sample of the CNBC economy articles dataset.

  9. DataCite Public Data

    • redivis.com
    application/jsonl +7
    Updated Dec 12, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Redivis Demo Organization (2024). DataCite Public Data [Dataset]. https://redivis.com/datasets/7wec-6vgw8qaaq
    Explore at:
    application/jsonl, arrow, spss, csv, stata, sas, avro, parquetAvailable download formats
    Dataset updated
    Dec 12, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Description

    Abstract

    The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.

    This dataset represents a processed version of the Public Data File, where the data have been extracted and loaded into a Redivis dataset.

    Methodology

    The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.

    Records have descriptive metadata for research outputs and resources structured according to the DataCite Metadata Schema and include links to other persistent identifiers (PIDs) for works (DOIs), people (ORCID iDs), and organizations (ROR IDs).

    Use of the DataCite Public Data File is subject to the DataCite Data File Use Policy.

    Usage

    This datasets is a processed version of the DataCite public data file, where the original file (a 23GB .tar.gz) has been extracted into 55,239 JSONL files, that were then concatenated into a single JSONL file.

    This JSONL file has been imported into a Redivis table to facilitate further exploration and analysis.

    A sample project demonstrating how to query the DataCite data file can be found here: https://redivis.com/projects/hx1e-a6w8vmwsx

  10. e

    Text content of the Frequently Asked Questions “business info COVID19”

    • data.europa.eu
    json
    Updated Sep 1, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Direction GĂ©nĂ©rale des Entreprises (2024). Text content of the Frequently Asked Questions “business info COVID19” [Dataset]. https://data.europa.eu/88u/dataset/5ec3a046c9e9abed50d770a9
    Explore at:
    json(366118)Available download formats
    Dataset updated
    Sep 1, 2024
    Dataset authored and provided by
    Direction Générale des Entreprises
    License

    https://www.etalab.gouv.fr/licence-ouverte-open-licencehttps://www.etalab.gouv.fr/licence-ouverte-open-licence

    Description

    Frequently Asked Questions for Business in the COVID-19 Context

    This dataset contains the articles published on the Covid-19 FAQ for companies published by the Directorate-General for Enterprises at https://info-entreprises-covid19.economie.fr

    The data are presented in the JSON format as follows: JSON [ { “title”: “Example article for documentation”, “content”: [ this is the first page of the article. here the second, “‘div’these articles incorporate some HTML formatting‘/div’” ], “path”: [ “File to visit in the FAQ”, “to join the article”] }, ... ] “'” The update is done every day at 6:00 UTC. This data is extracted directly from the site, the source code of the script used to extract the data is available here: https://github.com/chrnin/docCovidDGE

  11. h

    example-space-to-dataset-json

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/CognitiveScience/example-space-to-dataset-json
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Ahmad Sohrabi
    Description

    CognitiveScience/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. g

    Data from: JSON Dataset of Simulated Building Heat Control for System of...

    • gimi9.com
    • researchdata.se
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    JSON Dataset of Simulated Building Heat Control for System of Systems Interoperability [Dataset]. https://gimi9.com/dataset/eu_https-doi-org-10-5878-1tv7-9x76/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system, and the messages sent within control systems-of-systems. For more information see attached data documentation. The data comes in two semicolon-separated (;) csv files, training.csv and test.csv. The train/test split is not random; training data comes from the first 80% of simulated timesteps, and the test data is the last 20%. There is no specific validation dataset, the validation data should instead be randomly selected from the training data. The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data only samples once every hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consists of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON-messages is stored as a single row in the tables. This means that much data is duplicated, a choice made to make it easier to use the data. The simulation data is not meant to be opened and analyzed in spreadsheet software, it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/.

  13. Example Microscopy Metadata JSON files produced using Micro-Meta App to...

    • zenodo.org
    • data.niaid.nih.gov
    json, tiff
    Updated Jul 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Karl Bellve; Alessandro Rigano; Kevin Fogarty; Kevin Fogarty; Caterina Strambio-De-Castillia; Caterina Strambio-De-Castillia; Karl Bellve; Alessandro Rigano (2024). Example Microscopy Metadata JSON files produced using Micro-Meta App to document the acquisition of example images using a custom-built TIRF Epifluorescence Structured Illumination Microscope [Dataset]. http://doi.org/10.5281/zenodo.4891883
    Explore at:
    json, tiffAvailable download formats
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Karl Bellve; Alessandro Rigano; Kevin Fogarty; Kevin Fogarty; Caterina Strambio-De-Castillia; Caterina Strambio-De-Castillia; Karl Bellve; Alessandro Rigano
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example Microscopy Metadata JSON files produced using the Micro-Meta App documenting an example raw-image file acquired using the custom-built TIRF Epifluorescence Structured Illumination Microscope.

    For this use case, which is presented in Figure 5 of Rigano et al., 2021, Micro-Meta App was utilized to document:

    1) The Hardware Specifications of the custom build TIRF Epifluorescence Structured light Microscope (TESM; Navaroli et al., 2010) developed, built on the basis of the based on Olympus IX71 microscope stand, and owned by the Biomedical Imaging Group (http://big.umassmed.edu/) at the Program in Molecular Medicine of the University of Massachusetts Medical School. Because TESM was custom-built the most appropriate documentation level is Tier 3 (Manufacturing/Technical Development/Full Documentation) as specified by the 4DN-BINA-OME Microscopy Metadata model (Hammer et al., 2021).

    The TESM Hardware Specifications are stored in: Rigano et al._Figure 5_UseCase_Biomedical Imaging Group_TESM.JSON

    2) The Image Acquisition Settings that were applied to the TESM microscope for the acquisition of an example image (FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif) obtained by Nicholas Vecchietti and Caterina Strambio-De-Castillia. For this image, TZM-bl human cells were infected with HIV-1 retroviral three-part vector (FSWT+PAX2+pMD2.G). Six hours post-infection cells were fixed for 10 min with 1% formaldehyde in PBS, and permeabilized. Cells were stained with mouse anti-p24 primary antibody followed by DyLight488-anti-Mouse secondary antibody, to detect HIV-1 viral Capsid. In addition, cells were counterstained using rabbit anti-Lamin B1 primary antibody followed by DyLight649-anti-Rabbit secondary antibody, to visualize the nuclear envelope and with DAPI to visualize the nuclear chromosomal DNA.

    The Image Acquisition Settings used to acquire the FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif image are stored in: Rigano et al._Figure 5_UseCase_AS_fswt-6hvirus-10minfix-stk_4-epi.tif.JSON

    Instructional video tutorials on how to use these example data files:
    Use these videos to get started with using Micro-Meta App after downloading the example data files available here.

  14. f

    TrainingDML-AI JSON Encoding

    • figshare.com
    txt
    Updated Jun 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Peng Yue; Boyi Shangguan (2022). TrainingDML-AI JSON Encoding [Dataset]. http://doi.org/10.6084/m9.figshare.16625071.v2
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    figshare
    Authors
    Peng Yue; Boyi Shangguan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Prototype of TrainingDML-AI JSON schema and examples in the paper "Towards an interoperable training data markup language for artificial intelligence in earth observation"

  15. R

    Json >txt Dataset

    • universe.roboflow.com
    zip
    Updated Sep 26, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    slab (2022). Json >txt Dataset [Dataset]. https://universe.roboflow.com/slab/json-txt
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 26, 2022
    Dataset authored and provided by
    slab
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Example Bounding Boxes
    Description

    Json >txt

    ## Overview
    
    Json >txt is a dataset for object detection tasks - it contains Example annotations for 296 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
    
  16. Data from: Food Recipes dataset

    • kaggle.com
    Updated Aug 31, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    samsatp (2021). Food Recipes dataset [Dataset]. https://www.kaggle.com/datasets/sathianpong/foodrecipe
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 31, 2021
    Dataset provided by
    Kaggle
    Authors
    samsatp
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by samsatp

    Released under CC0: Public Domain

    Contents

  17. F# Data: Making structured data first-class

    • figshare.com
    bin
    Updated Jan 19, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tomas Petricek (2016). F# Data: Making structured data first-class [Dataset]. http://doi.org/10.6084/m9.figshare.1169941.v1
    Explore at:
    binAvailable download formats
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Tomas Petricek
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accessing data in structured formats such as XML, CSV and JSON in statically typed languages is difficult, because the languages do not understand the structure of the data. Dynamically typed languages make this syntactically easier, but lead to error-prone code. Despite numerous efforts, most of the data available on the web do not come with a schema. The only information available to developers is a set of examples, such as typical server responses. We describe an inference algorithm that infers a type of structured formats including CSV, XML and JSON. The algorithm is based on finding a common supertype of types representing individual samples (or values in collections). We use the algorithm as a basis for an F# type provider that integrates the inference into the F# type system. As a result, users can access CSV, XML and JSON data in a statically-typed fashion just by specifying a representative sample document.

  18. Z

    Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment...

    • data.niaid.nih.gov
    Updated Jan 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrej Hrovat (2023). Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7509279
    Explore at:
    Dataset updated
    Jan 6, 2023
    Dataset provided by
    Miha Mohorčič
    Mihael Mohorčič
    Aleơ Simončič
    Andrej Hrovat
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The 802.11 standard includes several management features and corresponding frame types. One of them are Probe Requests (PR), which are sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame part of PRs consists of variable-length fields, called Information Elements (IE), which represent the capabilities of a mobile device, such as supported data rates.

    This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.

    It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.

    Related dataset

    Same authors also produced the Labeled dataset of IEEE 802.11 probe requests with same data layout and recording equipment.

    Measurement setup

    The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.

    The following information about each received PR is collected: - MAC address - Supported data rates - extended supported rates - HT capabilities - extended capabilities - data under extended tag and vendor specific tag - interworking - VHT capabilities - RSSI - SSID - timestamp when PR was received.

    The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.

    Data preprocessing

    The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:

    PR_IE_data = { 'DATA_RTS': {'SUPP': DATA_supp , 'EXT': DATA_ext}, 'HT_CAP': DATA_htcap, 'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap}, 'VHT_CAP': DATA_vhtcap, 'INTERWORKING': DATA_inter, 'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...}, 'VENDOR_SPEC': {VENDOR_1:{ 'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...}, VENDOR_2:{ 'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...} ...} }

    Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
    Missing IE fields in the captured PR are not included in PR_IE_DATA.

    When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:

    {'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },

    where PR_data is structured as follows:

    { 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.

    This data structure allows to store only 'TOA' and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of the newly detected PR is compared with the already stored data of the same MAC in the current scan time interval. If identical PR's IE data from the same MAC address is already stored, only data for the keys 'TIME' and 'RSSI' are appended. If identical PR's IE data from the same MAC address has not yet been received, then the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png

    At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.

    Folder structure

    For ease of processing of the data, the dataset is divided into 7 folders, each containing a 24-hour period. Each folder contains four files, each containing samples from that device.

    The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected between 23th of September 2022 00:00 local time, until 24th of September 2022 00:00 local time.

    Files representing their location via mapping: - 1.json -> location 1 - 2.json -> location 2 - 3.json -> location 3 - 4.json -> location 4

    Environments description

    The measurements were carried out in the city of Catania, in Piazza UniversitĂ  and Piazza del Duomo The gateway devices (rPIs with WiFi dongle) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.

    Four Raspbery Pi-s were used: - location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano) - location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo - location 3 -> nothernmost window in the building of Via Etnea near Piazza Università - location 4 -> first window top the right of the entrance of the University of Catania

    Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access) Under ideal circumstances, the locations of the devices and their coverage area would cover both squares and the part of Via Etna between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.

    Known dataset shortcomings

    Due to technical and physical limitations, the dataset contains some identified deficiencies.

    PRs are collected and transmitted in 10-second chunks. Due to the limited capabilites of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.

    Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can no longer respond. For this reason, up to 20 seconds of data will not be recorded in each 20-minute period.

    The devices had a scheduled reboot at 4:00 each day which is shown as missing data of up to a few minutes.

     Location 1 - Piazza del Duomo - Chierici
    

    The gateway device (rPi) is located on the second floor balcony and is hardwired to the Ethernet port. This device appears to function stably throughout the data collection period. Its location is constant and is not disturbed, dataset seems to have complete coverage.

     Location 2 - Via Etnea - Piazza del Duomo
    

    The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.

     Location 3 - Via Etnea - Piazza UniversitĂ 
    

    Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside a thick wall when no people are present. This device appears to have been collecting data throughout the whole dataset period.

     Location 4 - Piazza UniversitĂ 
    

    This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device had lost power several times during the deployment. The internet connection was also interrupted sporadically.

    Recognitions

    The data was collected within the scope of Resiloc project with the help of City of Catania and project partners.

  19. Z

    Json file from Twitter API used for benchmarking Jsonpath

    • data.niaid.nih.gov
    Updated Oct 19, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Paperman, Charles (2022). Json file from Twitter API used for benchmarking Jsonpath [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7225576
    Explore at:
    Dataset updated
    Oct 19, 2022
    Dataset authored and provided by
    Paperman, Charles
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A JSON file used as an example to illustrate queries and to benchmark some tool.

  20. Sec Financial Statement Data in Json

    • kaggle.com
    Updated Apr 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Angular2guy (2025). Sec Financial Statement Data in Json [Dataset]. https://www.kaggle.com/datasets/wbqrmgmcia7lhhq/sec-financial-statement-data-in-json/versions/13
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 13, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Angular2guy
    License

    https://www.usa.gov/government-works/https://www.usa.gov/government-works/

    Description

    Data from 2010 Q1 to 2025 Q1

    The data is created with this Jupyter Notebook:

    The data format is documented in the Readme. The Sec data documentation can be found here.

    Json structure: {"quarter": "Q1", "country": "Italy", "data": {"cf": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}], "bs": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}], "ic": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}]}, "year": 0, "name": "B", "startDate": "2009-12-31", "endDate": "2010-12-30", "symbol": "GM", "city": "York"}

    An example Json: {"year": 2023, "data": {"cf": [{"value": -1834000000, "concept": "NetCashProvidedByUsedInFinancingActivities", "unit": "USD", "label": "Amount of cash inflow (outflow) from financing 
 Amount of cash inflow (outflow) from financing 
", "info": "Net cash used in financing activities"}], "ic":[{"value": 1000000, "concept": "IncreaseDecreaseInDueFromRelatedParties", "unit": "USD", "label": "The increase (decrease) during the reporting pe
 The increase (decrease) during the reporting pe
", "info": "Receivables from related parties"}], "bs": [{"value": 2779000000, "concept": "AccountsPayableCurrent", "unit": "USD", "label": "Carrying value as of the balance sheet date of 
 Carrying value as of the balance sheet date of 
", "info": "Accounts payable"}]}, "quarter": "Q2", "city": "SANTA CLARA", "startDate": "2023-06-30", "name": "ADVANCED MICRO DEVICES INC", "endDate": "2023-09-29", "country": "US", "symbol": "AMD"}

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Lucain Pouget (2023). example-space-to-dataset-image-zip [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image-zip

example-space-to-dataset-image-zip

Wauplin/example-space-to-dataset-image-zip

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 16, 2023
Authors
Lucain Pouget
Description
Search
Clear search
Close search
Google apps
Main menu