Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This JSON represents a dummy dataset to test the functionality of trusted repository search capabilities and of research data governance practices. The associated dummy dissertation is entitled Data Science Dummy Dissertation. The dummy file is a 1KB JSON containing country data.
CSV output from https://github.com/marks/health-insurance-marketplace-analytics/blob/master/flattener/flatten_from_index.py
https://academictorrents.com/nolicensespecified
[Sample Dataset] March 2025 Public Data File from Crossref. This dataset includes 100 random JSON records from the Crossref metadata corpus.
CSV output from https://github.com/marks/health-insurance-marketplace-analytics/blob/master/flattener/flatten_from_index.py
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example Microscopy Metadata JSON files produced using the Micro-Meta App documenting an example raw-image file acquired using the custom-built TIRF Epifluorescence Structured Illumination Microscope.
For this use case, which is presented in Figure 5 of Rigano et al., 2021, Micro-Meta App was utilized to document:
1) The Hardware Specifications of the custom-built TIRF Epifluorescence Structured light Microscope (TESM; Navaroli et al., 2010), which was developed and built on the basis of an Olympus IX71 microscope stand and is owned by the Biomedical Imaging Group (http://big.umassmed.edu/) at the Program in Molecular Medicine of the University of Massachusetts Medical School. Because the TESM was custom-built, the most appropriate documentation level is Tier 3 (Manufacturing/Technical Development/Full Documentation) as specified by the 4DN-BINA-OME Microscopy Metadata model (Hammer et al., 2021).
The TESM Hardware Specifications are stored in: Rigano et al._Figure 5_UseCase_Biomedical Imaging Group_TESM.JSON
2) The Image Acquisition Settings that were applied to the TESM microscope for the acquisition of an example image (FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif) obtained by Nicholas Vecchietti and Caterina Strambio-De-Castillia. For this image, TZM-bl human cells were infected with HIV-1 retroviral three-part vector (FSWT+PAX2+pMD2.G). Six hours post-infection cells were fixed for 10 min with 1% formaldehyde in PBS, and permeabilized. Cells were stained with mouse anti-p24 primary antibody followed by DyLight488-anti-Mouse secondary antibody, to detect HIV-1 viral Capsid. In addition, cells were counterstained using rabbit anti-Lamin B1 primary antibody followed by DyLight649-anti-Rabbit secondary antibody, to visualize the nuclear envelope and with DAPI to visualize the nuclear chromosomal DNA.
The Image Acquisition Settings used to acquire the FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif image are stored in: Rigano et al._Figure 5_UseCase_AS_fswt-6hvirus-10minfix-stk_4-epi.tif.JSON
Instructional video tutorials on how to use these example data files:
Use these videos to get started with using Micro-Meta App after downloading the example data files available here.
The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.
This dataset represents a processed version of the Public Data File, where the data have been extracted and loaded into a Redivis dataset.
Records have descriptive metadata for research outputs and resources structured according to the DataCite Metadata Schema and include links to other persistent identifiers (PIDs) for works (DOIs), people (ORCID iDs), and organizations (ROR IDs).
Use of the DataCite Public Data File is subject to the DataCite Data File Use Policy.
This dataset is a processed version of the DataCite public data file: the original file (a 23 GB .tar.gz) has been extracted into 55,239 JSONL files, which were then concatenated into a single JSONL file.
This JSONL file has been imported into a Redivis table to facilitate further exploration and analysis.
A sample project demonstrating how to query the DataCite data file can be found here: https://redivis.com/projects/hx1e-a6w8vmwsx
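For readers who want to reproduce the concatenation step locally, a minimal sketch is shown below; the archive and output file names are illustrative assumptions, not the names used by Redivis.

import tarfile

# Stream every .jsonl member out of the (assumed) public data file archive
# and concatenate them into a single JSONL file.
archive_name = 'DataCite_Public_Data_File_2023.tar.gz'  # hypothetical file name
with tarfile.open(archive_name, 'r:gz') as archive, open('datacite_findable_dois.jsonl', 'wb') as out:
    for member in archive:
        if member.isfile() and member.name.endswith('.jsonl'):
            out.write(archive.extractfile(member).read())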
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A JSON file used as an example to illustrate queries and to benchmark tools.
This dataset contains resources transformed from other datasets on HDX. They exist here only in a format modified to support visualization on HDX and may not be as up to date as the source datasets from which they are derived.
Source datasets: https://data.hdx.rwlabs.org/dataset/idps-data-by-region-in-mali
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system, and the messages sent within control systems-of-systems. For more information see attached data documentation.
The data comes in two semicolon-separated (;) CSV files, training.csv and test.csv. The train/test split is not random: the training data comes from the first 80% of simulated timesteps, and the test data is the last 20%. There is no separate validation dataset; validation data should instead be randomly selected from the training data. The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data is sampled only once per hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consists of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and one outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON messages is stored as a single row in the tables. This means that much of the data is duplicated, a choice made to make the data easier to use.
The simulation data is not meant to be opened and analyzed in spreadsheet software; it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/.
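A minimal loading sketch for the two files described above (the validation fraction chosen here is an arbitrary illustration):

import pandas as pd

train = pd.read_csv('training.csv', sep=';')
test = pd.read_csv('test.csv', sep=';')
# no dedicated validation file: sample validation rows from the training data
validation = train.sample(frac=0.2, random_state=0)
train = train.drop(validation.index)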
The data file with temperatures (smhi-july-23-29-2018.csv) acts as input for the thermodynamic building simulation found on GitHub, where it is used to get the outside temperature and corresponding timestamps. Temperature data for Luleå, summer 2018, were downloaded from SMHI.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various purposes, e.g., analyzing MAC address randomization, estimating the number of people at a given location and time, or analyzing trends in population movement (streets, shopping malls, etc.) across different time periods.
Related dataset
The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, which uses the same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each received PR is collected:
- MAC address
- supported data rates
- extended supported rates
- HT capabilities
- extended capabilities
- data under the extended tag and vendor specific tag
- interworking
- VHT capabilities
- RSSI
- SSID
- timestamp when the PR was received
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IE fields are saved in the following JSON structure:
PR_IE_data = {
    'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
    'HT_CAP': DATA_htcap,
    'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
    'VHT_CAP': DATA_vhtcap,
    'INTERWORKING': DATA_inter,
    'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
    'VENDOR_SPEC': {
        VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...},
        VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...}
        ...
    }
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_DATA.
When a new MAC address is detected in the current scan time interval, the data from the PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{ 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.
This data structure makes it possible to store only the 'TIME' (time of arrival) and 'RSSI' values for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the already stored data for the same MAC in the current scan time interval. If identical PR IE data from the same MAC address is already stored, only the 'TIME' and 'RSSI' values are appended. If identical PR IE data from the same MAC address has not yet been received, the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png.
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
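The aggregation rule described above can be sketched roughly as follows (the function and variable names are illustrative; the actual gateway script is not part of the dataset):

def add_probe_request(scan_results, mac, ssid, toa, rssi, pr_ie_data):
    # scan_results maps MAC address -> {'MAC': ..., 'SSIDs': [...], 'PROBE_REQs': [...]}
    entry = scan_results.setdefault(mac, {'MAC': mac, 'SSIDs': [], 'PROBE_REQs': []})
    if ssid and ssid not in entry['SSIDs']:
        entry['SSIDs'].append(ssid)
    for pr in entry['PROBE_REQs']:
        if pr['DATA'] == pr_ie_data:          # identical IE data already stored for this MAC
            pr['TIME'].append(toa)
            pr['RSSI'].append(rssi)
            return
    entry['PROBE_REQs'].append({'TIME': [toa], 'RSSI': [rssi], 'DATA': pr_ie_data})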
Folder structure
For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each containing the samples collected by that device.
The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from the 23rd of September 2022 at 00:00 local time until the 24th of September 2022 at 00:00 local time.
Files represent their location via the following mapping:
- 1.json -> location 1
- 2.json -> location 2
- 3.json -> location 3
- 4.json -> location 4
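A minimal loading sketch for the layout above; the exact top-level structure inside each file (assumed here to be a list of scan-interval captures) should be confirmed against Single_PR_capture_example.json:

import json
from pathlib import Path

dataset_root = Path('.')  # directory containing the seven 24-hour folders
for day_folder in sorted(dataset_root.glob('2022-*_*')):
    for location in (1, 2, 3, 4):
        with open(day_folder / f'{location}.json') as f:
            captures = json.load(f)  # assumed: list of scan-interval captures for this gateway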
Environments description
The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for the correctness of the installation and the data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.
Four Raspberry Pis were used:
- location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
- location 2 -> southernmost window in the building on Via Etnea near Piazza del Duomo
- location 3 -> northernmost window in the building on Via Etnea near Piazza Università
- location 4 -> first window to the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage areas would cover both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can stop responding. For this reason, up to 20 seconds of data are not recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day, which appears as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (RPi) is located on the second-floor balcony and is hardwired to the Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location was constant and undisturbed, and the dataset appears to have complete coverage.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill; however, the movement of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days of the recording contain no PRs from this location.
Location 3 - Via Etnea - Piazza Università
Similar to Location 2, the device was placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it is placed on the windowsill when people are present and moved inside, behind a thick wall, when no people are present. This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza Università
This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device lost power several times during the deployment. The internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of the Resiloc project with the help of the City of Catania and project partners.
ThermoML is an XML-based IUPAC standard for the storage and exchange of experimental thermophysical and thermochemical property data. The ThermoML archive is a subset of Thermodynamics Research Center (TRC) data holdings corresponding to cooperation between NIST TRC and five journals: Journal of Chemical and Engineering Data (ISSN: 1520-5134), The Journal of Chemical Thermodynamics (ISSN: 1096-3626), Fluid Phase Equilibria (ISSN: 0378-3812), Thermochimica Acta (ISSN: 0040-6031), and International Journal of Thermophysics (ISSN: 1572-9567). Data from the initial cooperation (around 2003) through the 2019 calendar year are included.
The original scope of the archive has been expanded to include JSON files. The JSON files are structured according to ThermoML.xsd (available below) and are rendered from the same experimental thermophysical and thermochemical property data reported in the corresponding articles as the ThermoML files. In fact, the ThermoML files are generated from the JSON files to keep the information in sync. The JSON files may contain additional information not supported by the ThermoML schema. For example, each JSON file contains the md5 checksum of the ThermoML file (THERMOML_MD5_CHECKSUM), which may be used to validate the ThermoML download.
This data.nist.gov resource provides a .tgz file download containing the JSON and ThermoML files for each version of the archive. Data from the initial cooperation (around 2003) through the 2019 calendar year are provided below (ThermoML.v2020-09.30.tgz). The dates of the extraction from TRC databases, as specified in the dateCit field of the xml files, are 2020-09-29 and 2020-09-30. The .tgz file contains a directory tree that maps to the DOI prefix/suffix of the entries; e.g., unzipping the .tgz file creates a directory for each of the prefixes (10.1007, 10.1016, and 10.1021) that contains all the .json and .xml files.
The data and other information throughout this digital resource (including the website, API, JSON, and ThermoML files) have been carefully extracted from the original articles by NIST/TRC personnel. Neither the Journal publisher, nor its editors, nor NIST/TRC warrant or represent, expressly or implied, the correctness or accuracy of the content of information contained throughout this digital resource, nor its fitness for any use or for any purpose, nor can they, or will they, accept any liability or responsibility whatever for the consequences of its use or misuse by anyone. In any individual case of application, the respective user must check the correctness by consulting other relevant sources of information.
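As an example of using the checksum, a minimal sketch is given below; the entry path is hypothetical, and it is assumed that THERMOML_MD5_CHECKSUM is a top-level key of the JSON file.

import hashlib
import json

entry = '10.1021/example_entry'  # hypothetical DOI-based path inside the extracted archive
with open(entry + '.json') as f:
    expected = json.load(f)['THERMOML_MD5_CHECKSUM']
with open(entry + '.xml', 'rb') as f:
    actual = hashlib.md5(f.read()).hexdigest()
print('ThermoML file is valid' if actual == expected else 'checksum mismatch')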
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web" published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows: Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y
We have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:
- The set of Linked Data Graphs from the LSLOD cloud from which schemas were extracted.
- Refined sets of extracted classes, object properties, data properties, and datatypes shared across the Linked Data Graphs on the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, this is explicitly indicated.
- The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided along with broad-level characteristics of the modeled content. The LSLOD Schema Graph is saved as a JSON Pickle file. To read the JSON object in this Pickle file, use Python as follows:
import pickle
with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
    x = pickle.load(infile, encoding='iso-8859-1')
Check the Referenced Link for more details on this research, raw data files, and code references.
World country and state coordinates for plotting geospatial maps.
Files source:
Folium GitHub Repository:
World Geo Repository
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description
This dataset is a large-scale set of measurements for RSS-based localization. The data consists of received signal strength (RSS) measurements taken using the POWDER Testbed at the University of Utah. Samples include either 0, 1, or 2 active transmitters.
The dataset consists of 5,214 unique samples, with transmitters in 5,514 unique locations. The majority of the samples contain only 1 transmitter, but there are small sets of samples with 0 or 2 active transmitters, as shown below. Each sample has RSS values from between 10 and 25 receivers. The majority of the receivers are stationary endpoints fixed on the side of buildings, on rooftop towers, or on free-standing poles. A small set of receivers are located on shuttles which travel specific routes throughout campus.
Sample Type | Sample Count | Receiver Count |
---|---|---|
No-Tx Samples | 46 | 10 to 25 |
1-Tx Samples | 4822 | 10 to 25 |
2-Tx Samples | 346 | 11 to 12 |
The transmitters for this dataset are handheld walkie-talkies (Baofeng BF-F8HP) transmitting in the FRS/GMRS band at 462.7 MHz. These devices have a rated transmission power of 1 W. The raw IQ samples were processed through a 6 kHz bandpass filter to remove neighboring transmissions, and the RSS value was calculated as follows:
\(RSS = \frac{10}{N} \log_{10}\left(\sum_{i=1}^{N} x_i^2 \right) \)
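A short numpy sketch that evaluates the expression exactly as stated (the variable names are illustrative):

import numpy as np

def rss(x):
    # x: the N bandpass-filtered samples of one measurement; |x_i|^2 is used so the
    # same expression works for real or complex (IQ) samples
    n = len(x)
    return (10.0 / n) * np.log10(np.sum(np.abs(x) ** 2))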
Measurement Parameters | Description |
---|---|
Frequency | 462.7 MHz |
Radio Gain | 35 dB |
Receiver Sample Rate | 2 MHz |
Sample Length | N=10,000 |
Band-pass Filter | 6 kHz |
Transmitters | 0 to 2 |
Transmission Power | 1 W |
Receivers consist of Ettus USRP X310 and B210 radios and a mix of wide- and narrow-band antennas, as shown in the table below. Each receiver took measurements with a receiver gain of 35 dB. However, devices have different maximum gain settings, and no calibration data was available, so all RSS values in the dataset are uncalibrated and are only relative to the device.
Usage Instructions
Data is provided in .json format, both as one file and as split files.
import json
data_file = 'powder_462.7_rss_data.json'
with open(data_file) as f:
data = json.load(f)
The JSON data is a dictionary with the sample timestamp as a key. Within each sample are the following keys:
- rx_data: A list of data from each receiver. Each entry contains the RSS value, latitude, longitude, and device name.
- tx_coords: A list of coordinates for each transmitter. Each entry contains latitude and longitude.
- metadata: A list of dictionaries containing metadata for each transmitter, in the same order as the rows in tx_coords.
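Continuing the snippet above, a brief usage sketch (the field names inside each rx_data entry are not spelled out here, so inspect one sample to confirm them):

# iterate over samples; each key of the dictionary is a sample timestamp
for timestamp, sample in data.items():
    n_tx = len(sample['tx_coords'])   # 0, 1, or 2 transmitters
    n_rx = len(sample['rx_data'])     # between 10 and 25 receivers
    tx_metadata = sample['metadata']  # one dictionary per transmitter, same order as tx_coords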
File Separations and Train/Test Splits
In the separated_data.zip folder there are several train/test separations of the data.
- all_data contains all the data in the main JSON file, separated by the number of transmitters.
- stationary consists of 3 cases where a stationary receiver remained in one location for several minutes. This may be useful for evaluating localization using mobile shuttles, or for measuring the variation in the channel characteristics for stationary receivers.
- train_test_splits contains unique data splits used for training and evaluating ML models. These splits only used data from the single-tx case. In other words, the union of the splits, along with unused.json, is equivalent to the file all_data/single_tx.json.
  - The random split is a random 80/20 split of the data.
  - special_test_cases contains the stationary transmitter data, indoor transmitter data (with high noise in GPS location), and transmitters off campus.
  - The grid split divides the campus region into a 10 by 10 grid. Each grid square is assigned to the training or test set, with 80 squares in the training set and the remainder in the test set. If a square is assigned to the test set, none of its four neighbors are included in the test set. Transmitters occurring in each grid square are assigned to train or test accordingly. One such random assignment of grid squares makes up the grid split.
  - The seasonal split contains data separated by the month of collection, April or July.
  - The transportation split contains data separated by the method of movement for the transmitter: walking, cycling, or driving. The non-driving.json file contains the union of the walking and cycling data.
  - campus.json contains the on-campus data, and so is equivalent to the union of each split, not including unused.json.
Digital Surface Model
The dataset includes a digital surface model (DSM) from a State of Utah 2013-2014 LiDAR survey. This map includes the University of Utah campus and surrounding area. The DSM includes buildings and trees, unlike some digital elevation models.
To read the data in python:
import rasterio as rio
import numpy as np
import utm
dsm_object = rio.open('dsm.tif')
dsm_map = dsm_object.read(1) # a np.array containing elevation values
dsm_resolution = dsm_object.res # a tuple containing x,y resolution (0.5 meters)
dsm_transform = dsm_object.transform # an Affine transform for conversion to UTM-12 coordinates
utm_transform = np.array(dsm_transform).reshape((3,3))[:2]
utm_top_left = utm_transform @ np.array([0,0,1])
utm_bottom_right = utm_transform @ np.array([dsm_object.shape[1], dsm_object.shape[0], 1]) # (col, row, 1): the affine maps (col, row) to UTM (x, y)
latlon_top_left = utm.to_latlon(utm_top_left[0], utm_top_left[1], 12, 'T')
latlon_bottom_right = utm.to_latlon(utm_bottom_right[0], utm_bottom_right[1], 12, 'T')
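Continuing the snippet above, the elevation at a given latitude/longitude can be looked up roughly as follows (the coordinates are illustrative, and it is assumed the GeoTIFF is georeferenced in UTM zone 12 meters, as the transform implies):

lat, lon = 40.7649, -111.8421                   # hypothetical point near the University of Utah campus
easting, northing, _, _ = utm.from_latlon(lat, lon)
row, col = dsm_object.index(easting, northing)  # pixel indices in the DSM
elevation_m = dsm_map[row, col]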
Dataset Acknowledgement: This DSM file is acquired by the State of Utah and its partners, and is in the public domain and can be freely distributed with proper credit to the State of Utah and its partners. The State of Utah and its partners makes no warranty, expressed or implied, regarding its suitability for a particular use and shall not be liable under any circumstances for any direct, indirect, special, incidental, or consequential damages with respect to users of this product.
DSM DOI: https://doi.org/10.5069/G9TH8JNQ
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example input files for MS²Rescore
https://github.com/compomics/ms2rescore
Download and unzip the data to examples/data and use with the configuration TOML or JSON files provided in the repository in the examples directory.
This workflow aims to efficiently integrate floral sample data from Excel files into a MongoDB database for botanical projects. It involves verifying and updating taxonomic information, importing georeferenced floral samples, converting data to JSON format, and uploading it to the database. This process ensures accurate taxonomy and enriches the database with comprehensive sample information, supporting robust data analysis and enhancing the project's overall dataset.
Background
Efficient management of flora sample data is essential in botanical projects, especially when integrating diverse information into a MongoDB database. This workflow addresses the challenge of incorporating floral samples, collected at various sampling points, into the MongoDB database. The database is divided into two segments: one storing taxonomic information and common characteristics of taxa, and the other containing georeferenced floral samples with relevant information. The workflow ensures that, upon importing new samples, taxonomic information is verified and updated, if necessary, before storing the sample data.
Introduction
In botanical projects, effective data handling is pivotal, particularly when incorporating diverse flora samples into a MongoDB database. This workflow focuses on importing floral samples from an Excel file into MongoDB, ensuring data integrity and taxonomic accuracy. The database is structured into taxonomic information and a collection of georeferenced floral samples, each with essential details about the collection location and the species' nativity. The workflow dynamically updates taxonomic records and stores new samples in the appropriate database sections, enriching the overall floral sample collection.
Aims
The primary aim of this workflow is to streamline the integration of floral sample data into the MongoDB database, maintaining taxonomic accuracy and enhancing the overall collection. The workflow includes the following key components:
- Taxonomy Verification and Update: Checks and updates taxonomic information in the MongoDB database, ensuring accuracy before importing new floral samples.
- Georeferenced Sample Import: Imports floral samples from the Excel file, containing georeferenced information and additional sample details.
- JSON Transformation and Database Upload: Transforms the floral sample information from the Excel file into JSON format and uploads it to the appropriate sections of the MongoDB database (see the sketch below).
Scientific Questions
- Taxonomy Verification Process: How effectively does the workflow verify and update taxonomic information before importing new floral samples?
- Georeferenced Sample Storage: How does the workflow handle the storage of georeferenced floral samples, considering collection location and species nativity?
- JSON Transformation Accuracy: How successful is the transformation of floral sample information from the Excel file into JSON format for MongoDB integration?
- Database Enrichment: How does the workflow contribute to enriching the taxonomic and sample collections in the MongoDB database, and how is this reflected in the overall project dataset?
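A minimal sketch of the import/update step described above, assuming illustrative names for the Excel file, its columns, and the database collections (none of these names come from the workflow itself):

import pandas as pd
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
db = client['flora_project']        # assumed database name
taxa = db['taxa']                   # segment with taxonomic information
samples = db['samples']             # segment with georeferenced floral samples

df = pd.read_excel('floral_samples.xlsx')            # assumed input file
for record in df.to_dict(orient='records'):          # Excel rows become JSON-like documents
    # verify/update the taxonomy before storing the sample
    taxa.update_one({'taxon': record['taxon']},      # 'taxon' is an assumed column name
                    {'$setOnInsert': {'taxon': record['taxon']}},
                    upsert=True)
    samples.insert_one(record)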
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ransomware has been considered a significant threat to most enterprises for the past few years. In scenarios where users can access all files on a shared server, one infected host is capable of locking access to all shared files. In the article related to this repository, we detect ransomware infection based on file-sharing traffic analysis, even in the case of encrypted traffic. We compare three machine learning models and choose the best one for validation. We train and test the detection model using more than 70 ransomware binaries from 26 different families and more than 2500 hours of 'not infected' traffic from real users. The results reveal that the proposed tool can detect all ransomware binaries, including those not used in the training phase (zero-days). This paper provides a validation of the algorithm by studying the false positive rate and the amount of information from user files that the ransomware could encrypt before being detected.
This dataset directory contains the 'infected' and 'not infected' samples and the models used for each T configuration, each one in a separate folder.
The folders are named NxSy, where x is the number of 1-second intervals per sample and y is the sliding step in seconds.
Each folder (for example N10S10/) contains:
- tree.py -> Python script with the Tree model.
- ensemble.json -> JSON file with the information about the Ensemble model.
- NN_XhiddenLayer.json -> JSON file with the information about the NN model with X hidden layers (1, 2 or 3).
- N10S10.csv -> all samples used for training each model in this folder, in CSV format for use in the BigML application.
- zeroDays.csv -> all zero-day samples used for testing each model in this folder, in CSV format for use in the BigML application.
- userSamples_test -> all samples used for validating each model in this folder, in CSV format for use in the BigML application.
- userSamples_train -> user samples used for training the models.
- ransomware_train -> ransomware samples used for training the models.
- scaler.scaler -> Standard Scaler from the Python library used to scale the samples.
- zeroDays_notFiltered -> folder with the zero-day samples.
In the case of the N30S30 folder, there is an additional folder (SMBv2SMBv3NFS) with the samples extracted from the SMBv2, SMBv3 and NFS traffic traces. There are more binaries than the ones presented in the article, but this is because some of them are not "unseen" binaries (their families are present in the training set).
The files containing samples (NxSy.csv, zeroDays.csv and userSamples_test.csv) are structured as follows:
- Each line is one sample.
- Each sample has 3*T features and the label (1 for an 'infected' sample, 0 otherwise).
- The features are separated by ',' because it is a CSV file.
- The last column is the label of the sample.
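A minimal loading sketch consistent with the layout above (whether the CSV files include a header row is an assumption; pass header=None otherwise):

import pandas as pd

df = pd.read_csv('N10S10/N10S10.csv')   # assumed to include a header row
X = df.iloc[:, :-1].to_numpy()          # 3*T features per sample
y = df.iloc[:, -1].to_numpy()           # 1 = 'infected', 0 = 'not infected'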
Additionally, we have placed two pcap files in the root directory. These are the traces used to compare both versions of SMB.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
These data are provided in support of a manuscript in progress. The files are generally derived from the frequencies of SARS-CoV-2 mutations observed in next-generation sequence data derived from wastewater samples collected from the province of Ontario, Canada, along with metadata. A brief description of each file follows:
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Yelp Dataset JSON
Each file is composed of a single object type, one JSON object per line.
Take a look at some examples to get you started: https://github.com/Yelp/dataset-examples.
Note: the following examples contain inline comments, which are technically not valid JSON. This is done here to simplify the documentation and explain the structure; the JSON files you download will not contain any comments and will be fully valid JSON.
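Because each line is a standalone JSON object, the files can be read line by line, for example:

import json

businesses = []
with open('business.json', encoding='utf-8') as f:
    for line in f:
        businesses.append(json.loads(line))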
business.json
Contains business data including location data, attributes, and categories.
{
// string, 22 character unique string business id
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",
// string, the business's name
"name": "Garaje",
// string, the full address of the business
"address": "475 3rd St",
// string, the city
"city": "San Francisco",
// string, 2 character state code, if applicable
"state": "CA",
// string, the postal code
"postal code": "94107",
// float, latitude
"latitude": 37.7817529521,
// float, longitude
"longitude": -122.39612197,
// float, star rating, rounded to half-stars
"stars": 4.5,
// integer, number of reviews
"review_count": 1198,
// integer, 0 or 1 for closed or open, respectively
"is_open": 1,
// object, business attributes to values. note: some attribute values might be objects
"attributes": {
"RestaurantsTakeOut": true,
"BusinessParking": {
"garage": false,
"street": true,
"validated": false,
"lot": false,
"valet": false
},
},
// an array of strings of business categories
"categories": [
"Mexican",
"Burgers",
"Gastropubs"
],
// an object of key day to value hours, hours are using a 24hr clock
"hours": {
"Monday": "10:00-21:00",
"Tuesday": "10:00-21:00",
"Friday": "10:00-21:00",
"Wednesday": "10:00-21:00",
"Thursday": "10:00-21:00",
"Sunday": "11:00-18:00",
"Saturday": "10:00-21:00"
}
}
review.json
Contains full review text data including the user_id that wrote the review and the business_id the review is written for.
{
// string, 22 character unique review id
"review_id": "zdSx_SD6obEhz9VrW9uAWA",
// string, 22 character unique user id, maps to the user in user.json
"user_id": "Ha3iJu77CxlrFm-vQRs_8g",
// string, 22 character business id, maps to business in business.json
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",
// integer, star rating
"stars": 4,
// string, date formatted YYYY-MM-DD
"date": "2016-03-09",
// string, the review itself
"text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
// integer, number of useful votes received
"useful": 0,
// integer, number of funny votes received
"funny": 0,
// integer, number of cool votes received
"cool": 0
}
user.json
User data including the user's friend mapping and all the metadata associated with the user.
{
// string, 22 character unique user id, maps to the user in user.json
"user_id": "Ha3iJu77CxlrFm-vQRs_8g",
// string, the user's first name
"name": "Sebastien",
// integer, the number of reviews they've written
"review_count": 56,
// string, when the user joined Yelp, formatted like YYYY-MM-DD
"yelping_since": "2011-01-01",
// array of strings, an array of the user's friend as user_ids
"friends": [
"wqoXYLWmpkEH0YvTmHBsJQ",
"KUXLLiJGrjtSsapmxmpvTA",
"6e9rJKQC3n0RSKyHLViL-Q"
],
// integer, number of useful votes sent by the user
"useful": 21,
// integer, number of funny votes sent by the user
"funny": 88,
// integer, number of cool votes sent by the user
"cool": 15,
// integer, number of fans the user has
"fans": 1032,
// array of integers, the years the user was elite
"elite": [
2012,
2013
],
// float, average rating of all reviews
"average_stars": 4.31,
// integer, number of hot compliments received by the user
"compliment_hot": 339,
// integer, number of more compliments received by the user
"compliment_more": 668,
// integer, number of profile compliments received by the user
"compliment_profile": 42,
// integer, number of cute compliments received by the user
"compliment_cute": 62,
// integer, number of list compliments received by the user
"compliment_list": 37,
// integer, number of note compliments received by the user
"compliment_note": 356,
// integer, number of plain compliments received by the user
"compliment_plain": 68,
// integer, number of coo...