100+ datasets found

h
example-space-to-dataset-json
huggingface.co
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lucain Pouget, example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json
Explore at:
Authors
Lucain Pouget
Description
Demo to save data from a Space to a Dataset. Goal is to provide reusable snippets of code.

Documentation: https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#scheduled-uploads Space: https://huggingface.co/spaces/Wauplin/space_to_dataset_saver/ JSON dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json Image dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image Image (zipped) dataset:… See the full description on the dataset page: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json.
h
example-space-to-dataset-json
huggingface.co
Updated Jun 8, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
t (2024). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/taichi256/example-space-to-dataset-json
Explore at:
Dataset updated
Jun 8, 2024
Authors
t
Description
taichi256/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community
Mongo DB/ Json datasets
kaggle.com
Updated Sep 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shrashti (2023). Mongo DB/ Json datasets [Dataset]. https://www.kaggle.com/datasets/shrashtisinghal/mongo-db-datsets
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 3, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Shrashti
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Introducing the largest and most comprehensive collection of Mongo DB Dataset! This meticulously curated dataset brings together a wealth of information from various domains, including ecommerce, aviation, biology, zoology, literature, history, and more. Meticulously gathered from numerous reliable sources, this dataset has been expertly transformed into a unified format, making it an invaluable resource for researchers, data scientists, and enthusiasts alike. Each domain contributes its unique insights and knowledge, providing a diverse range of information for exploration and analysis. With its enriched content and extensive coverage, this Mongo DB Dataset opens up endless possibilities for uncovering hidden patterns, conducting groundbreaking research, and gaining profound insights across multiple disciplines.
r
Data from: JSON Dataset of Simulated Building Heat Control for System of...
researchdata.se
gimi9.com
+1more
Updated Mar 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Nilsson (2025). JSON Dataset of Simulated Building Heat Control for System of Systems Interoperability [Dataset]. http://doi.org/10.5878/e5hb-ne80
Explore at:
(438755370), (110041420), (156812), (5417)Available download formats
Unique identifier
https://doi.org/10.5878/e5hb-ne80
Dataset updated
Mar 21, 2025
Dataset provided by
Luleå University of Technology
Authors
Jacob Nilsson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Luleå Municipality
Description
Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system, and the messages sent within control systems-of-systems. For more information see attached data documentation.

The data comes in two semicolon-separated (;) csv files, training.csv and test.csv. The train/test split is not random; training data comes from the first 80% of simulated timesteps, and the test data is the last 20%. There is no specific validation dataset, the validation data should instead be randomly selected from the training data. The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data only samples once every hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consists of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON-messages is stored as a single row in the tables. This means that much data is duplicated, a choice made to make it easier to use the data.

The simulation data is not meant to be opened and analyzed in spreadsheet software, it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/.

The data file with temperatures (smhi-july-23-29-2018.csv) acts as input for the thermodynamic building simulation found on Github, where it is used to get the outside temperature and corresponding timestamps. Temperature data for Luleå Summer 2018 were downloaded from SMHI.
DataCite Public Data
redivis.com
application/jsonl +7
Updated Dec 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Redivis Demo Organization (2024). DataCite Public Data [Dataset]. https://redivis.com/datasets/7wec-6vgw8qaaq
Explore at:
application/jsonl, arrow, spss, csv, stata, sas, avro, parquetAvailable download formats
Dataset updated
Dec 12, 2024
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Description
Abstract

The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.

This dataset represents a processed version of the Public Data File, where the data have been extracted and loaded into a Redivis dataset.

Methodology

The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.

Records have descriptive metadata for research outputs and resources structured according to the DataCite Metadata Schema and include links to other persistent identifiers (PIDs) for works (DOIs), people (ORCID iDs), and organizations (ROR IDs).

Use of the DataCite Public Data File is subject to the DataCite Data File Use Policy.

Usage

This datasets is a processed version of the DataCite public data file, where the original file (a 23GB .tar.gz) has been extracted into 55,239 JSONL files, that were then concatenated into a single JSONL file.

This JSONL file has been imported into a Redivis table to facilitate further exploration and analysis.

A sample project demonstrating how to query the DataCite data file can be found here: https://redivis.com/projects/hx1e-a6w8vmwsx
Z
Data from: 3DHD CityScenes: High-Definition Maps in High-Density Point...
data.niaid.nih.gov
zenodo.org
+1more
Updated Jul 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fricke, Jenny (2024). 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7085089
Explore at:
Dataset updated
Jul 16, 2024
Dataset provided by
Klingner, Marvin
Plachetka, Christopher
Fricke, Jenny
Sertolli, Benjamin
Fingscheidt, Tim
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany including 467 km of individual lanes. In total, our map comprises 266,762 individual items.

Our corresponding paper (published at ITSC 2022) is available here. Further, we have applied 3DHD CityScenes to map deviation detection here.

Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:

Python tools to read, generate, and visualize the dataset,

3DHDNet deep learning pipeline (training, inference, evaluation) for map deviation detection and 3D object detection.

The DevKit is available here:

https://github.com/volkswagen/3DHD_devkit.

The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.

When using our dataset, you are welcome to cite:

@INPROCEEDINGS{9921866, author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and Fingscheidt, Tim}, booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)}, title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds}, year={2022}, pages={627-634}}

Acknowledgements

We thank the following interns for their exceptional contributions to our work.

Benjamin Sertolli: Major contributions to our DevKit during his master thesis

Niels Maier: Measurement campaign for data collection and data preparation

The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.

The Dataset

After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.

Dataset

This directory contains the training, validation, and test set definition (train.json, val.json, test.json) used in our publications. Respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.

During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.

To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snipped as an example.

import json

json_path = r"E:\3DHD_CityScenes\Dataset\train.json" with open(json_path) as jf: data = json.load(jf) print(data)

HD_Map

Map items are stored as lists of items in JSON format. In particular, we provide:

traffic signs,

traffic lights,

pole-like objects,

construction site locations,

construction site obstacles (point-like such as cones, and line-like such as fences),

line-shaped markings (solid, dashed, etc.),

polygon-shaped markings (arrows, stop lines, symbols, etc.),

lanes (ordinary and temporary),

relations between elements (only for construction sites, e.g., sign to lane association).

HD_Map_MetaData

Our high-density point cloud used as basis for annotating the HD map is split in 648 tiles. This directory contains the geolocation for each tile as polygon on the map. You can view the respective tile definition using QGIS. Alternatively, we also provide respective polygons as lists of UTM coordinates in JSON.

Files with the ending .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information provided in a different format used in our Python API.

HD_PointCloud_Tiles

The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.

x-coordinates: 4 byte integer

y-coordinates: 4 byte integer

z-coordinates: 4 byte integer

intensity of reflected beams: 2 byte unsigned integer

ground classification flag: 1 byte unsigned integer

After reading, respective values have to be unnormalized. As an example, you can use the following code snipped to read the point cloud data. For visualization, you can use the pptk package, for instance.

import numpy as np import pptk

file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin" pc_dict = {} key_list = ['x', 'y', 'z', 'intensity', 'is_ground'] type_list = ['
h
json_data_extraction
huggingface.co
Updated Feb 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
paraloq analytics (2024). json_data_extraction [Dataset]. https://huggingface.co/datasets/paraloq/json_data_extraction
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 1, 2024
Dataset authored and provided by
paraloq analytics
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Diverse Restricted JSON Data Extraction

Curated by: The paraloq analytics team.

Uses

Benchmark restricted JSON data extraction (text + JSON schema -> JSON instance) Fine-Tune data extraction model (text + JSON schema -> JSON instance) Fine-Tune JSON schema Retrieval model (text -> retriever -> most adequate JSON schema)

Out-of-Scope Use

Intended for research purposes only.

Dataset Structure

The data comes with the following fields:

title: The… See the full description on the dataset page: https://huggingface.co/datasets/paraloq/json_data_extraction.
Complete News Data Extracted from CNBC in JSON Format: Covering Business,...
crawlfeeds.com
json, zip
Updated Jul 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). Complete News Data Extracted from CNBC in JSON Format: Covering Business, Finance, Technology, and Global Trends for Europe, US, and UK Audiences [Dataset]. https://crawlfeeds.com/datasets/complete-news-data-extracted-from-cnbc-in-json-format-covering-business-finance-technology-and-global-trends-for-europe-us-and-uk-audiences
Explore at:
zip, jsonAvailable download formats
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Area covered
United States, United Kingdom
Description
We have successfully extracted a comprehensive news dataset from CNBC, covering not only financial updates but also an extensive range of news categories relevant to diverse audiences in Europe, the US, and the UK. This dataset includes over 500,000 records, meticulously structured in JSON format for seamless integration and analysis.

Diverse News Segments for In-Depth Analysis

This extensive extraction spans multiple segments, such as:

Business and Market Analysis: Stay updated on major companies, mergers, and acquisitions.

Technology and Innovation: Explore developments in AI, cybersecurity, and digital transformation.

Economic Forecasts: Access insights into GDP, employment rates, inflation, and other economic indicators.

Geopolitical Developments: Understand the impact of political events and global trade dynamics on markets.

Personal Finance: Learn about saving strategies, investment tips, and real estate trends.

Each record in the dataset is enriched with metadata tags, enabling precise filtering by region, sector, topic, and publication date.

Why Choose This Dataset?

The comprehensive news dataset provides real-time insights into global developments, corporate strategies, leadership changes, and sector-specific trends. Designed for media analysts, research firms, and businesses, it empowers users to perform:

Trend Analysis

Sentiment Analysis

Predictive Modeling

Additionally, the JSON format ensures easy integration with analytics platforms for advanced processing.

Access More News Datasets

Looking for a rich repository of structured news data? Visit our news dataset collection to explore additional offerings tailored to your analysis needs.

Sample Dataset Available

To get a preview, check out the CSV sample of the CNBC economy articles dataset.
Z
Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment...
data.niaid.nih.gov
Updated Jan 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mihael Mohorčič (2023). Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7509279
Explore at:
Dataset updated
Jan 6, 2023
Dataset provided by
Mihael Mohorčič
Miha Mohorčič
Aleš Simončič
Andrej Hrovat
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

The 802.11 standard includes several management features and corresponding frame types. One of them are Probe Requests (PR), which are sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame part of PRs consists of variable-length fields, called Information Elements (IE), which represent the capabilities of a mobile device, such as supported data rates.

This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.

It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.

Related dataset

Same authors also produced the Labeled dataset of IEEE 802.11 probe requests with same data layout and recording equipment.

Measurement setup

The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.

The following information about each received PR is collected: - MAC address - Supported data rates - extended supported rates - HT capabilities - extended capabilities - data under extended tag and vendor specific tag - interworking - VHT capabilities - RSSI - SSID - timestamp when PR was received.

The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.

Data preprocessing

The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:

PR_IE_data = { 'DATA_RTS': {'SUPP': DATA_supp , 'EXT': DATA_ext}, 'HT_CAP': DATA_htcap, 'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap}, 'VHT_CAP': DATA_vhtcap, 'INTERWORKING': DATA_inter, 'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...}, 'VENDOR_SPEC': {VENDOR_1:{ 'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...}, VENDOR_2:{ 'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...} ...} }

Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_DATA.

When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:

{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },

where PR_data is structured as follows:

{ 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.

This data structure allows to store only 'TOA' and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of the newly detected PR is compared with the already stored data of the same MAC in the current scan time interval. If identical PR's IE data from the same MAC address is already stored, only data for the keys 'TIME' and 'RSSI' are appended. If identical PR's IE data from the same MAC address has not yet been received, then the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png

At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.

Folder structure

For ease of processing of the data, the dataset is divided into 7 folders, each containing a 24-hour period. Each folder contains four files, each containing samples from that device.

The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected between 23th of September 2022 00:00 local time, until 24th of September 2022 00:00 local time.

Files representing their location via mapping: - 1.json -> location 1 - 2.json -> location 2 - 3.json -> location 3 - 4.json -> location 4

Environments description

The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo The gateway devices (rPIs with WiFi dongle) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.

Four Raspbery Pi-s were used: - location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano) - location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo - location 3 -> nothernmost window in the building of Via Etnea near Piazza Università - location 4 -> first window top the right of the entrance of the University of Catania

Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access) Under ideal circumstances, the locations of the devices and their coverage area would cover both squares and the part of Via Etna between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.

Known dataset shortcomings

Due to technical and physical limitations, the dataset contains some identified deficiencies.

PRs are collected and transmitted in 10-second chunks. Due to the limited capabilites of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.

Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can no longer respond. For this reason, up to 20 seconds of data will not be recorded in each 20-minute period.

The devices had a scheduled reboot at 4:00 each day which is shown as missing data of up to a few minutes.

Location 1 - Piazza del Duomo - Chierici

The gateway device (rPi) is located on the second floor balcony and is hardwired to the Ethernet port. This device appears to function stably throughout the data collection period. Its location is constant and is not disturbed, dataset seems to have complete coverage.

Location 2 - Via Etnea - Piazza del Duomo

The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.

Location 3 - Via Etnea - Piazza Università

Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside a thick wall when no people are present. This device appears to have been collecting data throughout the whole dataset period.

Location 4 - Piazza Università

This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device had lost power several times during the deployment. The internet connection was also interrupted sporadically.

Recognitions

The data was collected within the scope of Resiloc project with the help of City of Catania and project partners.
Z
Data from: #PraCegoVer dataset
data.niaid.nih.gov
Updated Jan 19, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
Explore at:
Dataset updated
Jan 19, 2023
Dataset provided by
Sandra Avila
Esther Luna Colombini
Gabriel Oliveira dos Santos
Description
Automatically describing images using natural sentences is an essential task to visually impaired people's inclusion on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

Dataset Structure

PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX

containing the images. The file dataset.json comprehends a list of json objects with the attributes:

user: anonymized user that made the post;

filename: image file name;

raw_caption: raw caption;

caption: clean caption;

date: post date.

Each instance in dataset.json is associated with exactly one image in the images directory whose filename is pointed by the attribute filename. Also, we provide a sample with five instances, so the users can download the sample to get an overview of the dataset before downloading it completely.

Download Instructions

If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

cat images.tar.gz.part* > images.tar.gz tar -xzvf images.tar.gz

Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

python download_dataset.py --access_token=
Extracted Schemas from the Life Sciences Linked Open Data Cloud
figshare.com
txt
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maulik Kamdar (2023). Extracted Schemas from the Life Sciences Linked Open Data Cloud [Dataset]. http://doi.org/10.6084/m9.figshare.12402425.v2
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12402425.v2
Dataset updated
Jun 1, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Maulik Kamdar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web" published at Nature Scientific Data. If you use the dataset, please cite the manuscript as follows:Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-yWe have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:- The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.- Refined Sets of extracted classes, object properties, data properties, and datatypes, shared across the Linked Data Graphs on LSLOD cloud. Where the schema element is reused from a Linked Open Vocabulary or an ontology, it is explicitly indicated.- The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided along with broad level characteristics of the modeled content. The LSLOD Schema Graph is saved as a JSON Pickle File. To read the JSON object in this Pickle file use the Python command as follows:with open('LSLOD-Schema-Graph.json.pickle' , 'rb') as infile: x = pickle.load(infile, encoding='iso-8859-1')Check the Referenced Link for more details on this research, raw data files, and code references.
W
Data.gov.ie datasets
cloud.csiss.gmu.edu
datasalsa.com
+2more
api
Updated Jun 20, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ireland (2019). Data.gov.ie datasets [Dataset]. http://cloud.csiss.gmu.edu/uddi/ar/dataset/data-gov-ie-datasets
Explore at:
apiAvailable download formats
Dataset updated
Jun 20, 2019
Dataset provided by
Ireland
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data.gov.ie API

Data.gov.ie is built using CKAN v2.2 (link is external), which provides a powerful API that allows developers to retrieve datasets, groups or other CKAN objects and search for datasets. There is full documentation (link is external) available for the CKAN API online.

Example API Calls

Get JSON-formatted lists of a site’s datasets or other CKAN objects: data.gov.ie/api/3/action/package_list data.gov.ie/api/3/action/tag_list Get a full JSON representation of a dataset, resource or other object: data.gov.ie/api/3/action/package_show?id=the-walled-towns-of-ireland data.gov.ie/api/3/action/tag_show?id=marine Search for packages or resources matching a query: data.gov.ie/api/3/action/package_search?q=museum data.gov.ie/api/3/action/resource_search?query=name:The%20Walled%20Towns%20of%20Ireland

More information at https://data.gov.ie/developers
e
JSON dataset för simulerad byggnadsvärmekontroll för system-av-system...
b2find.eudat.eu
Updated Apr 19, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). JSON dataset för simulerad byggnadsvärmekontroll för system-av-system interoperabilitet - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/442bc87f-092d-57d9-a2f0-ba1c7e049d36
Explore at:
Dataset updated
Apr 19, 2022
Description
Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system, and the messages sent within control systems-of-systems. For more information see attached data documentation. The data comes in two semicolon-separated (;) csv files, training.csv and test.csv. The train/test split is not random; training data comes from the first 80% of simulated timesteps, and the test data is the last 20%. There is no specific validation dataset, the validation data should instead be randomly selected from the training data. The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data only samples once every hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consists of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON-messages is stored as a single row in the tables. This means that much data is duplicated, a choice made to make it easier to use the data. The simulation data is not meant to be opened and analyzed in spreadsheet software, it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/. Datasetet innehåller simulerad servicedata för system-av-system interoperabilitetsforskning. För mer information se bifogad dokumentation och den engelska katalogsidan. Data kommer i två semikolonseparerade (;) csv-filer, training.csv och test.csv. Träning/testfördelningen är inte slumpmässig; träningsdata kommer från de första 80 % av de simulerade tidsstegen och testdata är de sista 20 %. Det finns ingen specifik valideringsdatauppsättning, valideringsdatan bör istället väljas slumpmässigt från träningsdatan. Simuleringen körs i lika många tidssteg som det finns tillgängliga utetemperaturvärden. De ursprungliga SMHI-data samplar bara en gång i timmen, som linjärt interpolerar för att få ett temperaturprov var tionde sekund. Data som sparas vid varje tidssteg består av 34 JSON-meddelanden (fyra per rum och två temperaturavläsningar utifrån), 9 temperaturvärden (ett per rum och utanför), 8 börvärden och 8 ställdonutgångar. Data som är associerade med vart och ett av dessa 34 JSON-meddelanden lagras som en enda rad i tabellerna. Detta innebär att mycket data dupliceras, ett val som görs för att göra det lättare att använda datan. Simuleringsdata är inte avsedd att öppnas och analyseras i kalkylprogram, det är avsett att träna maskininlärningsmodeller. Det rekommenderas att öppna data med pandas-biblioteket för Python, tillgängligt på https://pypi.org/project/pandas/. Building temperature simulation. Simulering av byggnadstemperatur. Simulation
Spider Realistic Dataset In Structure-Grounded Pretraining for Text-to-SQL
zenodo.org
bin, json, txt
Updated Aug 16, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson; Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson (2021). Spider Realistic Dataset In Structure-Grounded Pretraining for Text-to-SQL [Dataset]. http://doi.org/10.5281/zenodo.5205322
Explore at:
txt, json, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5205322
Dataset updated
Aug 16, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson; Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This folder contains the Spider-Realistic dataset used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". The dataset is created based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove the explicit mention of column names while keeping the SQL queries unchanged to better evaluate the model's capability in aligning the NL utterance and the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.

It contains the following files:

- spider-realistic.json
# The spider-realistic evaluation set
# Examples: 508
# Databases: 19
- dev.json
# The original dev split of Spider
# Examples: 1034
# Databases: 20
- tables.json
# The original DB schemas from Spider
# Databases: 166
- README.txt
- license

The Spider-Realistic dataset is created based on the dev split of the Spider dataset realsed by Yu, Tao, et al. "Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task." It is a subset of the original dataset with explicit mention of the column names removed. The sql queries and databases are kept unchanged.
For the format of each json file, please refer to the github page of Spider https://github.com/taoyds/spider.
For the database files please refer to the official Spider release https://yale-lily.github.io/spider.

This dataset is distributed under the CC BY-SA 4.0 license.

If you use the dataset, please cite the following papers including the original Spider datasets, Finegan-Dollak et al., 2018 and the original datasets for Restaurants, GeoQuery, Scholar, Academic, IMDB, and Yelp.

@article{deng2020structure,
title={Structure-Grounded Pretraining for Text-to-SQL},
author={Deng, Xiang and Awadallah, Ahmed Hassan and Meek, Christopher and Polozov, Oleksandr and Sun, Huan and Richardson, Matthew},
journal={arXiv preprint arXiv:2010.12773},
year={2020}
}

@inproceedings{Yu&al.18c,
year = 2018,
title = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
booktitle = {EMNLP},
author = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev }
}

@InProceedings{P18-1033,
author = "Finegan-Dollak, Catherine
and Kummerfeld, Jonathan K.
and Zhang, Li
and Ramanathan, Karthik
and Sadasivam, Sesh
and Zhang, Rui
and Radev, Dragomir",
title = "Improving Text-to-SQL Evaluation Methodology",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "351--360",
location = "Melbourne, Australia",
url = "http://aclweb.org/anthology/P18-1033"
}

@InProceedings{data-sql-imdb-yelp,
dataset = {IMDB and Yelp},
author = {Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig},
title = {SQLizer: Query Synthesis from Natural Language},
booktitle = {International Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM},
month = {October},
year = {2017},
pages = {63:1--63:26},
url = {http://doi.org/10.1145/3133887},
}

@article{data-academic,
dataset = {Academic},
author = {Fei Li and H. V. Jagadish},
title = {Constructing an Interactive Natural Language Interface for Relational Databases},
journal = {Proceedings of the VLDB Endowment},
volume = {8},
number = {1},
month = {September},
year = {2014},
pages = {73--84},
url = {http://dx.doi.org/10.14778/2735461.2735468},
}

@InProceedings{data-atis-geography-scholar,
dataset = {Scholar, and Updated ATIS and Geography},
author = {Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer},
title = {Learning a Neural Semantic Parser from User Feedback},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year = {2017},
pages = {963--973},
location = {Vancouver, Canada},
url = {http://www.aclweb.org/anthology/P17-1089},
}

@inproceedings{data-geography-original
dataset = {Geography, original},
author = {John M. Zelle and Raymond J. Mooney},
title = {Learning to Parse Database Queries Using Inductive Logic Programming},
booktitle = {Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2},
year = {1996},
pages = {1050--1055},
location = {Portland, Oregon},
url = {http://dl.acm.org/citation.cfm?id=1864519.1864543},
}

@inproceedings{data-restaurants-logic,
author = {Lappoon R. Tang and Raymond J. Mooney},
title = {Automated Construction of Database Interfaces: Intergrating Statistical and Relational Learning for Semantic Parsing},
booktitle = {2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora},
year = {2000},
pages = {133--141},
location = {Hong Kong, China},
url = {http://www.aclweb.org/anthology/W00-1317},
}

@inproceedings{data-restaurants-original,
author = {Ana-Maria Popescu, Oren Etzioni, and Henry Kautz},
title = {Towards a Theory of Natural Language Interfaces to Databases},
booktitle = {Proceedings of the 8th International Conference on Intelligent User Interfaces},
year = {2003},
location = {Miami, Florida, USA},
pages = {149--157},
url = {http://doi.acm.org/10.1145/604045.604070},
}

@inproceedings{data-restaurants,
author = {Alessandra Giordani and Alessandro Moschitti},
title = {Automatic Generation and Reranking of SQL-derived Answers to NL Questions},
booktitle = {Proceedings of the Second International Conference on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge},
year = {2012},
location = {Montpellier, France},
pages = {59--76},
url = {https://doi.org/10.1007/978-3-642-45260-4_5},
}
e
Text content of the Frequently Asked Questions “business info COVID19”
data.europa.eu
json
Updated Sep 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Direction Générale des Entreprises (2024). Text content of the Frequently Asked Questions “business info COVID19” [Dataset]. https://data.europa.eu/88u/dataset/5ec3a046c9e9abed50d770a9
Explore at:
json(366118)Available download formats
Dataset updated
Sep 1, 2024
Dataset authored and provided by
Direction Générale des Entreprises
License
https://www.etalab.gouv.fr/licence-ouverte-open-licencehttps://www.etalab.gouv.fr/licence-ouverte-open-licence
Description
Frequently Asked Questions for Business in the COVID-19 Context

This dataset contains the articles published on the Covid-19 FAQ for companies published by the Directorate-General for Enterprises at https://info-entreprises-covid19.economie.fr

The data are presented in the JSON format as follows: JSON [ { “title”: “Example article for documentation”, “content”: [ this is the first page of the article. here the second, “‘div’these articles incorporate some HTML formatting‘/div’” ], “path”: [ “File to visit in the FAQ”, “to join the article”] }, ... ] “'” The update is done every day at 6:00 UTC. This data is extracted directly from the site, the source code of the script used to extract the data is available here: https://github.com/chrnin/docCovidDGE
d
Dataset of Newly Incorporated Entities | Company Data | API | CSV | JSON |...
datarade.ai
.json, .csv
Updated Dec 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HitHorizons (2024). Dataset of Newly Incorporated Entities | Company Data | API | CSV | JSON | 80M+ Companies | 50 EU Countries | GDPR-Compliant | Monthly Updated | 2025 [Dataset]. https://datarade.ai/data-products/dataset-of-newly-established-companies-api-csv-json-8-hithorizons
Explore at:
.json, .csvAvailable download formats
Dataset updated
Dec 24, 2024
Dataset authored and provided by
HitHorizons
Area covered
San Marino, Austria, France, Moldova (Republic of), Turkmenistan, Portugal, Monaco, Luxembourg, Turkey, Latvia, European Union
Description
HitHorizons Newly Established Companies Dataset gives access to aggregated firmographic data on 80M+ companies from the whole of Europe and beyond.

Company registration data: company name national identifier and its type registered address: street, postal code, city, state / province, country business activity: SIC code, local activity code with classification system year of establishment company type location type

Sales and number of employees data: sales in EUR, USD and local currency (with local currency code) total number of employees sales and number of employees accuracy local number of employees (in case of multiple branches) companies’ sales and number of employees market position compared to other companies in a country / industry / region

Industry data: size of the whole industry size of all companies operating within a particular SIC code benchmarking within a particular country or industry regional benchmarking (EU 27, state / province)

Contact details: company website company email domain (without person’s name)

Invoicing details available for selected countries: company name company address company VAT number
Train set metadata for DFDC
kaggle.com
Updated Jan 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nosound (2020). Train set metadata for DFDC [Dataset]. https://www.kaggle.com/zaharch/train-set-metadata-for-dfdc/activity
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 4, 2020
Dataset provided by
Kaggle
Authors
nosound
Description
The train data for the DFDC competition is big, almost 500Gb, so I hope it can be useful to have all the json files and the metadata in one dataframe.

The dataset includes, for each video file

Info from the json files: filename, folder, label, original

split: train (118346 videos), public validation test (400 videos) or train sample (400 videos). 119146 videos in total. Note that the public validation and the train sample are subsets of the full train, so it is enough to mark them in this dataframe.

Full file md5 column

Hash on audio file sequence wav.hash and on subset of pixels pxl.hash

The rest are metadata fields from the files, obtained with ffprobe. Note that I removed many columns, which didn't give new information.

Simple analysis of the dataset can be found at: https://www.kaggle.com/zaharch/looking-at-the-full-train-set-metadata
Up-to-date mapping of COVID-19 treatment and vaccine development...
zenodo.org
data.niaid.nih.gov
+1more
bin, csv, png
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tomáš Wagner; Ivana Mišová; Ivana Mišová; Ján Frankovský; Ján Frankovský; Tomáš Wagner (2024). Up-to-date mapping of COVID-19 treatment and vaccine development (covid19-help.org data dump) [Dataset]. http://doi.org/10.5281/zenodo.4601446
Explore at:
csv, png, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4601446
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Tomáš Wagner; Ivana Mišová; Ivana Mišová; Ján Frankovský; Ján Frankovský; Tomáš Wagner
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The free database mapping COVID-19 treatment and vaccine development based on the global scientific research is available at https://covid19-help.org/.

Files provided here are curated partial data exports in the form of .csv files or full data export as .sql script generated with pg_dump from our PostgreSQL 12 database. You can also find .png file with our ER diagram of tables in .sql file in this repository.

Structure of CSV files

*On our site, compounds are named as substances

compounds.csv

Id - Unique identifier in our database (unsigned integer)

Name - Name of the Substance/Compound (string)

Marketed name - The marketed name of the Substance/Compound (string)

Synonyms - Known synonyms (string)

Description - Description (HTML code)

Dietary sources - Dietary sources where the Substance/Compound can be found (string)

Dietary sources URL - Dietary sources URL (string)

Formula - Compound formula (HTML code)

Structure image URL - Url to our website with the structure image (string)

Status - Status of approval (string)

Therapeutic approach - Approach in which Substance/Compound works (string)

Drug status - Availability of Substance/Compound (string)

Additional data - Additional data in stringified JSON format with data as prescribing information and note (string)

General information - General information about Substance/Compound (HTML code)

references.csv

Id - Unique identifier in our database (unsigned integer)

Impact factor - Impact factor of the scientific article (string)

Source title - Title of the scientific article (string)

Source URL - URL link of the scientific article (string)

Tested on species - What testing model was used for the study (string)

Published at - Date of publication of the scientific article (Date in ISO 8601 format)

clinical-trials.csv

Id - Unique identifier in our database (unsigned integer)

Title - Title of the clinical trial study (string)

Acronym title - Acronym of title of the clinical trial study (string)

Source id - Unique identifier in the source database

Source id optional - Optional identifier in other databases (string)

Interventions - Description of interventions (string)

Study type - Type of the conducted study (string)

Study results - Has results? (string)

Phase - Current phase of the clinical trial (string)

Url - URL to clinical trial study page on clinicaltrials.gov (string)

Status - Status in which study currently is (string)

Start date - Date at which study was started (Date in ISO 8601 format)

Completion date - Date at which study was completed (Date in ISO 8601 format)

Additional data - Additional data in the form of stringified JSON with data as locations of study, study design, enrollment, age, outcome measures (string)

compound-reference-relations.csv

Reference id - Id of a reference in our DB (unsigned integer)

Compound id - Id of a substance in our DB (unsigned integer)

Note - Id of a substance in our DB (unsigned integer)

Is supporting - Is evidence supporting or contradictory (Boolean, true if supporting)

compound-clinical-trial.csv

Clinical trial id - Id of a clinical trial in our DB (unsigned integer)

Compound id - Id of a Substance/Compound in our DB (unsigned integer)

tags.csv

Id - Unique identifier in our database (unsigned integer)

Name - Name of the tag (string)

tags-entities.csv

Tag id - Id of a tag in our DB (unsigned integer)

Reference id - Id of a reference in our DB (unsigned integer)

API Specification

Our project also has an Open API that gives you access to our data in a format suitable for processing, particularly in JSON format.

https://covid19-help.org/api-specification

Services are split into five endpoints:

Substances - /api/substances

References - /api/references

Substance-reference relations - /api/substance-reference-relations

Clinical trials - /api/clinical-trials

Clinical trials-substances relations - /api/clinical-trials-substances

Method of providing data

All dates are text strings formatted in compliance with ISO 8601 as YYYY-MM-DD

If the syntax request is incorrect (missing or incorrectly formatted parameters) an HTTP 400 Bad Request response will be returned. The body of the response may include an explanation.

Data updated_at (used for querying changed-from) refers only to a particular entity and not its logical relations. Example: If a new substance reference relation is added, but the substance detail has not changed, this is reflected in the substance reference relation endpoint where a new entity with id and current dates in created_at and updated_at fields will be added, but in substances or references endpoint nothing has changed.

The recommended way of sequential download

During the first download, it is possible to obtain all data by entering an old enough date in the parameter value changed-from, for example: changed-from=2020-01-01 It is important to write down the date on which the receiving the data was initiated let’s say 2020-10-20

For repeated data downloads, it is sufficient to receive only the records in which something has changed. It can therefore be requested with the parameter changed-from=2020-10-20 (example from the previous bullet). Again, it is important to write down the date when the updates were downloaded (eg. 2020-10-20). This date will be used in the next update (refresh) of the data.

Services for entities

List of endpoint URLs:

/api/substances

/api/references

/api/substance-reference-relations

/api/clinical-trials

/api/clinical-trials-substances

Format of the request

All endpoints have these parameters in common:

changed-from - a parameter to return only the entities that have been modified on a given date or later.

continue-after-id - a parameter to return only the entities that have a larger ID than specified in the parameter.

limit - a parameter to return only the number of records specified (up to 1000). The preset number is 100.

Request example:

/api/references?changed-from=2020-01-01&continue-after-id=1&limit=100

Format of the response

The response format is the same for all endpoints.

number_of_remaining_ids - the number of remaining entities that meet the specified criteria but are not displayed on the page. An integer of virtually unlimited size.

entities - an array of entity details in JSON format.

Response example:

{

"number_of_remaining_ids" : 100,

"entities" : [

{

"id": 3,

"url": "https://www.ncbi.nlm.nih.gov/pubmed/32147628",

"title": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",

"impact_factor": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",

"tested_on_species": "in silico",

"publication_date": "2020-22-02",

"created_at": "2020-30-03",

"updated_at": "2020-31-03",

"deleted_at": null

},

{

"id": 4,

"url": "https://www.ncbi.nlm.nih.gov/pubmed/32157862",

"title": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",

"impact_factor": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",

"tested_on_species": "Patient",

"publication_date": "2020-06-03",

"created_at": "2020-30-03",

"updated_at": "2020-30-03",

"deleted_at": null

},

]

}

Endpoint details

Substances

URL: /api/substances

Substances
Open Data Portal Catalogue
open.canada.ca
datasets.ai
+3more
csv, json, jsonl, png +2
Updated Aug 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
Explore at:
csv, sqlite, json, png, jsonl, xlsxAvailable download formats
Dataset updated
Aug 12, 2025
Dataset provided by
Treasury Board of Canada Secretariathttp://www.tbs-sct.gc.ca/
Treasury Board of Canadahttps://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html
License
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool (external link) Resources 2 - 8 are generated using the Flatterer (external link) utility. ###Description of resources: 1. Dataset is a JSON Lines (external link) file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and recommended for users familiar with working with nested JSON. 2. Catalogue is a XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata. 3. datasets metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output. 4. Resources Metadata contains the metadata for the resources contained within each dataset. 5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured. 6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore enabled CSVs. 7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains. 8. data package entity relation diagram Displays the title and format for column, in each table in the Data Package in the form of a ERD Diagram. The Data Package resource offers a text based version. 9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
d
Manufacturing Company Data | API | Dataset | CSV | JSON | 4,289,762...
datarade.ai
.json, .csv
Updated May 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HitHorizons (2024). Manufacturing Company Data | API | Dataset | CSV | JSON | 4,289,762 Companies | 50 European Countries | Data Enrichment | Monthly Updated | GDPR [Dataset]. https://datarade.ai/data-products/hithorizons-manufacturing-company-data-api-csv-json-hithorizons
Explore at:
.json, .csvAvailable download formats
Dataset updated
May 6, 2024
Dataset authored and provided by
HitHorizons
Area covered
Serbia, Austria, Sweden, Guernsey, Bosnia and Herzegovina, Uzbekistan, Kazakhstan, Isle of Man, Ukraine, Czech Republic, Europe
Description
HitHorizons Manufacturing Company Data API gives access to aggregated firmographic data on 4,289,762 manufacturing companies from the whole of Europe and beyond.

Company registration data: company name national identifier and its type registered address: street, postal code, city, state / province, country business activity: SIC code, local activity code with classification system year of establishment company type location type

Sales and number of employees data: sales in EUR, USD and local currency (with local currency code) total number of employees sales and number of employees accuracy local number of employees (in case of multiple branches) companies’ sales and number of employees market position compared to other companies in a country / industry / region

Industry data: size of the whole industry size of all companies operating within a particular SIC code benchmarking within a particular country or industry regional benchmarking (EU 27, state / province)

Contact details: company website company email domain (without person’s name)

Invoicing details available for selected countries: company name company address company VAT number