This dataset contains resources transformed from other datasets on HDX. They exist here only in a format modified to support visualization on HDX and may not be as up to date as the source datasets from which they are derived.
Source datasets: https://data.hdx.rwlabs.org/dataset/idps-data-by-region-in-mali
Download the complete MAC Address JSON database to integrate network data into your projects. Regularly updated and easy to use.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
US map .json data from OSU CSE 5544 class
ThermoML is an XML-based IUPAC standard for the storage and exchange of experimental thermophysical and thermochemical property data. The ThermoML archive is a subset of Thermodynamics Research Center (TRC) data holdings corresponding to cooperation between NIST TRC and five journals: Journal of Chemical & Engineering Data (ISSN: 1520-5134), The Journal of Chemical Thermodynamics (ISSN: 1096-3626), Fluid Phase Equilibria (ISSN: 0378-3812), Thermochimica Acta (ISSN: 0040-6031), and International Journal of Thermophysics (ISSN: 1572-9567). Data from the initial cooperation (around 2003) through the 2019 calendar year are included.

The original scope of the archive has been expanded to include JSON files. The JSON files are structured according to the ThermoML.xsd (available below) and rendered from the same experimental thermophysical and thermochemical property data reported in the corresponding articles as the ThermoML files. In fact, the ThermoML files are generated from the JSON files to keep the information in sync. The JSON files may contain additional information not supported by the ThermoML schema. For example, each JSON file contains the md5 checksum of the ThermoML file (THERMOML_MD5_CHECKSUM) that may be used to validate the ThermoML download.

This data.nist.gov resource provides a .tgz file download containing the JSON and ThermoML files for each version of the archive. Data from the initial cooperation (around 2003) through the 2019 calendar year are provided below (ThermoML.v2020-09.30.tgz). The dates of the extraction from TRC databases, as specified in the dateCit field of the xml files, are 2020-09-29 and 2020-09-30. The .tgz file contains a directory tree that maps to the DOI prefix/suffix of the entries; e.g., unzipping the .tgz file creates a directory for each of the prefixes (10.1007, 10.1016, and 10.1021) that contains all the .json and .xml files.

The data and other information throughout this digital resource (including the website, API, JSON, and ThermoML files) have been carefully extracted from the original articles by NIST/TRC personnel. Neither the journal publishers, nor their editors, nor NIST/TRC warrant or represent, expressly or implied, the correctness or accuracy of the content of information contained throughout this digital resource, nor its fitness for any use or for any purpose, nor can they, or will they, accept any liability or responsibility whatever for the consequences of its use or misuse by anyone. In any individual case of application, the respective user must check the correctness by consulting other relevant sources of information.
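The THERMOML_MD5_CHECKSUM field makes it possible to verify that a ThermoML file was downloaded intact. A minimal sketch in Python, assuming the checksum sits as a top-level key in the paired JSON file (the exact JSON layout should be checked against a real archive file):

import hashlib
import json

def thermoml_checksum_ok(json_path, xml_path):
    # Read the expected md5 from the JSON file; the top-level key location is an assumption.
    with open(json_path) as f:
        expected = json.load(f)["THERMOML_MD5_CHECKSUM"]
    # Compute the actual md5 of the downloaded ThermoML (.xml) file.
    with open(xml_path, "rb") as f:
        actual = hashlib.md5(f.read()).hexdigest()
    return actual == expected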
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the metadata of the datasets published in 85 Dataverse installations and information about each installation's metadata blocks. It also includes the lists of pre-defined licenses or terms of use that dataset depositors can apply to the datasets they publish in the 58 installations that were running versions of the Dataverse software that include that feature. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations and improving understandings about how certain Dataverse features and metadata fields are used. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

How the metadata was downloaded

The dataset metadata and metadata block JSON files were downloaded from each installation between August 22 and August 28, 2023 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another column named "apikey" listing my accounts' API tokens. The Python script reads the CSV file and uses the listed API tokens to get metadata and other information from installations that require them.

How the files are organized

├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation)_2023.08.22-2023.08.28.csv
│   ├── contributor(citation)_2023.08.22-2023.08.28.csv
│   ├── data_source(citation)_2023.08.22-2023.08.28.csv
│   ├── ...
│   └── topic_classification(citation)_2023.08.22-2023.08.28.csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2023.08.27_12.59.59.zip
│   │   ├── dataset_pids_Abacus_2023.08.27_12.59.59.csv
│   │   ├── Dataverse_JSON_metadata_2023.08.27_12.59.59
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
│   │   │   └── ...
│   │   └── metadatablocks_v5.6
│   │       ├── astrophysics_v5.6.json
│   │       ├── biomedical_v5.6.json
│   │       ├── citation_v5.6.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2023.08.26_22.14.04.zip
│   ├── ADA_Dataverse_2023.08.27_13.16.20.zip
│   ├── Arca_Dados_2023.08.27_13.34.09.zip
│   ├── ...
│   └── World_Agroforestry_-_Research_Data_Repository_2023.08.27_19.24.15.zip
├── dataverse_installations_summary_2023.08.28.csv
├── dataset_pids_from_most_known_dataverse_installations_2023.08.csv
├── license_options_for_each_dataverse_installation_2023.09.05.csv
└── metadatablocks_from_most_known_dataverse_installations_2023.09.05.csv

This dataset contains two directories and four CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the citation metadata block and geospatial metadata block of datasets in the 85 Dataverse installations. For example, author(citation)_2023.08.22-2023.08.28.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 85 installations, where there's a row for each author name, affiliation, identifier type, and identifier.
The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 85 zipped files, one for each of the 85 Dataverse installations whose dataset metadata I was able to download. Each zip file contains a CSV file and two sub-directories: The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate if the Python script was able to download the Dataverse JSON metadata for each dataset. It also includes the alias/identifier and category of the Dataverse collection that the dataset is in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The Dataverse JSON export of the latest version of each dataset includes "(latest_version)" in the file name. This should help those who are interested in the metadata of only the latest version of each dataset. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I included them so that they can be used when extracting metadata from the dataset's Dataverse JSON exports. The dataverse_installations_summary_2023.08.28.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata...
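For those exploring the Dataverse JSON exports directly, here is a minimal sketch, assuming the standard Dataverse native API export endpoint and the X-Dataverse-key header for installations that require a token (the hostname and persistent ID below are placeholders):

import requests

def get_dataverse_json(hostname, persistent_id, api_token=None):
    # Pass the account API token only when the installation requires one.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    resp = requests.get(
        f"https://{hostname}/api/datasets/export",
        params={"exporter": "dataverse_json", "persistentId": persistent_id},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example (placeholder values):
# metadata = get_dataverse_json("demo.dataverse.org", "doi:10.70122/FK2/EXAMPLE")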
Automatically describing images using natural sentences is essential for the inclusion of visually impaired people on the Internet. Although there are many image-captioning datasets in the literature, most of them contain only English captions, and datasets with captions in other languages are scarce.
The #PraCegoVer movement arose on the Internet, encouraging social media users to publish images, tag them #PraCegoVer, and add a short description of their content. Inspired by this movement, we propose #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.
Dataset Structure
The dataset comprises an images directory containing the images and a file, dataset.json, that holds a list of JSON objects with the attributes:
user: anonymized user that made the post;
filename: image file name;
raw_caption: raw caption;
caption: clean caption;
date: post date.
Each instance in dataset.json is associated with exactly one image in the images directory, whose file name is given by the attribute filename. We also provide a sample with five instances, so users can download the sample to get an overview of the dataset before downloading it completely.
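As a quick orientation, a minimal sketch that walks dataset.json and resolves each caption to its image file (the attribute names are those listed above; paths assume the images directory sits next to dataset.json):

import json
import os

with open("dataset.json", encoding="utf-8") as f:
    instances = json.load(f)

# Print the first few image/caption pairs as a sanity check.
for item in instances[:5]:
    image_path = os.path.join("images", item["filename"])
    print(image_path, "->", item["caption"])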
Download Instructions
If you just want an overview of the dataset structure, you can download sample.tar.gz. But if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to join and uncompress them:
cat images.tar.gz.part* > images.tar.gz
tar -xzvf images.tar.gz
Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:
python download_dataset.py --access_token=<your_access_token>
The bulk download facility provides the entire contents of each major API data set in a single ZIP file. A small JSON formatted manifest file lists the bulk files and the update date of each file. The manifest is generally updated daily and can be downloaded from http://api.eia.gov/bulk/manifest.txt. The manifest contains information about the bulk files, including all required common core attributes.
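A minimal sketch of reading the manifest to list the bulk files and their update dates; the exact JSON layout of the manifest is an assumption here and should be checked against a real download:

import requests

manifest = requests.get("http://api.eia.gov/bulk/manifest.txt", timeout=30).json()
# The manifest is assumed to map dataset names to entries carrying a download
# URL and a last-updated date; inspect the file to confirm the field names.
for name, entry in manifest.get("dataset", {}).items():
    print(name, entry.get("accessURL"), entry.get("last_updated"))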
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A BitTorrent file to download data with the title 'wikidata-20220103-all.json.gz'
https://academictorrents.com/nolicensespecified
A BitTorrent file to download data with the title 'wikidata-20240701-all.json.bz2'
https://creativecommons.org/publicdomain/zero/1.0/
Easily look up US historical demographics by county FIPS or ZIP code in seconds with this file containing over 5,901 different columns, including:
* Lat/Long
* Boundaries
* State FIPS
* Population from 2010-2019
* Death Rate from 2010-2019
* Unemployment from 2001-2020
* Education from 1970-2019
* Gender and Age Population
Provided by bitrook.com to help Data Scientists clean data faster.
https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/
https://data.world/niccolley/us-zipcode-to-county-state
https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/asrh/cc-est2019-agesex-**.csv https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/cc-est2019-agesex.pdf
https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/asrh/cc-est2019-alldata.csv https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/cc-est2019-alldata.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This collection encompasses all bulk data downloads available on EIA's open data site as of 5/14/2025. The manifest.txt file provides descriptions of the included datasets in JSON format. The datasets are divided by subject. Survey forms used to collect the data are available here: https://www.eia.gov/survey/

File name | Subject
AEO2025.zip | Annual Energy Outlook 2025
SEDS.zip | State Energy Data Systems
ELEC.zip | Electricity
NG.zip | Natural Gas
PET.zip | Petroleum
TOTAL.zip | Total Energy
COAL.zip | Coal
STEO.zip | Short Term Energy Outlook
PET_IMPORTS.zip | Crude Oil Imports
INTL.zip | International Energy Data
EBA.zip | US Electric System Operating Data (2019-present)
EBA-pre2019.zip | US Electric System Operating Data (before 2019)
EMISS.zip | CO2 Emissions
IEO.zip | International Energy Outlook
NUC_STATUS.zip | U.S. Nuclear Outages
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains structured drug labeling information (FDA labels) provided by DailyMed and made available through the openFDA Drug Labeling endpoint.
The dataset includes 13 compressed .zip files with drug label records in JSON format. Each record reflects the full label submitted to the FDA, and the structure matches what you would receive from the /drug/label API. Records include fields such as:
- drug_interactions
- warnings
- indications_and_usage
- contraindications
- adverse_reactions
- dosage_and_administration
- brand_name
- generic_name
You will also find the 'Human Drug.xlsx' file included in the dataset, which contains the complete data dictionary for reference.
This dataset reflects the most recent version available as of April 9, 2025. According to the source, previous records may be modified in future updates. For accuracy and completeness, all files should be downloaded together.
Do not rely on openFDA to make decisions regarding medical care. Always speak to your health provider about the risks and benefits of FDA-regulated products. We may limit or otherwise restrict your access to the API in line with our Terms of Service.
Full terms available here: openFDA Terms of Service
This dataset is ideal for applications involving:
- Drug safety analysis
- Drug interaction monitoring
- Medical language modeling
- Retrieval-augmented generation (RAG) agents
- Regulatory and pharmacovigilance systems
You may want to extract and preprocess only relevant fields before vectorizing or feeding them into an AI model for efficiency and performance.
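For example, a minimal sketch, assuming the openFDA download-file convention of a top-level "results" list and using a placeholder zip name, that keeps only the fields listed above (brand_name and generic_name typically sit under the record's "openfda" sub-object):

import json
import zipfile

FIELDS = ["brand_name", "generic_name", "indications_and_usage", "contraindications",
          "warnings", "drug_interactions", "adverse_reactions", "dosage_and_administration"]

# Placeholder file name; use one of the 13 .zip files from this dataset.
with zipfile.ZipFile("drug-label-0001-of-0013.json.zip") as zf:
    with zf.open(zf.namelist()[0]) as f:
        records = json.load(f)["results"]

# Keep a field from the record itself, falling back to the openfda sub-object.
slim = [{k: rec.get(k) or rec.get("openfda", {}).get(k) for k in FIELDS} for rec in records]
print(len(slim), "records reduced to", len(FIELDS), "fields")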
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To download XML and JSON files, click the CSV option below, then click the down arrow next to the Download button in the upper right of its page.
The downloadall extension for CKAN enhances dataset accessibility by adding a "Download all" button to dataset pages. This feature enables users to download a single zip file containing all resource files associated with a dataset, along with a datapackage.json file that provides machine-readable metadata. The extension streamlines the data packaging and distribution process, ensuring data and its documentation are kept together.

Key Features:
- Single-Click Download: Adds a "Download all" button to dataset pages, allowing users to download all resources and metadata in one go.
- Data Package Creation: Generates a datapackage.json file conforming to the Frictionless Data standard, including dataset metadata.
- Comprehensive Data Packaging: Packages all data files and datapackage.json into a single zip file to ensure usability.
- Data Dictionary Inclusion: If resources are stored in the DataStore (using xloader or datapusher), the datapackage.json will include the data dictionary (schema) of the data, specifying column types.
- Background Zip Creation: Uses a CKAN background job to (re)create the zip file when a dataset is created or updated or its data dictionary changes; updates to already-uploaded data are detected only when the dataset itself is updated.
- Command-Line Interface: Includes a command-line interface for various operations.

Technical Integration: The downloadall extension integrates into CKAN as a plugin, adding a new button to the dataset view. It depends on the CKAN background job worker to generate the zip files and, if used with the DataStore and xloader (or datapusher), incorporates the data dictionary into the datapackage.json. The extension requires activation in the CKAN configuration file (production.ini). Specific CKAN versions are supported, primarily 2.7 and 2.8.

Benefits & Impact: Implementing the downloadall extension can improve data accessibility and usability by providing a convenient way to download datasets and their associated metadata. It streamlines workflows for data analysts, researchers, and others who need comprehensive access to datasets and their documentation. The inclusion of machine-readable metadata in the form of a datapackage.json facilitates automation and standardisation in data processing and validation.
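On the consuming side, a minimal sketch, assuming only the Frictionless datapackage.json layout described above (the zip file name is a placeholder), that lists each packaged resource and any data-dictionary fields:

import json
import zipfile

# Placeholder name for a zip produced by the "Download all" button.
with zipfile.ZipFile("my-dataset.zip") as zf:
    package = json.loads(zf.read("datapackage.json"))

for resource in package.get("resources", []):
    # The schema key is present when the data dictionary was included.
    fields = resource.get("schema", {}).get("fields", [])
    print(resource.get("name"), resource.get("path"), [f.get("name") for f in fields])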
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To download XML and JSON files, click the CSV option below, then click the down arrow next to the Download button in the upper right of its page.
Please note that this service will stop between mid-April 2021 and mid-October 2021 and will be replaced by another service. It is possible to download the full content of the former EU ODP website (datasets, metadata, keywords) using simple HTTP requests. As each query is limited to 1000 datasets, we split the query into smaller queries. On this page, each "Resource" corresponds to a query generating a file of 1000 datasets in JSON format. By downloading all the resources, you download all the ODP metadata records.
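A minimal sketch of fetching every listed resource in turn; the URL list is a placeholder for the query links shown on this page, each of which returns one JSON file of up to 1000 dataset records:

import requests

resource_urls = [
    # Paste the "Resource" URLs from this page here.
]

for i, url in enumerate(resource_urls):
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    with open(f"odp_metadata_{i:03d}.json", "wb") as f:
        f.write(resp.content)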
Custom JSON File created for download
Every day, the Site Scanning program runs a scanning engine to dynamically pull down lists of domains from various sources and then scan them with a collection of scan plugins to gather data on them. The resulting data that populates this API has two main utilities: providing a fairly comprehensive dataset of US federal government websites, and providing various information and analysis about each of these websites. In addition to querying the data via the API, you can also download it directly as a CSV or JSON file.
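As a starting point for the API route, a minimal sketch; the endpoint below follows the GSA API gateway convention but is an assumption here and should be checked against the program's documentation:

import requests

# Assumed endpoint; DEMO_KEY is the api.data.gov trial key.
BASE = "https://api.gsa.gov/technology/site-scanning/v1/websites/"
resp = requests.get(BASE, params={"api_key": "DEMO_KEY"}, timeout=30)
resp.raise_for_status()
data = resp.json()
print(str(data)[:300])  # inspect the response shape before processing further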
https://academictorrents.com/nolicensespecified
A BitTorrent file to download data with the title '20150112.json.gz'
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Fourier ActionNet Dataset
Introduction
The data was collected from two primary sources: the robot side and the camera side. The HDF5 file contains the robot-side data, while the camera-side data is stored in the corresponding episode folder. There is also a metadata.json file in the dataset, which contains every episode's ID and its prompt.
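To get oriented in an episode, a minimal sketch using h5py; the file names below are placeholders, and the internal group layout is not assumed, it is simply printed:

import json
import h5py

# Look up episode IDs and prompts; the JSON layout is an assumption.
with open("metadata.json", encoding="utf-8") as f:
    metadata = json.load(f)

# Placeholder episode file name; list every group/dataset path it contains.
with h5py.File("episode_0000.h5", "r") as f:
    f.visit(print)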
Download the dataset
First, you can download the dataset online; it comes as a .tar file. After downloading… See the full description on the dataset page: https://huggingface.co/datasets/FourierIntelligence/ActionNet.