EHR-RelB is a benchmark dataset for biomedical concept relatedness, consisting of 3630 concept pairs sampled from electronic health records (EHRs). EHR-RelA is a smaller dataset of 111 concept pairs, which are mainly unrelated.
https://earth.esa.int/eogateway/documents/20142/1564626/Terms-and-Conditions-for-the-use-of-ESA-Data.pdf
The Fundamental Data Record (FDR) for Atmospheric Composition UVN v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 and resulting from the ESA FDR4ATMOS project. The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period from 1995 to 2012. The data record offers harmonised cross-calibrated spectra with a focus on spectral windows in the Ultraviolet-Visible-Near Infrared regions for the retrieval of critical atmospheric constituents such as ozone (O3), sulphur dioxide (SO2) and nitrogen dioxide (NO2) column densities, alongside cloud parameters.
The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonisation on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities. The FDR4ATMOS V1 is currently being extended to include the MetOp GOME-2 series.
Product format
In many respects, the FDR product has improved compared to the existing individual mission datasets:
GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data;
Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites;
SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. This simplifies the use of the SCIAMACHY spectra, which were split in a complex cluster structure (each cluster with its own integration time) in the original Level 1b data;
The harmonisation process applied mitigates the viewing angle dependency observed in the UV spectral region for GOME data;
Uncertainties are provided.
Each FDR product provides, within the same file, irradiance/reflectance data for the UV-VIS-NIR spectral regions across all orbits on a single day, including information from the individual ERS-2 GOME and Envisat SCIAMACHY measurements.
The FDR has been generated in two formats, Level 1A and Level 1B, targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users. In case of specific requirements, please contact EOHelp.
Please refer to the README file for essential guidance before using the data. All the new products are formatted in NetCDF. Free standard tools, such as Panoply, can be used to read NetCDF data. Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page.
Uncertainty characterisation
One of the main aspects of the project was the characterisation of Level 1 uncertainties for both instruments, based on metrological best practices. The following documents are provided:
General guidance on a metrological approach to Fundamental Data Records (FDR);
Uncertainty Characterisation document;
Effect tables;
NetCDF files containing example uncertainty propagation analysis and spectral error correlation matrices for SCIAMACHY (Atlantic and Mauretania scene for 2003 and 2010) and GOME (Atlantic scene for 2003): reflectance_uncertainty_example_FDR4ATMOS_GOME.nc and reflectance_uncertainty_example_FDR4ATMOS_SCIA.nc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books and is filtered where the book is The record of mankind, featuring 7 columns including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
Mapping of deicing material storage facilities in the Lake Champlain Basin was conducted during the late fall and winter of 2022-23. 126 towns were initially selected for mapping (some divisions within the GIS towns data are unincorporated “gores”). Using the list of towns, town clerk contact information was obtained from the Vermont Secretary of State’s website, which maintains a database of contact information for each town.
Each town was contacted to request information about its deicing material storage locations and methods. Email and telephone scripts were developed to briefly introduce the project and ask about the address of any deicing material storage locations in the town, the type of materials stored at each site, how long each site has been used, whether materials on site are covered, and the type of surface the materials are stored on, if any. Data were entered into a geospatial database application (Fulcrum). Information was gathered there and exported as ArcGIS file geodatabases and Comma Separated Values (CSV) files for use in Microsoft Excel.
Data were collected for 118 towns out of the original 126 on the list (92%). Forty-three (43) towns reported that they are storing multiple material types at their facilities. Four (4) towns have multiple sites where they store material (Dorset, Pawlet, Morristown, and Castleton). Of these, three (3) store multiple materials at one or both of their sites (Pawlet, Morristown, and Castleton). Where towns have multiple materials or locations, the record information from the overall town identifier is linked to the material stored using a unique ‘one-to-many’ identifier.
Locations of deicing material facilities, as shown in the database, were based on the addresses or location descriptions provided by town staff members and were verified only using the most recent aerial imagery (typically later than 2018 for all towns). Locations have not been field verified, nor have site conditions and infrastructure or other information provided by town staff.
Dataset instructions:
The dataset for Deicing Material Storage Facilities contains two layers: the ‘parent’ records titled ‘salt_storage’ and the ‘child’ records titled ‘salt_storage_record’, with attributes for each salt storage site. This represents a ‘one-to-many’ data structure. To see the attributes for each salt storage site, the user needs to Relate the data. The relationship can be built in GIS software. The Relate needs to be built on the following fields:
‘salt_storage’: ‘fulcrum_id’
‘salt_storage_record’: ‘fulcrum_parent_id’
This will create a one-to-many relationship between the geographic locations and the attributes for each salt storage site.
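For users working with the CSV exports rather than GIS software, the same one-to-many Relate can be sketched in plain Python. This is a minimal illustration only: the layer names and the key fields ‘fulcrum_id’ and ‘fulcrum_parent_id’ come from the instructions above, while the sample rows and the ‘town’ and ‘material’ attribute names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows. Only the key fields fulcrum_id and
# fulcrum_parent_id are taken from the dataset instructions.
salt_storage = [
    {"fulcrum_id": "a1", "town": "Pawlet"},
    {"fulcrum_id": "b2", "town": "Dorset"},
]
salt_storage_record = [
    {"fulcrum_parent_id": "a1", "material": "salt"},
    {"fulcrum_parent_id": "a1", "material": "sand"},
    {"fulcrum_parent_id": "b2", "material": "salt"},
]


def relate(parents, children):
    """Attach to each parent the child records whose
    fulcrum_parent_id equals the parent's fulcrum_id."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child["fulcrum_parent_id"]].append(child)
    return [
        {**parent, "records": by_parent.get(parent["fulcrum_id"], [])}
        for parent in parents
    ]


related = relate(salt_storage, salt_storage_record)
for site in related:
    print(site["town"], [r["material"] for r in site["records"]])
```

Each parent site keeps its own attributes and gains a list of zero or more child records, which mirrors the one-to-many structure described above.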
This clean dataset is a refined version of our company datasets, consisting of 35M+ data records.
It’s an excellent data solution for companies with limited data engineering capabilities and those who want to reduce their time to value. You get filtered, cleaned, unified, and standardized B2B data. After cleaning, this data is also enriched by leveraging a carefully instructed large language model (LLM).
AI-powered data enrichment offers more accurate information in key data fields, such as company descriptions. It also produces over 20 additional data points that are very valuable to B2B businesses. Enhancing and highlighting the most important information in web data contributes to quicker time to value, making data processing much faster and easier.
For your convenience, you can choose from multiple data formats (Parquet, JSON, JSONL, or CSV) and select suitable delivery frequency (quarterly, monthly, or weekly).
Coresignal is a leading public business data provider in the web data sphere with an extensive focus on firmographic data and public employee profiles. More than 3B data records in different categories enable companies to build data-driven products and generate actionable insights. Coresignal is exceptional in terms of data freshness, with 890M+ records updated monthly for unprecedented accuracy and relevance.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was extracted from a set of metadata files harvested from the DataCite metadata store (https://search.datacite.org/ui) during December 2015. Metadata records for items with a resourceType of dataset were collected, for a total of 1,647,949 records. This dataset contains three files: 1) readme.txt: a readme file; 2) version-results.csv: a CSV file containing three columns (DOI, DOI prefix, and version text contents); 3) version-counts.csv: a CSV file containing counts for unique version text content values.
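As a rough illustration of how counts like those in version-counts.csv could be derived from rows shaped like version-results.csv, here is a minimal sketch. The header names (‘doi’, ‘prefix’, ‘version’) and the sample rows are assumptions, not taken from the actual files, and this is not necessarily how the authors produced their counts.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt. The real header names and values in
# version-results.csv may differ.
sample = io.StringIO(
    "doi,prefix,version\n"
    "10.1234/a,10.1234,1.0\n"
    "10.1234/b,10.1234,v2\n"
    "10.5678/c,10.5678,1.0\n"
)


def count_versions(handle, column="version"):
    """Count how often each distinct version string occurs."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


counts = count_versions(sample)
for value, n in counts.most_common():
    print(value, n)
```

To run it against the real file, pass an open file handle instead of the in-memory sample and set `column` to the actual header name.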
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Mario Maker 2 user world records
Part of the Mario Maker 2 Dataset Collection
Dataset Description
The Mario Maker 2 user world records dataset consists of 15.3 million world records from Nintendo's online service, totaling around 215MB of data. The dataset was created using the self-hosted Mario Maker 2 API over the course of 1 month in February 2022.
How to use it
The Mario Maker 2 user world records dataset is a very large dataset so for most use cases it is… See the full description on the dataset page: https://huggingface.co/datasets/TheGreatRambler/mm2_user_world_record.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information about the performance of clinicians interacting with two versions of a patient record, capturing time spent and successful completion of the task. The research has been published in: Klappe ES, Heijmans J, Groen K, ter Schure J, Cornet R, de Keizer NF. Correctly structured problem lists lead to better and faster clinical decision-making in electronic health records compared to non-curated problem lists: a single-blinded crossover randomized controlled trial. International Journal of Medical Informatics, 2023;180:105264.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the dataset described in Section 3.3 of the paper “On Automatic Parsing of Log Records”. Each file contains a specific dataset described in the paper. For example, T_E.txt contains the data for the dataset TE.
In each file, every line holds one record: a 2-tuple whose elements are separated by a tab (‘\t’). The first element of the tuple is the actual log string that has to be parsed. The second element is the corresponding “translation” specifying the field name for each of the characters of the first element.
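A minimal sketch of reading one such line follows. It assumes the translation holds exactly one label character per character of the log string and that the labels themselves contain no tab; the label alphabet used here (‘d’, ‘_’, ‘m’) and the sample line are hypothetical, so check the paper for the actual encoding.

```python
from itertools import groupby


def parse_line(line):
    """Split one dataset line into (log string, per-character labels)."""
    # Split on the last tab, assuming the label string contains no tab.
    text, labels = line.rstrip("\n").rsplit("\t", 1)
    if len(text) != len(labels):
        raise ValueError("expected one label per character")
    return text, labels


def fields(text, labels):
    """Group consecutive characters that share a label
    into (label, substring) pairs."""
    out = []
    pos = 0
    for label, run in groupby(labels):
        n = len(list(run))
        out.append((label, text[pos:pos + n]))
        pos += n
    return out


# Hypothetical line: 'd' marks date characters, '_' a separator,
# and 'm' message characters.
line = "2021 ok\tdddd_mm\n"
text, labels = parse_line(line)
result = fields(text, labels)
print(result)
```

Grouping runs of identical labels recovers the field boundaries, which is the target a parser trained on this data would need to reproduce.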
When using the dataset, please cite it as follows:
@article{rand2021log,
  author = {Jared Rand and Andriy Miranskyy},
  title = {{On Automatic Parsing of Log Records}},
  journal = {CoRR},
  volume = {abs/2102.06320},
  year = {2021},
  url = {https://arxiv.org/abs/2102.06320},
  archivePrefix = {arXiv},
  eprint = {2102.06320}
}
https://digital.nhs.uk/binaries/content/assets/website-assets/services/dars/nhs_digital_approved_edition_2_dsa_demo.pdf
The Mental Health and Learning Disabilities Data Set version 1 (Record Level - sensitive data inclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.
https://choosealicense.com/licenses/odc-by/
m-a-p/SuperGPQA-Records dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is the majority of records collated, digitised and held by BIS. It excludes any records known or thought to be already on NBN Atlas (for example county datasets, National Scheme and Society datasets, and third party data) and records where the data provider has refused permission to share to the NBN Atlas. Contact BIS for further information or go to the website www.bis.org.uk for County recorder contacts.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This is a points dataset showing the location of over 9,000 industrial heritage sites. The Industrial Heritage Record lists more than 16,000 features, but only limited information is currently available for most.
https://crawlfeeds.com/privacy_policy
We are excited to announce that we have successfully extracted a comprehensive set of alcoholic beverage records from BevMo and compiled them into a CSV file.
This meticulously organized dataset includes key information such as product URLs, IDs, names, SKUs, GTIN14 barcodes, detailed product descriptions, availability status, pricing, currency, images, breadcrumbs, and more.
Our dataset provides an invaluable resource for anyone looking to analyze or utilize detailed BevMo product information.
Download the dataset today and gain access to a wealth of information from one of the leading beverage retailers.
Perfect for market analysis, e-commerce insights, and competitive research.
List of open datasets that have been requested from the city
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects, has 4 rows, and is filtered where the book is Olympic and world records 2012. It features 10 columns including book subject, number of authors, number of books, earliest publication date, and latest publication date. The preview is ordered by number of books (descending).
The Recording District Boundary coverage depicts the 34 recording districts established for the administration of a system for recording and filing of documents. These boundaries were created by the Alaska Court System as the Alaska Recording Districts Portfolio (ARDP). The Portfolio, dated September 1, 1964, was mandated by Alaska Supreme Court Order No. 12, Amendment No. 13, effective July 1, 1975. All files and records within these boundaries are maintained by each of the 14 district Recording Offices.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books and is filtered where the book subject is Public records-Law and legislation-Great Britain, featuring 9 columns including author, BNB id, book, book publisher, and book subjects. The preview is ordered by publication date (descending).
Analysis of ‘Dataset inventory’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/9fbba11c-6359-4f3b-a34f-2240e23d97a4 on 11 February 2022.
--- Dataset description provided by original source is as follows ---
Annual Inventory Update Notice: We have updated the inventory with a provisional update as of July 1, 2020, based on an initial cleanup and reconciliation of records. DataSF staff are delayed on a full update because of the ongoing COVID-19 response but are working with departments to complete that annual update by the end of July.
A. SUMMARY The dataset inventory provides a list of data maintained by departments that are candidates for open data publishing or have already been published and is collected in accordance with Chapter 22D of the Administrative Code. The inventory will be used in conjunction with department publishing plans to track progress toward meeting plan goals for each department. Department publishing plans are available at https://datasf.org/publishing/plans
B. HOW THE DATASET IS CREATED This dataset is collated in two ways:
1. Ongoing updates are made throughout the year to reflect new datasets. This process involves DataSF staff reconciling publishing records after datasets are published.
2. Annual bulk updates: departments review their inventories, identify changes and updates, and submit those to DataSF for a once-a-year bulk update. Not all departments will have changes, or their changes will already have been captured over the course of the prior year as ongoing updates.
C. UPDATE PROCESS The dataset is synced automatically daily, but the underlying data changes manually throughout the year as needed
D. HOW TO USE THIS DATASET
Interpreting dates in this dataset
This dataset has 3 dates:
1. Date Added - when the dataset was added to the inventory itself
2. First Published - when the dataset was initially published on the platform
3. Date Created on Platform - the system-generated date that the open data portal automatically captures when the dataset is first created
Note that in certain cases we may have published a dataset prior to it being added to the inventory. We do our best to have an accurate accounting of when something was added to this inventory and when it was published. In most cases the inventory addition will happen prior to publishing, but in certain cases it will be published and we will have missed updating the inventory as this is a manual process.
First Published records when the dataset actually became available on the open data catalog, and Date Added records when it was added to this list.
Date Created on Platform shows when a dataset was initially created. Because datasets are created and re-created as underlying systems change, this date can be after the First Published date if, for example, a new dataset was published as an improvement over a previous one. Additionally, for new datasets, this date is often prior to the First Published date because the dataset is created, reviewed, QA'd and prepared for release before it is published.
Companion systems inventory dataset
This is a list of published and unpublished datasets. A companion dataset of citywide enterprise systems of record can be accessed online as well.
--- Original source retains full ownership of the source dataset ---