
Common Metadata Elements for Cataloging Biomedical Datasets
figshare.com
xlsx
Updated Jan 20, 2016
Kevin Read (2016). Common Metadata Elements for Cataloging Biomedical Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1496573.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1496573.v1
Dataset updated
Jan 20, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kevin Read
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health (NIH). It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these repositories were identified, mapped to existing data-specific metadata standards and to existing multidisciplinary data repositories (DataCite and Dryad), and compared with the metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From these mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
A Dataset of Metadata of Articles Citing Retracted Articles
zenodo.org
csv
Updated Aug 31, 2024
Yagmur Ozturk (2024). A Dataset of Metadata of Articles Citing Retracted Articles [Dataset]. http://doi.org/10.5281/zenodo.13621503
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.13621503
Dataset updated
Aug 31, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yagmur Ozturk
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises metadata of articles citing retracted publications. We originally obtained the DOIs from the Feet of Clay Detector of the Problematic Paper Screener (PPS - FoCD), which flags publications that cite retracted articles. Additional columns not provided by PPS were added using the Crossref & Retraction Watch Database (CRxRW) and the Dimensions API.

By querying the Dimensions API with the DOIs of the FoC articles, we acquired information such as more detailed document types (editorial, review article, research article), open access status (we kept only open access FoC articles in the dataset, since we want to access the full texts in the future), and research fields, classified according to the Australian and New Zealand Standard Research Classification (ANZSRC) Fields of Research (FoR), which comprises 23 main fields such as biological sciences and education.

To get further information about the cited retracted articles in the dataset, we used the joint release of CRxRW. Using this dataset, we added the retraction reasons and retraction years.

The original dataset was obtained from the PPS FoCD in December 2023, when a total of 22,558 articles were flagged in FoCD. Using the data filtering feature in PPS, we made a preliminary selection before downloading the first version of the dataset, applying filters to obtain:

non-retracted citing articles at the time of data curation*

open-access citing articles since we need the whole text to go forward with natural language processing tasks

cited retracted articles with at least one retraction reason related to scientific content

only articles (not monographs or chapters) to retain a unified text type

More information about the usage of this dataset will be added.

*Current retraction status of the citing articles can be different since this is a static dataset and scientific literature is dynamic.
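The four selection criteria above amount to a simple record filter. The Python sketch below illustrates that logic under stated assumptions: the `CitingArticle` record, its field names, and the set of reasons treated as "related to scientific content" are hypothetical, and they reproduce neither the actual PPS filter nor the column names of the released CSV.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of retraction reasons treated as related to scientific content.
# The real selection was made with the PPS filtering feature, not with this list.
SCIENTIFIC_REASONS = {
    "Error in Data",
    "Error in Results and/or Conclusions",
    "Falsification/Fabrication of Data",
}


@dataclass
class CitingArticle:
    """Hypothetical record for one article that cites a retracted article."""
    doi: str
    is_retracted: bool  # retraction status of the citing article at curation time
    is_open_access: bool
    doc_type: str  # e.g. "article", "monograph", "chapter"
    cited_retraction_reasons: List[str] = field(default_factory=list)


def keep(article: CitingArticle) -> bool:
    """Apply the four selection criteria: non-retracted, open access, article, scientific reason."""
    return (
        not article.is_retracted
        and article.is_open_access
        and article.doc_type == "article"
        and any(r in SCIENTIFIC_REASONS for r in article.cited_retraction_reasons)
    )


def select(articles: List[CitingArticle]) -> List[CitingArticle]:
    """Return only the citing articles that pass every criterion."""
    return [a for a in articles if keep(a)]
```

Because every criterion is a plain boolean test, the filter can be re-applied later, for example to drop citing articles that have since been retracted.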
RSNA ATD 2023 DICOM Metadata
kaggle.com
Updated Oct 4, 2023
Emmanuel Katchy (2023). RSNA ATD 2023 DICOM Metadata [Dataset]. https://www.kaggle.com/datasets/tobetek/rsna-atd-2023-dicom-metadata
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 4, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Emmanuel Katchy
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
What is DICOM

DICOM (Digital Imaging and Communications in Medicine) is a standard format used to store and transmit medical images and related information in healthcare settings. It is widely used for many types of medical images, including X-rays, MRIs, CT scans, and ultrasounds. DICOM files typically contain much more information than the image pixels alone, and this extra data can be useful for feature engineering. Here is an overview of the data that may be stored in a DICOM file (the original RSNA ATD dataset has most likely been purged of PII, so the majority of these fields are not present):

Patient Information (Patient's name, Patient's ID, Patient's date of birth etc.)

Study Information (Study description, Study date and time, Study ID etc.)

Series Information:

Series description

Modality (e.g., CT, MRI, X-ray, ultrasound)

Series instance UID (a unique identifier for the series)

Number of images in the series

Image orientation and position information

Image Information:

Image type (e.g., original, derived, etc.)

Photometric interpretation (how pixel values represent image information, e.g., grayscale, RGB)

Rows and columns (image dimensions)

Pixel spacing (physical size of each pixel)

Bits allocated and bits stored (bit depth of pixel values)

High bit (the most significant bit)

Windowing and leveling settings for image display

Rescale intercept and slope (used for converting pixel values to physical units)

Image orientation (patient positioning)

Image Acquisition Details:

Exposure parameters (e.g., radiation dose in radiography, MRI sequence parameters)

Image acquisition date and time

Equipment information (e.g., machine make and model)

Image acquisition technique (e.g., pulse sequence in MRI)

Image Annotations and Markings:

Image Pixel Data: The actual image pixel values, which can be 2D or 3D depending on the image type, encoded as raw pixel data or as compressed image data (e.g., JPEG, JPEG2000).
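Two of the image fields listed above, rescale intercept/slope and windowing, are usually applied together when displaying CT pixel data. The Python sketch below assumes pixel values are already available as a plain list (for example, decoded with a DICOM library). The windowing is a simplified linear mapping onto 0 to 255; the DICOM standard's VOI LUT linear function uses a slightly different formula with half-unit offsets, so this is an approximation rather than a standards-exact implementation.

```python
from typing import List, Sequence


def rescale(stored: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Convert stored pixel values to physical units: value * slope + intercept.

    For CT, the result is typically in Hounsfield units (HU).
    """
    return [v * slope + intercept for v in stored]


def apply_window(values: Sequence[float], center: float, width: float,
                 out_max: int = 255) -> List[int]:
    """Simplified linear windowing: map [center - width/2, center + width/2] onto [0, out_max].

    Values below the window become 0; values above it become out_max.
    """
    low = center - width / 2
    high = center + width / 2
    out = []
    for v in values:
        if v <= low:
            out.append(0)
        elif v >= high:
            out.append(out_max)
        else:
            out.append(round((v - low) / (high - low) * out_max))
    return out


# Example: a soft-tissue window (center 40 HU, width 400 HU) on three stored values.
hu = rescale([0, 1000, 2000], slope=1.0, intercept=-1024.0)
display = apply_window(hu, center=40, width=400)
```

Here `hu` is `[-1024.0, -24.0, 976.0]` and `display` is `[0, 87, 255]`: air and dense bone saturate, while the soft-tissue value lands mid-range.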

How can this Dataset be used?

Feature Engineering

3D Visualization of Scan series

Anomaly Detection

Columns in the Dataset

Here's an explanation of each of the fields in the dataset:

SOP Instance UID (Unique Identifier):

A globally unique identifier assigned to each instance (e.g., an individual image) within a DICOM study. It identifies and distinguishes individual instances; series and studies have their own identifiers (Series Instance UID and Study Instance UID).

Content Date:

The date when the image or data was created or acquired. It's typically in the format YYYYMMDD (year, month, day).

Content Time:

The time when the image or data was created or acquired. It's typically in the format HHMMSS.FFFFFF (hour, minute, second, fraction of a second).
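Content Date and Content Time are stored as separate strings, so they are usually combined before use. The sketch below parses the formats described above using only the Python standard library. DICOM time values may omit trailing components (e.g. `HHMM`) or carry fewer than six fractional digits, which the parser tolerates. It does not handle colon-separated legacy values, and it returns a naive (time-zone-free) datetime.

```python
from datetime import date, datetime, time


def parse_dicom_date(da: str) -> date:
    """Parse a DICOM date value (YYYYMMDD)."""
    return datetime.strptime(da.strip(), "%Y%m%d").date()


def parse_dicom_time(tm: str) -> time:
    """Parse a DICOM time value (HH[MM[SS[.FFFFFF]]]) into a naive time."""
    main, _, frac = tm.strip().partition(".")
    hour = int(main[0:2])
    minute = int(main[2:4]) if len(main) >= 4 else 0
    second = int(main[4:6]) if len(main) >= 6 else 0
    # Right-pad the fraction to 6 digits so ".25" means 250000 microseconds.
    micro = int(frac[:6].ljust(6, "0")) if frac else 0
    return time(hour, minute, second, micro)


def content_datetime(content_date: str, content_time: str) -> datetime:
    """Combine Content Date and Content Time into one naive datetime."""
    return datetime.combine(parse_dicom_date(content_date), parse_dicom_time(content_time))
```

For instance, `content_datetime("20230415", "143015.25")` yields 2023-04-15 14:30:15.250000.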

Patient ID:

A unique identifier for the patient, often used to link different studies and images to the same patient.

Slice Thickness:

The thickness of an image slice in millimeters, relevant in three-dimensional imaging modalities like CT scans.

KVP (Kilovolt Peak):

The peak kilovoltage output of the X-ray generator used to acquire the image. It affects the contrast and quality of the image.

Patient Position:

The position of the patient during image acquisition, such as supine, prone, standing, etc.

Study Instance UID:

A unique identifier assigned to each study, which may consist of multiple series and images related to a specific medical examination or procedure.

Series Instance UID:

A unique identifier assigned to each series within a study. A series contains a group of related images.

Series Number:

An integer identifier that indicates the position of the series within the study.

Instance Number:

An integer identifier that indicates the position of the image or data instance within a series.

Image Position (Patient):

The position of the image slice within the patient's anatomy, typically defined by three coordinates (x, y, z) in millimeters.

Image Orientation (Patient):

The orientation of the image with respect to the patient's anatomy, typically defined by six parameters that describe the direction cosines of the rows and columns.
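Image Position (Patient) and Image Orientation (Patient) together give a way to order slices spatially, which the 3D visualization use case mentioned above needs. Instance Number is not guaranteed to follow spatial order. A common approach, sketched below in Python, takes the cross product of the row and column direction cosines to obtain the slice normal, then sorts slices by the projection of each position onto it. The `(instance_id, position, orientation)` tuple layout is an assumption for illustration, and the sketch assumes all slices in the series share one orientation.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    """3D cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sort_slices(slices: List[Tuple[str, Sequence[float], Sequence[float]]]) -> List[str]:
    """Return instance IDs ordered along the slice normal.

    Each item is (instance_id, image_position_patient, image_orientation_patient),
    where the orientation holds six direction cosines: row (x, y, z), then column (x, y, z).
    """
    orientation = slices[0][2]
    normal = cross(orientation[:3], orientation[3:])
    ordered = sorted(slices, key=lambda s: dot(s[1], normal))
    return [s[0] for s in ordered]
```

For axial slices (orientation `[1, 0, 0, 0, 1, 0]`), the normal is `(0, 0, 1)`, so the sort reduces to ordering by the z coordinate of Image Position (Patient).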

Frame of Reference UID:

An identifier that establishes a coordinate system for images within a study, enabling proper alignment and orientation of images in multi-modality studies.

Samples per Pixel:

The number of separate samples (color components) per pixel, e.g., 1 for grayscale (monochrome) images and 3 for RGB images.

Photometric Interpretation:

Describes how pixel data is interpreted for display, such as grayscale, RGB color, o...
text-descriptives-metadata
huggingface.co
Updated Oct 15, 2013
Argilla (2013). text-descriptives-metadata [Dataset]. https://huggingface.co/datasets/argilla/text-descriptives-metadata
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 15, 2013
Dataset authored and provided by
Argilla
Description
Dataset Card for text-descriptives-metadata

This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

Dataset Summary

This dataset contains:

A dataset configuration file conforming to the Argilla dataset format named argilla.yaml. This configuration file will be used to configure the dataset when using the… See the full description on the dataset page: https://huggingface.co/datasets/argilla/text-descriptives-metadata.
Dataset metadata of known Dataverse installations
search.datacite.org
dataverse.harvard.edu
Updated 2019
Julian Gautier (2019). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/dvn/dcdkzq
Unique identifier
https://doi.org/10.7910/dvn/dcdkzq
Dataset updated
2019
Dataset provided by
DataCite (https://www.datacite.org/)
Harvard Dataverse
Authors
Julian Gautier
Description
This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well the Dataverse software, and the repositories using it, help depositors describe data.

How the metadata was downloaded

The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. To get the metadata from installations that require an account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one named "hostname", listing the URL of each installation in which I was able to create an account, and another named "apikey", listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

How the files are organized

├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation).csv
│   ├── basic.csv
│   ├── contributor(citation).csv
│   ├── ...
│   └── topic_classification(citation).csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2022.10.02_17.11.19.zip
│   │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
│   │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
│   │   │   └── ...
│   │   └── metadatablocks_v5.6
│   │       ├── astrophysics_v5.6.json
│   │       ├── biomedical_v5.6.json
│   │       ├── citation_v5.6.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
│   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
│   ├── Arca_Dados_2022.10.02_17.44.35.zip
│   ├── ...
│   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
├── dataset_pids_from_most_known_dataverse_installations.csv
├── licenses_used_by_dataverse_installations.csv
└── metadatablocks_from_most_known_dataverse_installations.csv

This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files with the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned versions of all datasets in the 77 installations, with a row for each author name, affiliation, identifier type and identifier.

The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories:

The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation, as well as a column indicating whether the Python script was able to download the Dataverse JSON metadata for each dataset. For installations running Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also record the Dataverse collection (within the installation) in which each dataset was published.

One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema.

The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column indicating whether the Python script was able to download the dataset's metadata. It is a union of all of the "dataset_pids_..." files in each of the 77 zip files.

The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected this data, 36 installations were running versions of the Dataverse software that allow depositors to choose a license or data use agreement from a dropdown menu in the dataset deposit form. For more information, see https://guides.dataverse.org/en/5.11.1/user/dataset-management.html#choosing-a-license.

The metadatablocks_from_most_known_dataverse_installations.csv file contains the metadata block names, field names and child field names (if the field is a compound field) of the 77 Dataverse installations' metadata blocks. It is useful for comparing each installation's dataset metadata model (the metadata fields and the metadata blocks that each installation uses). The CSV file was created using a Python script at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_csv_file_with_metadata_block_fields_of_all_installations.py, which takes as inputs the directories and files created by the get_dataset_metadata_of_all_installations.py script.

Known errors

The metadata of two datasets from one of the known installations could not be downloaded because the datasets' pages and metadata could not be accessed with the Dataverse APIs.

About metadata blocks

Read about the Dataverse software's metadata blocks system at http://guides.dataverse.org/en/latest/admin/metadatacustomization.html
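The token-handling step described under "How the metadata was downloaded" can be sketched as follows. This is not the author's script. It is a minimal Python illustration that reads a `hostname`/`apikey` CSV and builds (but does not send) a request for one dataset's metadata in Dataverse JSON. It assumes the Dataverse native API's metadata export endpoint (`/api/datasets/export` with `exporter=dataverse_json`) and the `X-Dataverse-key` header for API tokens. Check the Dataverse API guide for the software version each installation runs.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Dict


def load_tokens(csv_text: str) -> Dict[str, str]:
    """Map each installation hostname to its API token (columns: hostname, apikey)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["hostname"].strip(): row["apikey"].strip() for row in reader}


def export_request(hostname: str, persistent_id: str,
                   tokens: Dict[str, str]) -> urllib.request.Request:
    """Build a GET request for one dataset's metadata in Dataverse JSON.

    The token header is attached only for installations listed in the CSV.
    """
    base = hostname if hostname.startswith(("http://", "https://")) else "https://" + hostname
    query = urllib.parse.urlencode({"exporter": "dataverse_json", "persistentId": persistent_id})
    headers = {}
    token = tokens.get(hostname)
    if token:
        headers["X-Dataverse-key"] = token
    return urllib.request.Request(f"{base.rstrip('/')}/api/datasets/export?{query}", headers=headers)
```

Passing the request to `urllib.request.urlopen` would fetch the JSON. Error handling, retries, and listing each installation's datasets through the Search API are left out of this sketch.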
Metadata of a Large Sonar and Stereo Camera Dataset Suitable for...
data.niaid.nih.gov
Updated Jul 8, 2024
Cesar, Diego (2024). Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10373153
Dataset updated
Jul 8, 2024
Dataset provided by
Cesar, Diego
Pribbernow, Max
Wehbe, Bilal
Backe, Christian
Bande, Miguel
Shah, Nimish
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation

Introduction

This is a set of metadata describing a large dataset of synchronized sonar and stereo camera recordings that were captured between August 2021 and September 2023 during the project DeeperSense (https://robotik.dfki-bremen.de/en/research/projects/deepersense/) as training data for Sonar-to-RGB image translation. Parts of the sensor data have been published (https://zenodo.org/records/7728089, https://zenodo.org/records/10220989). Due to the size of the sensor data corpus, it is currently impractical to make the entire corpus accessible online. Instead, this metadatabase serves as a relatively compact representation, allowing interested researchers to inspect the data and select relevant portions for their particular use case, which will be made available on demand. This is an effort to comply with FAIR principle A2 (https://www.go-fair.org/fair-principles/), which states that metadata shall remain accessible even when the underlying data are not immediately available.

Locations and sensors

The sensor data was captured at four different locations, including one laboratory (Maritime Exploration Hall at DFKI RIC Bremen) and three field locations (Chalk Lake Hemmoor, Tank Wash Basin Neu-Ulm, Lake Starnberg). At all locations, a ZED camera and a Blueprint Oculus M1200d sonar were used. Additionally, a SeaVision camera was used at the Maritime Exploration Hall at DFKI RIC Bremen and at the Chalk Lake Hemmoor. The examples/ directory holds a typical output image for each sensor at each available location.

Data volume per session

Six data collection sessions were conducted. The table below presents an overview of the amount of data captured in each session:

| Session dates | Location | Number of datasets | Total duration of datasets [h] | Total logfile size [GB] | Number of images | Total image size [GB] |
|---|---|---|---|---|---|---|
| 2021-08-09 - 2021-08-12 | Maritime Exploration Hall at DFKI RIC Bremen | 52 | 10.8 | 28.8 | 389’047 | 88.1 |
| 2022-02-07 - 2022-02-08 | Maritime Exploration Hall at DFKI RIC Bremen | 35 | 4.4 | 54.1 | 629’626 | 62.3 |
| 2022-04-26 - 2022-04-28 | Chalk Lake Hemmoor | 52 | 8.1 | 133.6 | 1’114’281 | 97.8 |
| 2022-06-28 - 2022-06-29 | Tank Wash Basin Neu-Ulm | 42 | 6.7 | 144.2 | 824’969 | 26.9 |
| 2023-04-26 - 2023-04-27 | Maritime Exploration Hall at DFKI RIC Bremen | 55 | 7.4 | 141.9 | 739’613 | 9.6 |
| 2023-09-01 - 2023-09-02 | Lake Starnberg | 19 | 2.9 | 40.1 | 217’385 | 2.3 |
| Total | | 255 | 40.3 | 542.7 | 3’914’921 | 287.0 |

Data and metadata structure

Sensor data corpus

The sensor data corpus comprises two processing stages:

raw data streams stored in ROS bagfiles (aka logfiles),

camera and sonar images (aka datafiles) extracted from the logfiles.

The files are stored in a file tree hierarchy which groups them by session, dataset, and modality:

${session_key}/
    ${dataset_key}/
        ${logfile_name}
        ${modality_key}/
            ${datafile_name}

A typical logfile path has this form:

2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/stereo_camera-zed-2023-09-02-15-06-07.bag

A typical datafile path has this form:

2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/zed_right/1693660038_368077993.jpg

All directory and file names, and their particles, are designed to serve as identifiers in the metadatabase. Their formatting, as well as the definitions of all terms, are documented in the file entities.json.
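Because the path particles double as identifiers, a datafile path can be split back into its keys. The Python sketch below does this for the four-level datafile layout described above. It additionally assumes, without confirmation from this description, that a datafile name such as `1693660038_368077993.jpg` encodes UNIX-epoch seconds and nanoseconds. The authoritative formatting rules are in entities.json.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Dict, Union


def parse_datafile_path(path: str) -> Dict[str, Union[str, int, datetime]]:
    """Split session_key/dataset_key/modality_key/datafile_name into its keys."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}: {path}")
    session_key, dataset_key, modality_key, datafile_name = parts
    # Assumption: the file stem is "<seconds>_<nanoseconds>" since the UNIX epoch.
    seconds, nanoseconds = PurePosixPath(datafile_name).stem.split("_")
    return {
        "session_key": session_key,
        "dataset_key": dataset_key,
        "modality_key": modality_key,
        "datafile_name": datafile_name,
        "timestamp_utc": datetime.fromtimestamp(int(seconds), tz=timezone.utc),
        "nanoseconds": int(nanoseconds),
    }
```

Under this assumption, the example datafile decodes to 2023-09-02 13:07:18 UTC, i.e. 15:07:18 local summer time. This is consistent with the dataset key `2023-09-02-15-06` and the logfile timestamp `15-06-07`.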

Metadatabase

The metadatabase is provided in two equivalent forms:

as a standalone SQLite (https://www.sqlite.org/index.html) database file metadata.sqlite for users familiar with SQLite,

as a collection of CSV files in the csv/ directory for users who prefer other tools.

The database file has been generated from the CSV files, so each database table holds the same information as the corresponding CSV file. In addition, the metadatabase contains a series of convenience views that facilitate access to certain aggregate information.

An entity relationship diagram of the metadatabase tables is stored in the file entity_relationship_diagram.png. Each entity, its attributes, and its relations are documented in detail in the file entities.json.

Some general design remarks:

For convenience, timestamps are always given in both a human-readable form (ISO 8601 formatted datetime strings with explicit local time zone), and as seconds since the UNIX epoch.

In practice, each logfile always contains a single stream, and each stream is always stored in a single logfile. In the database schema, however, the stream and logfile entities are modeled separately, with a “many-streams-to-one-logfile” relationship. This design was chosen to be compatible with, and open to, data collections in which a single logfile contains multiple streams.

A modality is an attribute not of a sensor alone but of a datafile, because a sensor is an attribute of a stream, and a single stream may be the source of multiple modalities (e.g. RGB vs. grayscale images from the same camera, or Cartesian vs. polar projections of the same sonar output). Conversely, the same modality may originate from different sensors.

As a usage example, the data volume per session tabulated at the top of this document can be extracted from the metadatabase with the following SQL query:

SELECT
    PRINTF('%s - %s', SUBSTR(session_start, 1, 10), SUBSTR(session_end, 1, 10)) AS 'Session dates',
    location_name_english AS Location,
    number_of_datasets AS 'Number of datasets',
    total_duration_of_datasets_h AS 'Total duration of datasets [h]',
    total_logfile_size_gb AS 'Total logfile size [GB]',
    number_of_images AS 'Number of images',
    total_image_size_gb AS 'Total image size [GB]'
FROM location
JOIN session USING (location_id)
JOIN (
    SELECT
        session_id,
        COUNT(dataset_id) AS number_of_datasets,
        ROUND(SUM(dataset_duration) / 3600, 1) AS total_duration_of_datasets_h,
        ROUND(SUM(total_logfile_size) / 10e9, 1) AS total_logfile_size_gb
    FROM location
    JOIN session USING (location_id)
    JOIN dataset USING (session_id)
    JOIN view_dataset_total_logfile_size USING (dataset_id)
    GROUP BY session_id
) USING (session_id)
JOIN (
    SELECT
        session_id,
        COUNT(datafile_id) AS number_of_images,
        ROUND(SUM(datafile_size) / 10e9, 1) AS total_image_size_gb
    FROM session
    JOIN dataset USING (session_id)
    JOIN stream USING (dataset_id)
    JOIN datafile USING (stream_id)
    GROUP BY session_id
) USING (session_id)
ORDER BY session_id;
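The metadatabase can also be queried from Python's built-in sqlite3 module. The sketch below is a reduced version of the query above that counts datasets and sums their duration per session. It reuses the `session` and `dataset` tables and the `session_id`, `dataset_id`, and `dataset_duration` columns from that query, and it assumes `dataset_duration` is in seconds, as the division by 3600 there suggests.

```python
import sqlite3
from typing import List, Tuple

PER_SESSION_QUERY = """
SELECT session_id,
       COUNT(dataset_id) AS number_of_datasets,
       ROUND(SUM(dataset_duration) / 3600.0, 1) AS total_duration_h
FROM session
JOIN dataset USING (session_id)
GROUP BY session_id
ORDER BY session_id
"""


def datasets_per_session(conn: sqlite3.Connection) -> List[Tuple[int, int, float]]:
    """Return (session_id, number_of_datasets, total_duration_h) rows."""
    return conn.execute(PER_SESSION_QUERY).fetchall()
```

In practice the connection would be opened with `sqlite3.connect("metadata.sqlite")`. Taking a connection as a parameter instead of a path keeps the function easy to test against an in-memory database.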
OpenScience Slovenia document metadata dataset
narcis.nl
data.mendeley.com
Updated Mar 9, 2021
Borovič, M (via Mendeley Data) (2021). OpenScience Slovenia document metadata dataset [Dataset]. http://doi.org/10.17632/7wh9xvvmgk.3
Unique identifier
https://doi.org/10.17632/7wh9xvvmgk.3
Dataset updated
Mar 9, 2021
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Borovič, M (via Mendeley Data)
Area covered
Slovenia
Description
The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents which include undergraduate and postgraduate theses, research and professional articles, along with other academic document types. The data within the dataset was collected as a part of the establishment of the Slovenian Open-Access Infrastructure which defined a unified document collection process and cataloguing for universities in Slovenia within the infrastructure repositories. The data was collected from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields, representing attributes that describe documents. These attributes include document titles, keywords, abstracts, typologies, authors, issue years and other identifiers such as URL and UDC. The potential of this dataset lies especially in text mining and text classification tasks and can also be used in development or benchmarking of content-based recommender systems on real-world data.
Dataset relating a study on Geospatial Open Data usage and metadata quality
zenodo.org
data.niaid.nih.gov
Updated Jun 19, 2023
Alfonso Quarati; Monica De Martino (2023). Dataset relating a study on Geospatial Open Data usage and metadata quality [Dataset]. http://doi.org/10.5281/zenodo.4280594
Unique identifier
https://doi.org/10.5281/zenodo.4280594
Dataset updated
Jun 19, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alfonso Quarati; Monica De Martino
Description
Open Government Data (OGD) portals, thanks to the thousands of geo-referenced datasets containing spatial information that they host, are of great interest for any analysis or process relating to the territory. For this potential to be realized, users must be able to access these datasets and reuse them. An element often considered to hinder the full dissemination of OGD data is the quality of their metadata. Starting from an experimental investigation of over 160,000 geospatial datasets belonging to six national and international OGD portals, the first objective of this work is to provide an overview of the usage of these portals, measured in terms of dataset views and downloads. Furthermore, to assess the possible influence of metadata quality on the use of geospatial datasets, the metadata of each dataset was assessed and the correlation between these two variables was measured. The results showed significant underutilization of geospatial datasets and generally poor metadata quality. In addition, only a weak correlation was found between usage and metadata quality, not strong enough to assert with certainty that metadata quality is a determining factor of usage.

The dataset consists of six zipped CSV files, containing the collected datasets' usage data, full metadata, and computed quality values, for about 160,000 geospatial datasets belonging to the three national and three international portals considered in the study, i.e. US (catalog.data.gov), Colombia (datos.gov.co), Ireland (data.gov.ie), HDX (data.humdata.org), EUODP (data.europa.eu), and NASA (data.nasa.gov).

Data collection occurred in the period: 2019-12-19 -- 2019-12-23.

The header for each CSV file is:

[ ,portalid,id,downloaddate,metadata,overallq,qvalues,assessdate,dviews,downloads,engine,admindomain]

where each row (one dataset from a portal) has the following fields:

portalid: portal identifier

id: dataset identifier

downloaddate: date of data collection

metadata: the overall dataset's metadata downloaded via API from the portal according to the supporting platform schema

overallq: overall quality values computed by applying the methodology presented in [1]

qvalues: json object containing the quality values computed for the 17 metrics presented in [1]

assessdate: date of quality assessment

dviews: number of total views for the dataset

downloads: number of total downloads for the dataset (made available only by the Colombia, HDX, and NASA portals)

engine: identifier of the supporting portal platform: 1 (CKAN), 2 (Socrata)

admindomain: 1 (national), 2 (international)

[1] Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals. J. Data and Information Quality 2016, 8, 2:1–2:29. doi:10.1145/2964909
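The following is a minimal Python sketch for loading one of the (unzipped) CSV files with the header above. Decoding `qvalues` as JSON and mapping the `engine` and `admindomain` codes follow the field descriptions. Two choices are assumptions: an empty `downloads` cell is treated as "not reported" (only three portals provide downloads), and counts are parsed leniently in case they were written as floats. The `metadata` column can be very long, so the sketch raises the csv module's field size limit.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, List, Optional, TextIO

# Full metadata records can exceed the csv module's default field size limit.
csv.field_size_limit(10**8)

ENGINES = {"1": "CKAN", "2": "Socrata"}
ADMIN_DOMAINS = {"1": "national", "2": "international"}


def _optional_int(value: str) -> Optional[int]:
    # Assumption: counts may appear as "12" or "12.0"; an empty cell means not reported.
    return int(float(value)) if value.strip() else None


def read_quality_rows(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one decoded record per dataset row."""
    for row in csv.DictReader(stream):
        yield {
            "portalid": row["portalid"],
            "id": row["id"],
            "overallq": float(row["overallq"]),
            "qvalues": json.loads(row["qvalues"]),
            "dviews": _optional_int(row["dviews"]),
            "downloads": _optional_int(row["downloads"]),
            "engine": ENGINES.get(row["engine"].strip(), row["engine"]),
            "admindomain": ADMIN_DOMAINS.get(row["admindomain"].strip(), row["admindomain"]),
        }


def read_quality_text(text: str) -> List[Dict[str, Any]]:
    """Convenience wrapper for CSV content already held in a string."""
    return list(read_quality_rows(io.StringIO(text)))
```

The unnamed first column (a row index) is simply ignored, and the raw `metadata` field is left out of the returned records to keep them small.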
US Restaurant POI dataset with metadata
datarade.ai
.csv
Updated Jul 30, 2022
Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
Available download formats: .csv
Dataset updated
Jul 30, 2022
Dataset authored and provided by
Geolytica
Area covered
United States of America
Description
A Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) that may be of interest. We provide high-quality POI data that is fresh, consistent, customizable, easy to use, and with high-density coverage for all countries of the world.

This is our process flow:

Our machine learning systems continuously crawl for new POI data.

Our geoparsing and geocoding systems calculate their geolocations.

Our categorization systems clean up and standardize the datasets.

Our data pipeline API publishes the datasets on our data store.

A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

POI Data is in constant flux. Every minute worldwide over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist. And over 94% of all businesses have a public online presence of some kind tracking such changes. When a business changes, their website and social media presence will change too. We'll then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.

Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously being refreshed, so subscription plans are recommended for those who need the most up-to-date data. Our main differentiators from the competition are our flexible licensing terms and our data freshness.

Data samples may be downloaded at https://store.poidata.xyz/us
Reference list of 265 sources used for the discovery of relationships...
b2find.eudat.eu
Updated Jul 9, 2012
(2012). Reference list of 265 sources used for the discovery of relationships between data clusters and metadata properties - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/a0005017-41fb-56a6-8893-3e16a460b7a8
Dataset updated
Jul 9, 2012
Description
Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and the relationships between them. Often, the clusters found must be understood in the context of the categorical, numerical, or textual metadata associated with the data elements. Although often not part of the clustering process, such metadata play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow analysts to relate (or, loosely speaking, correlate) clusters with metadata or other properties of the underlying cluster data. Manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises. Fully interactive search for potentially useful or interesting cluster-to-metadata relationships may be a cumbersome and lengthy process. To remedy this problem, we propose a novel approach for guiding users in discovering interesting relationships between clusters and associated metadata. Its goal is to guide the analyst through the potentially huge search space. In our work we focus on metadata of categorical type, which can be summarized for a cluster in the form of a histogram. We start from a given visual cluster representation and compute certain measures of interestingness defined on the distribution of metadata categories for the clusters. These measures are used to automatically score and rank the clusters for potential interestingness regarding the distribution of categorical metadata. Identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing an encompassing, yet extensible, set of interestingness scores for categorical metadata, which can also be extended to numerical metadata.
Appropriate visual representations are provided for showing the visual correlations as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are automatically identified, supporting the analyst in discovering interesting and visually understandable relationships. The dataset contains 265 links (child datasets) to BSRN datasets. Any user who accepts the BSRN data release guidelines (http://bsrn.awi.de/data/conditions-of-data-release) may ask Gert König-Langlo (mailto:Gert.Koenig-Langlo@awi.de) to obtain an account to download these datasets.
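The description does not say which interestingness measures the system implements. As a minimal sketch, assuming one plausible choice (the Kullback-Leibler divergence between a cluster's category histogram and the histogram over all data elements), clusters whose categorical metadata depart most from the overall distribution are ranked first. The function names, the example categories, and the input layout (a mapping from cluster ID to the category label of each member) are hypothetical.

```python
from collections import Counter
from math import log


def histogram(labels):
    """Relative frequency of each category among a non-empty list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be non-zero wherever p is non-zero."""
    return sum(pv * log(pv / q[category]) for category, pv in p.items() if pv > 0)


def rank_clusters(labels_by_cluster):
    """Score each cluster by how far its metadata histogram departs from the
    histogram over all data elements; return (cluster, score) pairs, highest first."""
    all_labels = [lab for labels in labels_by_cluster.values() for lab in labels]
    overall = histogram(all_labels)  # covers every category seen in any cluster
    scores = {
        cluster: kl_divergence(histogram(labels), overall)
        for cluster, labels in labels_by_cluster.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: a categorical station attribute for members of three clusters.
clusters = {
    "c1": ["desert", "desert", "desert", "desert"],
    "c2": ["desert", "polar", "desert", "polar"],
    "c3": ["polar", "polar", "polar", "polar"],
}
for cluster, score in rank_clusters(clusters):
    print(cluster, round(score, 3))
```

A cluster whose members all share one category scores high against a balanced overall distribution, while a cluster that mirrors the overall mix scores zero. Swapping in another measure (for example, a chi-square statistic or entropy) only requires replacing `kl_divergence`.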
Data warehouse and metadata holdings relevant to Australia's North West Shelf...
gimi9.com
Updated Sep 4, 2024
Cite
(2024). Data warehouse and metadata holdings relevant to Australia's North West Shelf | gimi9.com [Dataset]. https://gimi9.com/dataset/au_data-warehouse-and-metadata-holdings-relevant-to-australias-north-west-shelf/
Dataset updated
Sep 4, 2024
Description
From the earliest stages of planning the North West Shelf Joint Environmental Management Study, it was evident that good management of the scientific data to be used in the research would be important for the success of the Study. A comprehensive review of data sets and other information relevant to the marine ecosystems, the geology, and the infrastructure and industries of the North West Shelf area had been completed (Heyward et al. 2006). The Data Management Project was established to source and prepare existing data sets for use, requiring the development and use of a range of tools: metadata systems, data visualisation, and data delivery applications. These were made available to collaborators to allow easy access to data obtained and generated by the Study. The CMAR MarLIN metadata system was used to document the 285 data sets: those identified as potentially useful for the Study, and the software and information products generated by and for the Study. This report represents a hard-copy atlas of all NWSJEMS data products and the existing data sets identified for potential use as inputs to the Study. It comprises summary metadata elements describing the data sets, their custodianship, and how the data sets might be obtained. The identifier of each data set can be used to refer to its full metadata record in the online MarLIN system.
SAS code used to analyze data and a datafile with metadata glossary |...
gimi9.com
Updated Dec 28, 2016
Cite
(2016). SAS code used to analyze data and a datafile with metadata glossary | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_sas-code-used-to-analyze-data-and-a-datafile-with-metadata-glossary
Dataset updated
Dec 28, 2016
Description
We compiled macroinvertebrate assemblage data collected from 1995 to 2014 from the St. Louis River Area of Concern (AOC) of western Lake Superior. Our objective was to define depth-adjusted cutoff values for benthos condition classes (poor, fair, reference) to provide a tool useful for assessing progress toward achieving removal targets for the degraded benthos beneficial use impairment in the AOC. The relationship between depth and benthos metrics was wedge-shaped. We therefore used quantile regression to model the limiting effect of depth on selected benthos metrics, including taxa richness, percent non-oligochaete individuals, combined percent Ephemeroptera, Trichoptera, and Odonata individuals, and density of ephemerid mayfly nymphs (Hexagenia). We created a scaled trimetric index from the first three metrics. Metric values at or above the 90th percentile quantile regression model prediction were defined as reference condition for that depth. We set the cutoff between poor and fair condition as the 50th percentile model prediction. We examined sampler type, exposure, geographic zone of the AOC, and substrate type for confounding effects. Based on these analyses, we combined data across sampler type and exposure classes and created separate models for each geographic zone. We used the resulting condition class cutoff values to assess the relative benthic condition for three habitat restoration project areas. The depth-limited pattern of ephemerid abundance we observed in the St. Louis River AOC also occurred elsewhere in the Great Lakes. We provide tabulated model predictions for application of our depth-adjusted condition class cutoff values to new sample data. This dataset is associated with the following publication: Angradi, T., W. Bartsch, A. Trebitz, V. Brady, and J. Launspach. A depth-adjusted ambient distribution approach for setting numeric removal targets for a Great Lakes Area of Concern beneficial use impairment: Degraded benthos. 
JOURNAL OF GREAT LAKES RESEARCH. International Association for Great Lakes Research, Ann Arbor, MI, USA, 43(1): 108-120, (2017).
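The classification rule above can be sketched as follows, assuming the tabulated model predictions are available as sorted (depth, value) pairs for the 50th and 90th percentile models, and that predictions between tabulated depths may be linearly interpolated. The table layout, the interpolation step, and the treatment of a value exactly at the poor/fair cutoff are assumptions; the numbers in the example are illustrative only, not values from the study. Because the study fitted separate models for each geographic zone, one pair of tables per zone would be needed.

```python
from bisect import bisect_left


def interpolate(table, depth):
    """Linearly interpolate a list of (depth, value) pairs sorted by depth,
    clamping to the end values outside the tabulated range."""
    depths = [d for d, _ in table]
    if depth <= depths[0]:
        return table[0][1]
    if depth >= depths[-1]:
        return table[-1][1]
    i = bisect_left(depths, depth)
    (d0, v0), (d1, v1) = table[i - 1], table[i]
    return v0 + (v1 - v0) * (depth - d0) / (d1 - d0)


def condition_class(value, depth, q50_table, q90_table):
    """Classify a benthos metric value at a given depth.

    At or above the 90th-percentile prediction -> reference;
    at or above the 50th-percentile prediction -> fair (boundary handling assumed);
    otherwise -> poor.
    """
    if value >= interpolate(q90_table, depth):
        return "reference"
    if value >= interpolate(q50_table, depth):
        return "fair"
    return "poor"


# Illustrative tables only (not values from the study): predictions fall with depth.
q90 = [(0.0, 30.0), (10.0, 10.0)]
q50 = [(0.0, 20.0), (10.0, 5.0)]
print(condition_class(25.0, 5.0, q50, q90))
print(condition_class(15.0, 5.0, q50, q90))
print(condition_class(10.0, 5.0, q50, q90))
```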
Data from: Sample Identifiers and Metadata Reporting Format for...
osti.gov
data.ess-dive.lbl.gov
Updated Dec 31, 2019
Cite
Agarwal, Deb; Boye, Kristin; Brodie, Eoin; Burrus, Madison; Chadwick, Dana; Cholia, Shreyas; Crystal-Ornelas, Robert; Damerow, Joan; Elbashandy, Hesham; Eloy Alves, Ricardo; Ely, Kim; Goldman, Amy; Hendrix, Valerie; Jones, Christopher; Jones, Matt; Kakalia, Zarine; Kemner, Kenneth; Kersting, Annie; Maher, Kate; Merino, Nancy; O'Brien, Fianna; Perzan, Zach; Robles, Emily; Snavely, Cory; Sorensen, Patrick; Stegen, James; Varadharajan, Charu; Weisenhorn, Pamela; Whitenack, Karen; Zavarin, Mavrik (2019). Sample Identifiers and Metadata Reporting Format for Environmental Systems Science [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/1660470-ess-dive-global-sample-numbers-metadata-reporting-format-environmental-systems-science-igsn-ess
Dataset updated
Dec 31, 2019
Dataset provided by
United States Department of Energy http://energy.gov/
Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE)
Authors
Agarwal, Deb; Boye, Kristin; Brodie, Eoin; Burrus, Madison; Chadwick, Dana; Cholia, Shreyas; Crystal-Ornelas, Robert; Damerow, Joan; Elbashandy, Hesham; Eloy Alves, Ricardo; Ely, Kim; Goldman, Amy; Hendrix, Valerie; Jones, Christopher; Jones, Matt; Kakalia, Zarine; Kemner, Kenneth; Kersting, Annie; Maher, Kate; Merino, Nancy; O'Brien, Fianna; Perzan, Zach; Robles, Emily; Snavely, Cory; Sorensen, Patrick; Stegen, James; Varadharajan, Charu; Weisenhorn, Pamela; Whitenack, Karen; Zavarin, Mavrik
Description
The ESS-DIVE sample identifiers and metadata reporting format primarily follows the System for Earth Sample Registration (SESAR) Global Sample Number (IGSN) guide and template, with modifications to address Environmental Systems Science (ESS) sample needs and practicalities (IGSN-ESS). IGSNs are associated with standardized metadata to characterize a variety of different sample types (e.g. object type, material) and describe sample collection details (e.g. latitude, longitude, environmental context, date, collection method). Globally unique sample identifiers, particularly IGSNs, facilitate sample discovery, tracking, and reuse; they are especially useful when sample data is shared with collaborators, sent to different laboratories or user facilities for analyses, or distributed in different data files, datasets, and/or publications. To develop recommendations for multidisciplinary ecosystem and environmental sciences, we first conducted research on related sample standards and templates. We provide a comparison of existing sample reporting conventions, which includes mapping metadata elements across existing standards and Environment Ontology (ENVO) terms for sample object types and environmental materials. We worked with eight U.S. Department of Energy (DOE) funded projects, including those from Terrestrial Ecosystem Science and Subsurface Biogeochemical Research Scientific Focus Areas. Project scientists tested the process of registering samples for IGSNs and associated metadata in workflows for multidisciplinary ecosystem sciences. We provide modified IGSN metadata guidelines to account for needs of a variety of related biological and environmental samples. While generally following the IGSN core descriptive metadata schema, we provide recommendations for extending sample type terms, and connecting to related templates geared towards biodiversity (Darwin Core) and genomic (Minimum Information about any Sequence, MIxS) samples and specimens. 
ESS-DIVE recommends registering samples for IGSNs through SESAR, and we include instructions for registration using the IGSN-ESS guidelines. Our resulting sample reporting guidelines, template (IGSN-ESS), and identifier approach can be used by any researcher with sample data for ecosystem sciences.
Data for: Sustainable connectivity in a community repository
explore.openaire.eu
data.niaid.nih.gov
Updated Dec 7, 2023
Cite
Ted Habermann (2023). Data for: Sustainable connectivity in a community repository [Dataset]. http://doi.org/10.5061/dryad.nzs7h44xr
Unique identifier
https://doi.org/10.5061/dryad.nzs7h44xr
Dataset updated
Dec 7, 2023
Authors
Ted Habermann
Description
Data For: Sustainable Connectivity in a Community Repository

## GENERAL INFORMATION

This readme.txt file was generated on 20231110 by Ted Habermann.

### Title of Dataset
Data For: Sustainable Connectivity in a Community Repository

### Author Information
Principal Investigator Contact Information
Name: Ted Habermann (0000-0003-3585-6733)
Institution: Metadata Game Changers
Email:
ORCID: 0000-0003-3585-6733

### Date published or finalized for release:
November 10, 2023

## Date of data collection (single date, range, approximate date)
May and June 2023

### Information about funding sources that supported the collection of the data:
National Science Foundation (Crossref Funder ID: 100000001) Award 2134956.

### Overview of the data (abstract):
These data are Dryad metadata retrieved from the Dryad API and translated into csv files. There are two datasets:

1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data.
2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data.

Each dataset includes four types of metadata (identifiers, funders, keywords, and related works), each in a separate comma-delimited (.csv) or tab-delimited (.tsv) file. There are also Microsoft Excel files (.xlsx) for the identifier metadata, and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files, with definitions, counts, unique counts, most frequent values, and completeness. These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people.

| Size (bytes) | FileName |
| --------: | :--------------------------------------------------------- |
| 90541505 | DryadJournalDataset_Identifiers_20230520_12.csv |
| 9017051 | DryadJournalDataset_funders_20230520_12.tsv |
| 29108477 | DryadJournalDataset_keywords_20230520_12.tsv |
| 8833842 | DryadJournalDataset_relatedWorks_20230520_12.tsv |
| | |
| 18260935 | DryadOrganizationDataset_funders_20230601_12.tsv |
| 240128730 | DryadOrganizationDataset_identifiers_20230601_12.tsv |
| 39600659 | DryadOrganizationDataset_keywords_20230601_12.tsv |
| 11520475 | DryadOrganizationDataset_relatedWorks_20230601_12.tsv |
| | |
| 40726143 | DryadJournalDataset_identifiers_20230520_12.xlsx |
| 81894301 | DryadOrganizationDataset_identifiers_20230601_12.xlsx |
| | |
| 842827 | DryadJournalDataset_ConnectivitySummary.html |
| 387551 | DryadOrganizationDataset_ConnectivitySummary.html |

### Field Definitions

## SHARING/ACCESS INFORMATION

### Licenses/restrictions placed on the data:
Creative Commons Public Domain License (CC0)

### Links to publications that cite or use the data:
TBD

### Was data derived from another source?
No

## DATA & FILE OVERVIEW

### File List
A. *Dataset_identifiers_YYYYMMDD_HH.*sv: Short description: Identifier metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
B. *Dataset_funders_YYYYMMDD_HH.*sv: Short description: Funder metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
C. *Dataset_keywords_YYYYMMDD_HH.*sv: Short description: Keyword metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
D. *Dataset_relatedWorks_YYYYMMDD_HH.*sv: Short description: Related work metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
E. *Dataset_identifiers_YYYYMMDD_HH.xlsx: Short description: Excel spreadsheet with identifier metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
F. *Dataset_ConnectivitySummary.html: Short description: Connectivity summary for Dataset.
G. summarizeConnectivity.ipynb: Short description: Python notebook with code for creating connectivity summaries and plots.

### Relationship between files:
All files with the same dataset name make up a dataset. The .*sv files are the original metadata extracted from Dryad.

## METHODOLOGICAL INFORMATION

### Description of methods used for collection/generation of data:
Most of the analysis is simply extracting and comparing counts of various metadata elements.

## DATA-SPECIFIC INFORMATION

See the connectivity summaries (*ConnectivitySummary.html) for a list of parameters in each file and summaries of their values.

### Identifier Metadata
The identifier metadata datasets include the following fields:

| Field | Definition |
| :------------------------------- | :--------------------------------------------------------------------------------------------------- |
| DOI | Digital object identifier for the dataset |
| title | Title for the dataset |
| datePublished | Date dataset published |
| relatedPublicationISSN | International Standard Serial Number for journal with related publication |
| primary_article | Digital object identifier for pr...
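The connectivity summaries are produced by the dataset's own notebook (summarizeConnectivity.ipynb), which is not reproduced here. As a rough, hypothetical sketch of the per-field statistics the readme lists (counts, unique counts, most frequent values, and completeness), the following assumes that completeness means the fraction of records with a non-empty value; the function names and the toy records are illustrative.

```python
import csv
import io
from collections import Counter


def summarize_column(values, top_n=3):
    """Count, unique count, most frequent values and completeness of one field."""
    present = [v for v in values if v.strip()]
    counts = Counter(present)
    return {
        "count": len(present),
        "unique": len(counts),
        "most_frequent": counts.most_common(top_n),
        # Assumed definition: share of records with a non-empty value.
        "completeness": len(present) / len(values) if values else 0.0,
    }


def summarize_file(text, delimiter=","):
    """Summarize every column of a delimited metadata file given as text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    fields = reader.fieldnames or []
    # Short rows yield None for missing fields; treat them as empty.
    return {f: summarize_column([row.get(f) or "" for row in rows]) for f in fields}


# Toy records, not taken from the Dryad files.
sample = "DOI,title\n10.1/a,X\n10.1/b,\n10.1/a,Y\n"
for field, stats in summarize_file(sample).items():
    print(field, stats)
```

For the tab-delimited files, pass `delimiter="\t"` and read the file with `open(path, newline="", encoding="utf-8").read()`.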
The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich...
data.nist.gov
s.cnmilf.com
Updated Sep 2, 2017
Cite
National Institute of Standards and Technology (2017). The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich description of data resources [Dataset]. http://doi.org/10.18434/mds2-1870
Unique identifier
https://doi.org/10.18434/mds2-1870
Dataset updated
Sep 2, 2017
Dataset provided by
National Institute of Standards and Technology http://www.nist.gov/
License
https://www.nist.gov/open/license
Description
The NIST Extensible Resource Data Model (NERDm) is a set of schemas for encoding, in JSON format, metadata that describe digital resources. The digital resources it can describe include not only digital data sets and collections, but also software, digital services, web sites and portals, and digital twins. It was created to serve as the internal metadata format used by the NIST Public Data Repository and Science Portal to drive rich presentations on the web and to enable discovery; however, it was also designed to enable programmatic access to resources and their metadata by external users. Interoperability was also a key design aim: the schemas are defined using the JSON Schema standard, metadata are encoded as JSON-LD, and their semantics are tied to community ontologies, with an emphasis on DCAT and the US federal Project Open Data (POD) models. Finally, extensibility is also central to its design: the schemas are composed of a central core schema and various extension schemas. New extensions to support richer metadata concepts can be added over time without breaking existing applications. Validation is central to NERDm's extensibility model. Consuming applications should be able to choose which metadata extensions they care to support and ignore terms and extensions they don't support. Furthermore, they should not fail when a NERDm document leverages extensions they don't recognize, even when on-the-fly validation is required. To support this flexibility, the NERDm framework allows documents to declare what extensions are being used and where. We have developed an optional extension to the standard JSON Schema validation (see ejsonschema below) to support flexible validation: while a standard JSON Schema validator can validate a NERDm document against the NERDm core schema, our extension will validate a NERDm document against any recognized extensions and ignore those that are not recognized. 
The NERDm data model is based around the concept of resource, semantically equivalent to a schema.org Resource, and as in schema.org, there can be different types of resources, such as data sets and software. A NERDm document indicates what types the resource qualifies as via the JSON-LD "@type" property. All NERDm Resources are described by metadata terms from the core NERDm schema; however, different resource types can be described by additional metadata properties (often drawing on particular NERDm extension schemas). A Resource contains Components of various types (including DCAT-defined Distributions) that are considered part of the Resource; specifically, these can include downloadable data files, hierarchical data collections, links to web sites (like software repositories), software tools, or other NERDm Resources. Through the NERDm extension system, domain-specific metadata can be included at either the resource or component level. The direct semantic and syntactic connections to the DCAT, POD, and schema.org schemas are intended to ensure unambiguous conversion of NERDm documents into those schemas. As of this writing, the core NERDm schema and its framework stand at version 0.7 and are compatible with the "draft-04" version of JSON Schema. Version 1.0 is projected to be released in 2025. In that release, the NERDm schemas will be updated to the "draft2020" version of JSON Schema. Other improvements will include stronger support for RDF and the Linked Data Platform through its support of JSON-LD.
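The flexible-validation behavior described above (validate what is recognized, skip what is not) can be sketched with plain callables standing in for schema validators. This is not the ejsonschema API: the `_extensionSchemas` key, the `urn:example:` extension URIs, and the `require` helper are assumptions made for illustration, and real NERDm documents can also declare extensions at the component level, which this sketch ignores.

```python
from typing import Callable, Dict, List, Tuple

# A validator returns a list of error messages (empty when the document is valid).
Validator = Callable[[dict], List[str]]


def require(*keys: str) -> Validator:
    """Hypothetical stand-in for a schema: checks that top-level keys are present."""
    def check(doc: dict) -> List[str]:
        return [f"missing required property: {k}" for k in keys if k not in doc]
    return check


def validate_flexibly(doc: dict, core: Validator,
                      extensions: Dict[str, Validator],
                      ext_key: str = "_extensionSchemas") -> Tuple[List[str], List[str]]:
    """Always validate against the core; validate each declared extension the
    application recognizes; skip (but report) the ones it does not."""
    errors = list(core(doc))
    ignored: List[str] = []
    for uri in doc.get(ext_key, []):
        validator = extensions.get(uri)
        if validator is None:
            ignored.append(uri)  # unrecognized extension: not a failure
        else:
            errors.extend(validator(doc))
    return errors, ignored


core = require("@id", "title")
known = {"urn:example:software-ext": require("repository")}
doc = {
    "@id": "urn:example:resource-1",
    "title": "Example resource",
    "_extensionSchemas": ["urn:example:software-ext", "urn:example:unknown-ext"],
}
print(validate_flexibly(doc, core, known))
```

The design choice mirrors the description: an unrecognized extension is reported back to the caller instead of raising an error, so a consuming application can decide for itself whether to care.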
The Red Queen in the Repository: metadata quality in an ever-changing...
zenodo.org
researchdata.se
bin, csv, zip
Updated Jul 25, 2024
Cite
Joakim Philipson (2024). The Red Queen in the Repository: metadata quality in an ever-changing environment (preprint of paper, presentation slides and dataset collection with validation schemas to IDCC2019 conference paper) [Dataset]. http://doi.org/10.5281/zenodo.2276777
Available download formats: zip, bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.2276777
Dataset updated
Jul 25, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Joakim Philipson
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This fileset contains a preprint version of the conference paper (.pdf), presentation slides (as .pptx), and the dataset(s) and validation schema(s) for the IDCC 2019 (Melbourne) conference paper: The Red Queen in the Repository: metadata quality in an ever-changing environment. Datasets and schemas are in .xml, .xsd, Excel (.xlsx), and .csv (two files representing two different sheets in the .xlsx file). The validationSchemas.zip holds the additional validation schemas (.xsd) that were not found in the schemaLocations of the metadata XML files to be validated. The schemas must all be placed in the same folder, and are to be used for validating the Dataverse dcterms records (with metadataDCT.xsd) and the Zenodo oai_datacite feeds (with schema.datacite.org_oai_oai-1.0_oai.xsd), respectively. In the latter case, a simpler approach might be to replace the incorrect URL "http://schema.datacite.org/oai/oai-1.0/ oai_datacite.xsd" in the schemaLocation of these XML files with the correct one: schemaLocation="http://schema.datacite.org/oai/oai-1.0/ http://schema.datacite.org/oai/oai-1.0/oai.xsd", as has already been done in the sample files here. The sample file folders testDVNcoll.zip (Dataverse), testFigColl.zip (Figshare), and testZenColl.zip (Zenodo) contain all the metadata files tested and validated that are registered in the spreadsheet with objectIDs.
In the case of Zenodo, one original file feed, zen2018oai_datacite3orig-https%20_zenodo.org_oai2d%20verb=ListRecords%26metadataPrefix=oai_datacite%26from=2018-11-29%26until=2018-11-30.xml, is also supplied to show what had to be changed in order to perform validation as indicated in the paper.

For Dataverse, a corrected version of a file, dvn2014ddi-27595Corr_https%20_dataverse.harvard.edu_api_datasets_export%20exporter=ddi%26persistentId=doi%253A10.7910_DVN_27595Corr.xml, is also supplied to show the changes needed to make the file validate without errors.
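The schemaLocation substitution described above can be scripted. This is a minimal sketch, assuming the files are UTF-8 text and that the incorrect value may contain any whitespace between the namespace URI and `oai_datacite.xsd`; the function names are hypothetical, and the actual validation step (for example with lxml's `etree.XMLSchema`) is left out.

```python
import re
from pathlib import Path

# Incorrect value quoted in the description; whitespace between the two parts may vary.
INCORRECT = re.compile(r"http://schema\.datacite\.org/oai/oai-1\.0/\s+oai_datacite\.xsd")
CORRECT = ("http://schema.datacite.org/oai/oai-1.0/ "
           "http://schema.datacite.org/oai/oai-1.0/oai.xsd")


def fix_schema_location(xml_text: str) -> tuple[str, int]:
    """Return the corrected text and the number of replacements made."""
    return INCORRECT.subn(CORRECT, xml_text)


def fix_file(path: Path) -> int:
    """Rewrite a file in place only if it contains the incorrect schemaLocation."""
    text = path.read_text(encoding="utf-8")
    fixed, n = fix_schema_location(text)
    if n:
        path.write_text(fixed, encoding="utf-8")
    return n


snippet = 'xsi:schemaLocation="http://schema.datacite.org/oai/oai-1.0/ oai_datacite.xsd"'
fixed, n = fix_schema_location(snippet)
print(n, fixed)
```

Because the corrected value no longer matches the pattern, running the fix twice is harmless: the second pass reports zero replacements.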
sample
data.mendeley.com
Updated Feb 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
kaavya kaavya (2024). sample [Dataset]. http://doi.org/10.17632/ft7ctmb7yh.1
Unique identifier
https://doi.org/10.17632/ft7ctmb7yh.1
Dataset updated
Feb 5, 2024
Authors
kaavya kaavya
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Describe your research hypothesis, what your data shows, any notable findings and how the data can be interpreted. Please add sufficient description to enable others to understand what the data is, how it was gathered and how to interpret and use it.
Location Identifiers, Metadata, and Map for Field Measurements at the East...
dataone.org
knb.ecoinformatics.org
Updated Oct 24, 2023
Cite
Charuleka Varadharajan; Madison Burrus; Dylan O'Ryan; Zarine Kakalia; Erek Alper; Jillian Banfield; Max Berkelhammer; Curtis Beutler; Eoin Brodie; Wendy Brown; Mariah S. Carbone; Rosemary Carroll; Danielle Christianson; Chunwei Chou; Robert Crystal-Ornelas; K. Dana Chadwick; John Christensen; Baptiste Dafflon; Gijs de Boer; Hesham Elbashandy; Brian J. Enquist; Daniel Feldman; Patricia Fox; Benjamin Gilbert; David Gochis; Matthew Henderson; Douglas Johnson; Lara Kueppers; Langlang Li; Paula Matheus Carnevali; Alexander Newman; Thomas Powell; Kamini Singha; Patrick Sorensen; Matthias Sprenger; Tetsu Tokunaga; Roelof Versteeg; Mike Wilkins; Kenneth Williams; Marshall Worsham; Catherine Wong; Yuxin Wu; Damao Zhang; Deborah Agarwal (2023). Location Identifiers, Metadata, and Map for Field Measurements at the East River Watershed, Colorado, USA (Version 3.1) [Dataset]. http://doi.org/10.15485/1660962
Unique identifier
https://doi.org/10.15485/1660962
Dataset updated
Oct 24, 2023
Dataset provided by
ESS-DIVE
Authors
Charuleka Varadharajan; Madison Burrus; Dylan O'Ryan; Zarine Kakalia; Erek Alper; Jillian Banfield; Max Berkelhammer; Curtis Beutler; Eoin Brodie; Wendy Brown; Mariah S. Carbone; Rosemary Carroll; Danielle Christianson; Chunwei Chou; Robert Crystal-Ornelas; K. Dana Chadwick; John Christensen; Baptiste Dafflon; Gijs de Boer; Hesham Elbashandy; Brian J. Enquist; Daniel Feldman; Patricia Fox; Benjamin Gilbert; David Gochis; Matthew Henderson; Douglas Johnson; Lara Kueppers; Langlang Li; Paula Matheus Carnevali; Alexander Newman; Thomas Powell; Kamini Singha; Patrick Sorensen; Matthias Sprenger; Tetsu Tokunaga; Roelof Versteeg; Mike Wilkins; Kenneth Williams; Marshall Worsham; Catherine Wong; Yuxin Wu; Damao Zhang; Deborah Agarwal
Time period covered
Sep 14, 2015 - Oct 10, 2023
Description
This dataset contains identifiers, metadata, and a map of the locations where field measurements have been conducted at the East River Community Observatory located in the Upper Colorado River Basin, United States. This is version 3.1 of the dataset and replaces the prior version 3.0 (see below for details on changes between the versions). Dataset description: The East River is the primary field site of the Watershed Function Scientific Focus Area (WFSFA) and the Rocky Mountain Biological Laboratory. Researchers from several institutions generate highly diverse hydrological, biogeochemical, climate, vegetation, geological, remote sensing, and model data at the East River in collaboration with the WFSFA. Thus, the purpose of this dataset is to maintain an inventory of the field locations and instrumentation, to provide information on the field activities in the East River, and to coordinate data collected across different locations, researchers, and institutions. The dataset contains (1) a README file with information on the various files, (2) three csv files describing the metadata collected for each surface point location, plot, and region registered with the WFSFA, (3) csv files with metadata and contact information for each surface point location registered with the WFSFA, (4) a csv file with metadata and contact information for plots, (5) a csv file with metadata for geographic regions and sub-regions within the watershed, (6) a compiled xlsx file with all the data and metadata, which can be opened in Microsoft Excel, (7) a kml map of the locations plotted in the watershed, which can be opened in Google Earth, (8) a jpeg image of the kml map, which can be viewed in any photo viewer, and (9) a zipped file with the registration templates used by the SFA team to collect location metadata. 
The zipped template file contains two csv files with the blank templates (point and plot), two csv files with instructions for filling out the location templates, and one compiled xlsx file with the instructions and blank templates together. Additionally, the templates in the xlsx include drop down validation for any controlled metadata fields. Persistent location identifiers (Location_ID) are determined by the WFSFA data management team and are used to track data and samples across locations. Dataset uses: This location metadata is used to update the Watershed SFA’s publicly accessible Field Information Portal (an interactive field sampling metadata exploration tool; https://wfsfa-data.lbl.gov/watershed/), the kml map file included in this dataset, and other data management tools internal to the Watershed SFA team. Version Information: The latest version of this dataset publication is version 3.1. This version contains a total of 101 new point locations and 1 new geographic region. Overall, there are a total of 1111 point locations, 62 plots, and 36 geographic regions. Additionally, the kml map of locations and image now includes a Taylor River geographic region boundary and stream network. Refer to methods for further details on the version history. This dataset will be updated on a periodic basis with new measurement location information. Researchers interested in having their East River measurement locations added in this list should reach out to the WFSFA data management team at wfsfa-data@googlegroups.com. Acknowledgements: Please cite this dataset if using any of the location metadata in other publications or derived products. If using the location metadata for the NEON hyperspectral campaign, additionally cite Chadwick et al. (2020). doi:10.15485/1618130.
Survey on FAIRness of CFReDS Portal datasets' metadata - 2022
swissubase.ch
doi.org
Updated Aug 10, 2024
Cite
(2024). Survey on FAIRness of CFReDS Portal datasets' metadata - 2022 [Dataset]. http://doi.org/10.48657/k4vr-7z49
Unique identifier
https://doi.org/10.48657/k4vr-7z49
Dataset updated
Aug 10, 2024
Description
This dataset consists of a database (.SQL format) containing the results of the analysis of the metadata of 212 datasets in the Computer Forensic Reference DataSet Portal (CFReDS, NIST). The survey that led to this dataset, carried out by Samuele Mombelli between 21 December 2022 and 28 January 2023, focused on analyzing the metadata associated with these datasets and assessing their compliance with the FAIR Principles (Findability, Accessibility, Interoperability and Reusability). The data were collected using a specially developed checklist that encapsulates a set of criteria representing its own implementation of the FAIR principles. This dataset is linked to the publication in the following article: PUBLICATION IN PROGRESS. Further details on the criteria used and the structure of the data can be found in the documentation associated with the database.

MD5 checksum of the SQL database: FBC41CFB9FF8F4CB1BE08D779DA7EB56
SHA-256 checksum of the SQL database: 23498F297F42CDC3F058A2804FE5092DCA8087BC2BC718C6DCB377B5B1207154
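The published checksums can be used to confirm that a downloaded copy of the SQL database is intact. This is a minimal sketch using Python's `hashlib`; the database filename is not given in this listing, so the path passed in is up to the reader. Hex digests are compared case-insensitively because the listing prints them in upper case.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "FBC41CFB9FF8F4CB1BE08D779DA7EB56"
EXPECTED_SHA256 = "23498F297F42CDC3F058A2804FE5092DCA8087BC2BC718C6DCB377B5B1207154"


def file_digests(path: Path, chunk_size: int = 1 << 20) -> tuple[str, str]:
    """Compute MD5 and SHA-256 hex digests in one streaming pass."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def verify_database(path: Path) -> bool:
    """True only if both digests match the published values (case-insensitive)."""
    md5_hex, sha256_hex = file_digests(path)
    return (md5_hex.upper() == EXPECTED_MD5.upper()
            and sha256_hex.upper() == EXPECTED_SHA256.upper())
```

Usage: `verify_database(Path("path/to/downloaded.sql"))` returns `True` only when both checksums match; reading in 1 MiB chunks keeps memory use flat for large files.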
Overview Metadata for the Data used in the Conceptual and Numerical Model of...
s.cnmilf.com
datasets.ai
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Overview Metadata for the Data used in the Conceptual and Numerical Model of the Colorado River (1990-2016) [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/overview-metadata-for-the-data-used-in-te-conceptual-and-numerical-model-of-the-color-1990
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Colorado River
Description
This data release contains six different datasets that were used in the report SIR 2018-5108. These datasets contain discharge data, discrete dissolved-solids data, quality-control discrete dissolved-solids data, and computed mean dissolved-solids data that were collected at various locations between the Hoover Dam and the Imperial Dam.

Study Sites:
Site 1: Colorado River below Hoover Dam
Site 2: Bill Williams River near Parker
Site 3: Colorado River below Parker Dam
Site 4: CRIR Main Canal
Site 5: Palo Verde Canal
Site 6: Colorado River at Palo Verde Dam
Site 7: CRIR Lower Main Drain
Site 8: CRIR Upper Levee Drain
Site 9: PVID Outfall Drain
Site 10: Colorado River above Imperial Dam

Discrete Dissolved-solids Dataset and Replicate Samples for Discrete Dissolved-solids Dataset: The Bureau of Reclamation collected discrete water-quality samples for the parameter of dissolved solids (sum of constituents). Dissolved solids, measured in milligrams per liter, are the sum of the following constituents: bicarbonate, calcium, carbonate, chloride, fluoride, magnesium, nitrate, potassium, silicon dioxide, sodium, and sulfate. These samples were collected on a monthly to bimonthly basis at various time periods between 1990 and 2016 at Sites 1-5 and Sites 7-10. No data were collected for Site 6: Colorado River at Palo Verde Dam. The Bureau of Reclamation and the USGS collected discrete quality-control replicate samples for the parameter of dissolved solids (sum of constituents), measured in milligrams per liter. The USGS collected discrete quality-control replicate samples in 2002 and 2003, and the Bureau of Reclamation collected them in 2016 and 2017. Listed below are the sites where these samples were collected and the agency that collected them.
Site 3: Colorado River below Parker Dam: USGS and Reclamation
Site 4: CRIR Main Canal: Reclamation
Site 5: Palo Verde Canal: Reclamation
Site 7: CRIR Lower Main Drain: Reclamation
Site 8: CRIR Upper Levee Drain: Reclamation
Site 9: PVID Outfall Drain: Reclamation
Site 10: Colorado River above Imperial Dam: USGS and Reclamation

Monthly Mean Datasets and Mean Monthly Datasets: Monthly mean discharge (cfs), flow-weighted monthly mean dissolved-solids concentration (mg/L), and monthly mean dissolved-solids load for 1990 to 2016 were computed from raw data collected by the USGS and the Bureau of Reclamation. These data were computed for all 10 sites, except that flow-weighted monthly mean dissolved-solids concentration and monthly mean dissolved-solids load were not computed for Site 2: Bill Williams River near Parker. The monthly means for each month from 1990 to 2016 were then used to compute the mean monthly discharge and the mean monthly dissolved-solids load for each of the 12 calendar months: each monthly mean was weighted by the number of days in its month, and the weighted values were averaged for each of the 12 months. This was computed for all 10 sites, except that mean monthly dissolved-solids load was not computed for Site 2: Bill Williams River near Parker. Site 8a: Colorado River between Parker and Palo Verde Valleys was computed by summing the data from Sites 6, 7, and 8.

Bill Williams Daily Mean Discharge, Instantaneous Dissolved-solids Concentration, and Daily Mean Dissolved-solids Load Dataset: Daily mean discharge (cfs), instantaneous dissolved-solids concentration (mg/L), and daily mean dissolved-solids load were calculated from raw data collected by the USGS and the Bureau of Reclamation. These data were calculated for Site 2: Bill Williams River near Parker for the period from January 1990 to February 2016.
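The computations described above (sum of constituents, flow-weighted monthly mean concentration, monthly mean load, day-weighted mean monthly values, and the Site 8a sum) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the USGS or Reclamation procedure: the function names are hypothetical, load is assumed to be in tons per day using the conventional factor 0.0027 (the release does not state load units), and published sums of constituents may apply conversion factors (for example, converting bicarbonate to its carbonate equivalent) that the description does not mention.

```python
"""Hypothetical sketch of the dissolved-solids computations described above.

Assumptions (not stated in the data release): discharge in cfs,
concentration in mg/L, load in tons/day via the factor 0.0027.
"""
import calendar
from collections import defaultdict

# Constituents listed in the description (concentrations in mg/L).
CONSTITUENTS = (
    "bicarbonate", "calcium", "carbonate", "chloride", "fluoride",
    "magnesium", "nitrate", "potassium", "silicon dioxide", "sodium",
    "sulfate",
)
# Assumed factor converting cfs * mg/L to tons per day.
TONS_PER_DAY = 0.0027


def dissolved_solids_sum(sample):
    """Plain sum of the listed constituents for one sample (mg/L)."""
    missing = [c for c in CONSTITUENTS if c not in sample]
    if missing:
        raise ValueError(f"missing constituents: {missing}")
    return sum(sample[c] for c in CONSTITUENTS)


def flow_weighted_mean(discharge_cfs, conc_mg_l):
    """Flow-weighted mean concentration: sum(Q * C) / sum(Q)."""
    if len(discharge_cfs) != len(conc_mg_l):
        raise ValueError("discharge and concentration lengths differ")
    total_q = sum(discharge_cfs)
    if total_q <= 0:
        raise ValueError("total discharge must be positive")
    return sum(q * c for q, c in zip(discharge_cfs, conc_mg_l)) / total_q


def mean_load(mean_discharge_cfs, fw_conc_mg_l):
    """Mean load (tons/day); equals the mean of daily Q * C * factor."""
    return mean_discharge_cfs * fw_conc_mg_l * TONS_PER_DAY


def mean_monthly(monthly_means):
    """Day-weighted average of monthly means for each calendar month.

    monthly_means maps (year, month) -> monthly mean value.
    """
    weighted = defaultdict(float)
    days = defaultdict(int)
    for (year, month), value in monthly_means.items():
        n_days = calendar.monthrange(year, month)[1]  # days in that month
        weighted[month] += value * n_days
        days[month] += n_days
    return {m: weighted[m] / days[m] for m in sorted(weighted)}


def combine_sites(*series):
    """Element-wise sum of aligned series, e.g. Sites 6, 7 and 8 -> 8a."""
    return [sum(values) for values in zip(*series)]


if __name__ == "__main__":
    q = [100.0, 200.0, 300.0]  # hypothetical daily discharge, cfs
    c = [500.0, 400.0, 300.0]  # hypothetical concentration, mg/L
    fwc = flow_weighted_mean(q, c)
    print(round(fwc, 2), round(mean_load(sum(q) / len(q), fwc), 2))
    print(mean_monthly({(2015, 2): 10.0, (2016, 2): 20.0}))
```

Multiplying mean discharge by the flow-weighted mean concentration gives exactly the mean of the daily products Q * C, which is why a flow-weighted (rather than arithmetic) mean concentration pairs naturally with a mean load. Weighting by `calendar.monthrange` day counts lets a 29-day February carry slightly more weight than a 28-day one.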
Palo Verde Irrigation District Outfall Drain Mean Daily Discharge Dataset: The Bureau of Reclamation collected mean daily discharge data for the period of 01/01/2005 to 09/30/2016 at the Palo Verde Irrigation District (PVID) outfall drain using a stage-discharge relationship.
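The description does not say which stage-discharge relationship was used at the PVID outfall drain. As a hedged illustration only, the sketch below assumes a common power-law rating form, Q = coef * (stage - offset) ** exponent, with hypothetical parameters, and computes a mean daily discharge from equally spaced stage readings.

```python
"""Hypothetical sketch: mean daily discharge from stage via a rating curve.

Assumptions: the power-law form and its parameters are illustrative only;
stage is in feet, discharge in cfs, readings are equally spaced in time.
"""


def rating_discharge(stage_ft, coef, offset_ft, exponent):
    """Discharge (cfs) for one stage reading; zero at or below the offset."""
    effective_depth = stage_ft - offset_ft
    if effective_depth <= 0:
        return 0.0
    return coef * effective_depth ** exponent


def mean_daily_discharge(stages_ft, coef, offset_ft, exponent):
    """Average the discharge of each reading (not the discharge of the
    average stage), because the rating is nonlinear."""
    if not stages_ft:
        raise ValueError("no stage readings for this day")
    flows = [rating_discharge(h, coef, offset_ft, exponent) for h in stages_ft]
    return sum(flows) / len(flows)


if __name__ == "__main__":
    # Hypothetical parameters and four readings for one day.
    print(mean_daily_discharge([2.0, 2.5, 3.0, 2.5],
                               coef=10.0, offset_ft=1.0, exponent=2.0))
```

Averaging converted discharges rather than converting the average stage matters because, for a convex rating curve, the discharge at the mean stage underestimates the mean discharge.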

Kevin Read (2016). Common Metadata Elements for Cataloging Biomedical Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1496573.v1

Common Metadata Elements for Cataloging Biomedical Datasets

6 scholarly articles cite this dataset (per Google Scholar).

Available download formats: xlsx

Unique identifier

https://doi.org/10.6084/m9.figshare.1496573.v1

Dataset updated

Jan 20, 2016

Dataset provided by

Figshare (http://figshare.com/)

Authors

Kevin Read

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health (NIH). It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these repositories were identified, mapped to existing data-specific metadata standards and to the schemas of two existing multidisciplinary data repositories, DataCite and Dryad, and compared with the metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From these mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
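The mapping step described above, identifying elements that recur across repository schemas once local field names are translated into a shared vocabulary, can be sketched as a simple crosswalk. Everything in this example is hypothetical: the repository names, local field names, crosswalk entries, and the `min_repos` threshold are illustrative assumptions, and the target labels are only loosely modeled on DataCite property names. It does not reproduce the actual analysis in the spreadsheet.

```python
"""Hypothetical sketch of a metadata crosswalk and common-element count.

All repositories, field names, and mappings below are illustrative.
"""
from collections import Counter

# Hypothetical local field names from three hypothetical repositories.
REPO_FIELDS = {
    "repo_a": {"dataset_title", "pi_name", "release_date", "accession"},
    "repo_b": {"title", "contributors", "accession_id", "organism"},
    "repo_c": {"name", "authors", "date_released", "doi"},
}

# Hypothetical crosswalk from local names to a shared element vocabulary.
CROSSWALK = {
    "dataset_title": "Title", "title": "Title", "name": "Title",
    "pi_name": "Creator", "contributors": "Creator", "authors": "Creator",
    "release_date": "Date", "date_released": "Date",
    "accession": "Identifier", "accession_id": "Identifier", "doi": "Identifier",
}


def common_elements(repo_fields, crosswalk, min_repos):
    """Shared elements that appear in at least min_repos repositories."""
    counts = Counter()
    for fields in repo_fields.values():
        # A set, so each repository counts at most once per shared element;
        # unmapped local fields (e.g. "organism") are ignored.
        counts.update({crosswalk[f] for f in fields if f in crosswalk})
    return {element for element, n in counts.items() if n >= min_repos}


if __name__ == "__main__":
    print(sorted(common_elements(REPO_FIELDS, CROSSWALK, min_repos=3)))
    print(sorted(common_elements(REPO_FIELDS, CROSSWALK, min_repos=2)))
```

Lowering the threshold admits elements that only some repositories record; how strict a cutoff the original analysis used is not stated here.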

Common Metadata Elements for Cataloging Biomedical Datasets

A Dataset of Metadata of Articles Citing Retracted Articles

RSNA ATD 2023 DICOM Metadata

text-descriptives-metadata

Dataset metadata of known Dataverse installations

Metadata of a Large Sonar and Stereo Camera Dataset Suitable for...

OpenScience Slovenia document metadata dataset

Dataset relating a study on Geospatial Open Data usage and metadata quality

US Restaurant POI dataset with metadata

Reference list of 265 sources used for the discovery of relationships...

Data warehouse and metadata holdings relevant to Australia's North West Shelf...

SAS code used to analyze data and a datafile with metadata glossary |...

Data from: Sample Identifiers and Metadata Reporting Format for...

Data for: Sustainable connectivity in a community repository

The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich...

The Red Queen in the Repository: metadata quality in an ever-changing...

sample

Location Identifiers, Metadata, and Map for Field Measurements at the East...

Survey on FAIRness of CFReDS Portal datasets' metadata - 2022

Overview Metadata for the Data used in te Conceptual and Numerical Model of...
