23 datasets found
  1. Synthesis methods data dictionary

    • bridges.monash.edu
    • researchdata.edu.au
    pdf
    Updated Jan 27, 2023
    Cite
    Miranda Cumpston; Sue Brennan; Joanne McKenzie; Rebecca Ryan (2023). Synthesis methods data dictionary [Dataset]. http://doi.org/10.26180/20785948.v3
    Dataset updated
    Jan 27, 2023
    Dataset provided by
    Monash University
    Authors
    Miranda Cumpston; Sue Brennan; Joanne McKenzie; Rebecca Ryan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data dictionary describes the coding system applied to the data extracted from systematic reviews included in the paper:

    Cumpston MS, Brennan SE, Ryan R, McKenzie JE. 2023. Statistical synthesis methods other than meta-analysis are commonly used but seldom specified: survey of systematic reviews of interventions

    Associated files: 1. Synthesis methods data file: Cumpston_et_al_2023_other_synthesis_methods.xlsx (https://doi.org/10.26180/20785396) 2. Synthesis methods Stata code: Cumpston_et_al_2023_other_synthesis_methods.do (https://doi.org/10.26180/20786251) 3. Study protocol: Cumpston MS, McKenzie JE, Thomas J and Brennan SE. The use of ‘PICO for synthesis’ and methods for synthesis without meta-analysis: protocol for a survey of current practice in systematic reviews of health interventions. F1000Research 2021, 9:678. (https://doi.org/10.12688/f1000research.24469.2)

  2. Synthesis methods data file: Cumpston_et_al_2023_other_synthesis_methods.xlsx

    • bridges.monash.edu
    • researchdata.edu.au
    xlsx
    Updated Jan 27, 2023
    Cite
    Miranda Cumpston; Sue Brennan; Joanne McKenzie; Rebecca Ryan (2023). Synthesis methods data file: Cumpston_et_al_2023_other_synthesis_methods.xlsx [Dataset]. http://doi.org/10.26180/20785396.v5
    Dataset updated
    Jan 27, 2023
    Dataset provided by
    Monash University
    Authors
    Miranda Cumpston; Sue Brennan; Joanne McKenzie; Rebecca Ryan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file includes the data extracted and coded from systematic reviews included in the paper: "Cumpston MS, Brennan SE, Ryan R, McKenzie JE. 2023. Statistical synthesis methods other than meta-analysis are commonly used but seldom specified: survey of systematic reviews of interventions"

    Associated files: 1. Synthesis methods data dictionary (https://doi.org/10.26180/20785948) 2. Synthesis methods Stata code: Cumpston_et_al_2023_other_synthesis_methods.do (https://doi.org/10.26180/20786251) 3. Study protocol: Cumpston MS, McKenzie JE, Thomas J and Brennan SE. The use of ‘PICO for synthesis’ and methods for synthesis without meta-analysis: protocol for a survey of current practice in systematic reviews of health interventions. F1000Research 2021, 9:678. (https://doi.org/10.12688/f1000research.24469.2)

    Note on the naming convention of the variables: variable names link to the data dictionary. The character prefix identifies the section of the data dictionary (e.g. variable names with the prefix 'Chars' are from the 'CHARACTERISTICS' section). The number of the variable reflects the item number in the data dictionary, except that the first digit is removed because it is captured by the character prefix. For example, Chars_2 is item number 1.2 under the 'CHARACTERISTICS' section of the data dictionary.
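    As a quick illustration, the convention above can be sketched in Python. This is a hypothetical helper, not part of the dataset release; only the 'Chars' → 'CHARACTERISTICS' (section 1) mapping is stated in the description, so any other prefixes would have to be added from the data dictionary itself.

    ```python
    # Hypothetical prefix-to-section map; only 'Chars' is documented in the source.
    SECTION_BY_PREFIX = {"Chars": ("CHARACTERISTICS", 1)}

    def variable_to_item(var_name):
        """Map a variable name such as 'Chars_2' to its dictionary item '1.2'."""
        prefix, number = var_name.split("_", 1)
        section_name, section_number = SECTION_BY_PREFIX[prefix]
        # Re-attach the section's leading digit that the prefix stands in for.
        return section_name, f"{section_number}.{number}"

    print(variable_to_item("Chars_2"))  # ('CHARACTERISTICS', '1.2')
    ```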

  3. CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis

    • zenodo.org
    Updated May 11, 2025
    Cite
    Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li (2025). CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis [Dataset]. http://doi.org/10.5281/zenodo.11409612
    Dataset updated
    May 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 1, 2024
    Description

    Overview
    The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.

    Task Uniqueness: The task of controllable multimodal feedback synthesis is unique, distinct from tasks such as VisDial, and not addressed by multimodal LLMs. LLMs often exhibit errors and hallucinations owing to their auto-regressive, black-box nature, which can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, detailed in the supplementary material of the corresponding research publication, demonstrating how metadata and multimodal features shape responses and control sentiment. This controllability and interpretability aim to inspire new methodologies in related fields.

    Data Collection and Annotation
    Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction because of the following factors:
    • It uniquely provides metadata such as the news article link, post shares, post reactions, comment likes, comment rank, comment reaction rank, and relevance scores, which are not available on other platforms.
    • Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
    • Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit. [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref]
    • The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it's 66.72% to 23.28%; Reddit does not report this data. [Ref]

    Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
    a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
    b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
    After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
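    The two filtering levels can be sketched as a single predicate. This is an illustrative sketch, not the authors' code; the inclusive probability bounds and the 'pos'/'neg' label coding are assumptions.

    ```python
    def keep_comment(labels, probability):
        """Two-level filter sketched from the description above.

        labels: sentiment labels ('pos'/'neg') from the four models
        probability: the comment's sentiment probability
        """
        # (a) Model agreement: keep only if at least 3 of the 4 models agree.
        top_count = max(labels.count(lab) for lab in set(labels))
        if top_count < 3:
            return False
        # (b) Probability safety margin: exclude low-confidence scores
        # between 0.49 and 0.51 (inclusive bounds assumed here).
        if 0.49 <= probability <= 0.51:
            return False
        return True

    print(keep_comment(["pos", "pos", "pos", "neg"], 0.92))  # True
    print(keep_comment(["pos", "pos", "neg", "neg"], 0.92))  # False: only 2 agree
    print(keep_comment(["neg", "neg", "neg", "neg"], 0.50))  # False: margin
    ```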

    Dataset Description
    • Total Samples: 61,734
    • Total Samples Annotated: 57,222 after filtering.
    • Total Posts: 3,646
    • Average Likes per Post: 65.1
    • Average Likes per Comment: 10.5
    • Average Length of News Text: 655 words
    • Average Number of Images per Post: 3.7

    Components of the Dataset
    The dataset comprises two main components:
    CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
    Images Folder: Contains folders with images corresponding to each post.

    Data Format and Fields of the CSV File
    The dataset is structured in the CMFeed.csv file, with corresponding images in the related folders. This CSV file includes the following fields:
    Id: Unique identifier
    Post: The heading of the news article.
    News_text: The text of the news article.
    News_link: URL link to the original news article.
    News_Images: A path to the folder containing images related to the post.
    Post_shares: Number of times the post has been shared.
    Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
    Comment: Text of the user comment.
    Comment_like: Number of likes on the comment.
    Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
    Comment_link: URL link to the original comment on Facebook.
    Comment_rank: Rank of the comment based on engagement and relevance.
    Score: Sentiment score computed based on the consensus of sentiment analysis models.
    Agreement: Indicates the consensus level among the sentiment models, ranging from -4 (all models negative) to +4 (all models positive). For example, three negative and one positive votes yield -2, while three positive and one negative yield +2.
    Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).
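    The Agreement field above is simple vote arithmetic, which a short sketch makes concrete (illustrative only; the 'pos'/'neg' label coding is an assumption):

    ```python
    def agreement_score(labels):
        """Agreement in [-4, 4]: +1 per positive model vote, -1 per negative,
        across the four sentiment models."""
        return sum(1 if lab == "pos" else -1 for lab in labels)

    print(agreement_score(["neg", "neg", "neg", "pos"]))  # -2
    print(agreement_score(["pos", "pos", "pos", "pos"]))  # 4
    ```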

    More Considerations During Dataset Construction
    We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:

    • Why not merge data from different social media platforms?
    We chose not to merge data from platforms such as Reddit and Twitter because they lack the comprehensive metadata, clear ethical guidelines, and control mechanisms (such as who can comment and whether users' anonymity is maintained) that Facebook provides. These factors are critical for our analysis, and focusing on Facebook alone ensures consistency in data quality and format.

    • Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view; Sky News offers geographically targeted, politically varied content leaning center/right in the UK/EU/US; Fox News is recognized for its right-leaning content in the US; and NY Daily News provides left-leaning coverage in New York. Many other large-scale news handles, such as NDTV, The Hindu, Xinhua, and SCMP, publish content in regional languages (e.g. Indian languages and Chinese) and were therefore not selected. This selection ensures a broad spectrum of political discourse and audience engagement.

    • Dataset Generalizability and Bias: With 3.07 billion of the roughly 5 billion social media users worldwide, Facebook's extensive user base, reflective of broader social media engagement patterns, ensures that the insights gained are applicable across various platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of these news sources, ranging from local (NY Daily News) to international (BBC News) and spanning the political spectrum from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further solidifying the robustness and applicability of our research.

    • Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, requiring around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4000 posts were collected; after preprocessing (detailed in Section 3.1), 3646 posts remained. We then processed all associated comments, resulting in a total of 61734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.

    Ethical considerations, data privacy and misuse prevention
    The data collection adheres to Facebook’s ethical guidelines (https://developers.facebook.com/terms/)

  4. OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

    • data.niaid.nih.gov
    Updated May 13, 2022
    Cite
    Benjamin Tan (2022). OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6399454
    Dataset updated
    May 13, 2022
    Dataset provided by
    Animesh Basak Chowdhury
    Siddharth Garg
    Ramesh Karri
    Benjamin Tan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Logic synthesis is a challenging and widely-researched combinatorial optimization problem in integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D, a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool, and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs, plus labels such as the optimized node counts and delay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The code related to dataset creation and benchmark models is available at https://github.com/NYU-MLDA/OpenABC.git.

  5. Data from: Global dataset of nitrogen fixation rates across inland and coastal waters based on a coordinated synthesis effort

    • researchdiscovery.drexel.edu
    • portal.edirepository.org
    Updated Sep 17, 2024
    Cite
    Robinson W Fulweiler; Megan E Berberich; Shelby A Rinehart; Jason M Taylor; Michelle Catherine Kelly; Nicholas E Ray; Autumn Oczkowski; Mar Benavides; Matthew J Church; Brianna Loeks; Silvia Newell; Malin Olofsson; Jimmy Clifford Oppong; Sarah S Roley; Carmella Vizza; Samuel T Wilson; J Thad Scott; Amy M Marcarelli (2024). Global dataset of nitrogen fixation rates across inland and coastal waters based on a coordinated synthesis effort [Dataset]. https://researchdiscovery.drexel.edu/esploro/outputs/dataset/Global-dataset-of-nitrogen-fixation-rates/991021902494404721
    Dataset updated
    Sep 17, 2024
    Dataset provided by
    Environmental Data Initiative
    Authors
    Robinson W Fulweiler; Megan E Berberich; Shelby A Rinehart; Jason M Taylor; Michelle Catherine Kelly; Nicholas E Ray; Autumn Oczkowski; Mar Benavides; Matthew J Church; Brianna Loeks; Silvia Newell; Malin Olofsson; Jimmy Clifford Oppong; Sarah S Roley; Carmella Vizza; Samuel T Wilson; J Thad Scott; Amy M Marcarelli
    Time period covered
    2024
    Description

    Biological nitrogen fixation converts inert di-nitrogen gas into bioavailable nitrogen and can be an important source of bioavailable nitrogen to organisms. This dataset synthesizes aquatic nitrogen fixation rate measurements across inland and coastal waters. Data were derived from papers and datasets published by April 2022 and include rates measured using the acetylene reduction assay (ARA), 15N2 labeling, or the N2/Ar technique. The dataset comprises 4793 nitrogen fixation rate measurements from 267 studies and is structured into four tables: 1) a reference table with sources from which data were extracted, 2) a rates table with nitrogen fixation rates that includes habitat, substrate, geographic coordinates, and method of measuring N2 fixation rates, 3) a table with supporting environmental and chemical data for a subset of the rate measurements when data were available, and 4) a data dictionary with definitions for each variable in each data table. This dataset was compiled and curated by the NSF-funded Aquatic Nitrogen Fixation Research Coordination Network (award number 2015825).

  6. CocoChorales Dataset

    • paperswithcode.com
    Updated Sep 27, 2022
    Cite
    Yusong Wu; Josh Gardner; Ethan Manilow; Ian Simon; Curtis Hawthorne; Jesse Engel (2022). CocoChorales Dataset [Dataset]. https://paperswithcode.com/dataset/cocochorales
    Dataset updated
    Sep 27, 2022
    Authors
    Yusong Wu; Josh Gardner; Ethan Manilow; Ian Simon; Curtis Hawthorne; Jesse Engel
    Description

    The CocoChorales Dataset
    CocoChorales is a dataset consisting of over 1400 hours of audio mixtures containing four-part chorales performed by 13 instruments, all synthesized with realistic-sounding generative models. CocoChorales contains mixes, sources, and MIDI data, as well as annotations for note expression (e.g., per-note volume and vibrato) and synthesis parameters (e.g., multi-f0).

    Dataset
    We created CocoChorales using two generative models produced by Magenta: Coconet and MIDI-DDSP. The dataset was created in two stages. First, we used a trained Coconet model to generate a large set of four-part chorales in the style of J.S. Bach. The output of this first stage is a set of note sequences, stored as MIDI, to which we assign a tempo and add random timing variations to each note (for added realism).

    In the second stage, we use MIDI-DDSP to synthesize these MIDI files into audio, resulting in audio clips that sound like the chorales were performed by live musicians. This MIDI-DDSP model was trained on URMP. We define a set of ensembles that consist of the following instruments, in Soprano, Alto, Tenor, Bass (SATB) order:

    • String Ensemble: Violin 1, Violin 2, Viola, Cello.
    • Brass Ensemble: Trumpet, French Horn, Trombone, Tuba.
    • Woodwind Ensemble: Flute, Oboe, Clarinet, Bassoon.
    • Random Ensemble: Each SATB part is randomly assigned an instrument according to the following:
      • Soprano: Violin, Flute, Trumpet, Clarinet, Oboe.
      • Alto: Violin, Viola, Flute, Clarinet, Oboe, Saxophone, Trumpet, French Horn.
      • Tenor: Viola, Cello, Clarinet, Saxophone, Trombone, French Horn.
      • Bass: Cello, Double Bass, Bassoon, Tuba.

    Each instrument in the ensemble is synthesized separately, with annotations for the high-level expressions used for each note (e.g., vibrato, note volume, note brightness; more details in Sections 3.2 and B.3 of the MIDI-DDSP paper) as well as detailed low-level annotations for synthesis parameters (e.g., f0’s, amplitudes of each harmonic). Because the MIDI-DDSP model skews sharp, we randomly applied pitch augmentation to the f0’s (see Figure 2). The four audio clips for the instruments in the ensemble are then mixed together to produce an example in the dataset.

    Because all of the data in CocoChorales originate from generative models, all of the annotations perfectly correspond to the audio data. All in all, the dataset contains 240,000 examples: 60,000 mixes from each of the four ensemble types above. Each ensemble has its own train/validation/test split. All of the audio is 16 kHz, 16-bit PCM data. Each example contains:

    • A mixture
    • Source audio for all four instruments
      • Gain applied to each source
    • MIDI with tempo and precise timing
    • The name of the ensemble with instrument names
    • Note expression annotations for every note:
      • Volume, Volume Fluctuation, Volume Peak Position, Vibrato, Brightness, and Attack Noise used by MIDI-DDSP to synthesize every note (see Sections 3.2 and B.3 of the MIDI-DDSP paper for more details)
    • Synthesis parameters for every source (250 Hz):
      • Fundamental frequency (f0), amplitude, amplitude of all harmonics, filtered noise parameters
      • Amount of pitch augmentation applied

    Further Details
    A detailed view of the contents of the CocoChorales dataset is provided at this link.

    Download
    For download instructions, please see this github page. The compressed version of the full dataset is 2.9 TB, and the uncompressed version is larger than 4 TB. There is a "tiny" version available for download as well.

    MD5 Hashes for all zipped files in the download are provided here.

    License
    The CocoChorales dataset was made by Yusong Wu and is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    How to Cite
    If you use CocoChorales in your work, we ask that you cite the following paper where it was introduced:

    Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, and Jesse Engel. “The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling.” arXiv preprint, arXiv:2209.14458, 2022.

    You can also use the following bibtex entry:

    @article{wu2022chamber,
      title   = {The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling},
      author  = {Wu, Yusong and Gardner, Josh and Manilow, Ethan and Simon, Ian and Hawthorne, Curtis and Engel, Jesse},
      journal = {arXiv preprint arXiv:2209.14458},
      year    = {2022},
    }

  7. SynthRAD2023 Grand Challenge dataset: synthetizing computed tomography for radiotherapy

    • zenodo.org
    pdf, zip
    Updated Jul 15, 2024
    Cite
    Adrian Thummerer; Erik van der Bijl; Matteo Maspero (2024). SynthRAD2023 Grand Challenge dataset: synthetizing computed tomography for radiotherapy [Dataset]. http://doi.org/10.5281/zenodo.7260705
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Adrian Thummerer; Erik van der Bijl; Matteo Maspero
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    DATASET STRUCTURE

    The dataset can be downloaded from https://doi.org/10.5281/zenodo.7260705 and a detailed description is offered at "synthRAD2023_dataset_description.pdf".

    The training dataset for Task 1 is in Task1.zip, and for Task 2 in Task2.zip. After unzipping, each task is organized according to the following folder structure:

    Task1.zip/
    └── Task1
        ├── brain
        │   ├── 1Bxxxx
        │   │   ├── mr.nii.gz
        │   │   ├── ct.nii.gz
        │   │   └── mask.nii.gz
        │   ├── ...
        │   └── overview
        │       ├── 1_brain_train.xlsx
        │       ├── 1Bxxxx_train.png
        │       └── ...
        └── pelvis
            ├── 1Pxxxx
            │   ├── mr.nii.gz
            │   ├── ct.nii.gz
            │   └── mask.nii.gz
            ├── ...
            └── overview
                ├── 1_pelvis_train.xlsx
                ├── 1Pxxxx_train.png
                └── ...

    Task2.zip/
    └── Task2
        ├── brain
        │   ├── 2Bxxxx
        │   │   ├── cbct.nii.gz
        │   │   ├── ct.nii.gz
        │   │   └── mask.nii.gz
        │   ├── ...
        │   └── overview
        │       ├── 2_brain_train.xlsx
        │       ├── 2Bxxxx_train.png
        │       └── ...
        └── pelvis
            ├── 2Pxxxx
            │   ├── cbct.nii.gz
            │   ├── ct.nii.gz
            │   └── mask.nii.gz
            ├── ...
            └── overview
                ├── 2_pelvis_train.xlsx
                ├── 2Pxxxx_train.png
                └── ...

    Each patient folder has a unique name that contains information about the task, anatomy, center and a patient ID. The naming follows the convention below:

    [Task][Anatomy][Center][PatientID]

    For example, 1BA001 denotes task 1, brain, center A, patient 001.
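    The naming convention above can be decoded with a short sketch (illustrative only, not part of the challenge tooling; the field widths are inferred from the example 1BA001):

    ```python
    import re

    def parse_patient_folder(name):
        """Split a patient folder name like '1BA001' into the
        [Task][Anatomy][Center][PatientID] parts described above."""
        m = re.fullmatch(
            r"(?P<task>[12])(?P<anatomy>[BP])(?P<center>[ABC])(?P<patient>\d+)",
            name,
        )
        if m is None:
            raise ValueError(f"unexpected patient folder name: {name!r}")
        return m.groupdict()

    print(parse_patient_folder("1BA001"))
    # {'task': '1', 'anatomy': 'B', 'center': 'A', 'patient': '001'}
    ```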

    In each patient folder, three files can be found:

    • ct.nii.gz: CT image

    • mr.nii.gz or cbct.nii.gz (depending on the task): CBCT/MR image

    • mask.nii.gz: image containing a binary mask of the dilated patient outline

    For each task and anatomy, an overview folder is provided which contains the following files:

    • [task]_[anatomy]_train.xlsx: This file contains information about the image acquisition protocol for each patient.

    • [task][anatomy][center][PatientID]_train.png: For each patient a png showing axial, coronal and sagittal slices of CBCT/MR, CT, mask and the difference between CBCT/MR and CT is provided. These images are meant to provide a quick visual overview of the data.

    DATASET DESCRIPTION

    This challenge dataset contains imaging data of patients who underwent radiotherapy in the brain or pelvis region. Overall, the population is predominantly adult and no gender restrictions were considered during data collection. For Task 1, the inclusion criteria were the acquisition of a CT and MRI during treatment planning while for task 2, acquisitions of a CT and CBCT, used for patient positioning, were required. Datasets for task 1 and 2 do not necessarily contain the same patients, given the different image acquisitions for the different tasks.

    Data was collected at 3 Dutch university medical centers:

    • Radboud University Medical Center

    • University Medical Center Utrecht

    • University Medical Center Groningen

    For anonymization purposes, from here on, institution names are substituted with A, B and C, without specifying which institute each letter refers to.

    The following numbers of patients are available in the training set:

    Training      Brain                                  Pelvis
                  Center A  Center B  Center C  Total    Center A  Center B  Center C  Total
    Task 1        60        60        60        180      120       0         60        180
    Task 2        60        60        60        180      60        60        60        180

    Each subset generally contains equal numbers of patients from each center, except for task 1 pelvis, where center B had no MR scans available. To compensate for this, center A provided twice as many patients as in the other subsets.

    Validation    Brain                                  Pelvis
                  Center A  Center B  Center C  Total    Center A  Center B  Center C  Total
    Task 1        10        10        10        30       20        0         10        30
    Task 2        10        10        10        30       10        10        10        30

    Testing       Brain                                  Pelvis
                  Center A  Center B  Center C  Total    Center A  Center B  Center C  Total
    Task 1        20        20        20        60       40        0         20        60
    Task 2        20        20        20        60       20        20        20        60

    In total, for all tasks and anatomies combined, 1080 image pairs (720 training, 120 validation, 240 testing) are available in this dataset. This repository only contains the training data.

    All images were acquired with the clinically used scanners and imaging protocols of the respective centers and reflect typical images found in clinical routine. As a result, imaging protocols and scanners can vary between patients. A detailed description of the imaging protocol for each image can be found in spreadsheets that are part of the dataset release (see dataset structure).

    Data was acquired with the following scanners:

    • Center A:

      • MRI: Philips Ingenia 1.5T/3.0T

      • CT: Philips Brilliance Big Bore or Siemens Biograph20 PET-CT

      • CBCT: Elekta XVI

    • Center B:

      • MRI: Siemens MAGNETOM Aera 1.5T or MAGNETOM Avanto_fit 1.5T

      • CT: Siemens SOMATOM Definition AS

      • CBCT: IBA Proteus+ or Elekta XVI

    • Center C:

      • MRI: Siemens Avanto fit 1.5T or Siemens MAGNETOM Vida fit 3.0T

      • CT: Philips Brilliance Big Bore

      • CBCT: Elekta XVI

    For task 1, MRIs were acquired with a T1-weighted gradient echo or an inversion-prepared turbo field echo (TFE) sequence and collected along with the corresponding planning CTs for all subjects. The exact acquisition parameters vary between patients and centers. For centers B and C, the selected MRIs were acquired with Gadolinium contrast, while the selected MRIs of center A were acquired without contrast.

    For task 2, the CBCTs used for image-guided radiotherapy ensuring accurate patient position were selected for all subjects along with the corresponding

  8. Preliminary digital data for a 3-layer geologic model of the conterminous United States using land surface, top of bedrock, and top of basement

    • s.cnmilf.com
    • data.usgs.gov
    • +1more
    Updated Feb 22, 2025
    Cite
    U.S. Geological Survey (2025). Preliminary digital data for a 3-layer geologic model of the conterminous United States using land surface, top of bedrock, and top of basement [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/preliminary-digital-data-for-a-3-layer-geologic-model-of-the-conterminous-united-states-us
    Dataset updated
    Feb 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Contiguous United States, United States
    Description

    This digital dataset compiles a 3-layer geologic model of the conterminous United States by mapping the altitude of three surfaces: land surface, top of bedrock, and top of basement. These surfaces are mapped through the compilation and synthesis of published stratigraphic horizons from numerous topical studies. The mapped surfaces create a 3-layer geologic model with three geomaterials-based subdivisions: unconsolidated to weakly consolidated sediment; layered consolidated rock strata that constitute bedrock; and crystalline basement, consisting of igneous, metamorphic, or highly deformed rocks. Compilation of subsurface data from published reports involved standard techniques within a geographic information system (GIS), including digitizing contour lines, gridding the contoured data, sampling the resultant grids at regular intervals, and attribution of the dataset. However, data compilation and synthesis is highly dependent on the definition of the informal terms “bedrock” and “basement”, terms which may describe different ages or types of rock in different places. The digital dataset consists of a single polygon feature class containing an array of square polygonal cells that are 2.5 km in the x and y dimensions. These polygonal cells carry multiple attributes, including x-y location, the altitude of the three mapped layers at each x-y location, the published data source from which each surface altitude was compiled, and an attribute that allows for spatially varying definitions of the bedrock and basement units. The spatial data are linked through unique identifiers to non-spatial tables that describe the sources of geologic information and a glossary of terms used to describe bedrock and basement type.

  9. C

    Conserved Areas Explorer

    • data.cnra.ca.gov
    • data.ca.gov
    • +5more
    Updated Jul 7, 2025
    California Natural Resources Agency (2024). Conserved Areas Explorer [Dataset]. https://data.cnra.ca.gov/dataset/conserved-areas-explorer
    Explore at:
    html, arcgis geoservices rest apiAvailable download formats
    Dataset updated
    Jul 7, 2025
    Dataset provided by
    CA Nature Organization
    Authors
    California Natural Resources Agency
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    California Nature Conserved Areas Explorer
    The Conserved Areas Explorer is a web application enabling users to investigate a synthesis of the best available data representing lands and coastal waters of California that are durably protected and managed to support functional ecosystems, both intact and restored, and the species that rely on them. Understanding the spatial distribution and extent of these durably protected and managed areas is a vital aspect of tracking and achieving the “30x30” goal of conserving 30% of California's lands and waters by 2030.

    Terrestrial and Freshwater Data
    The California Protected Areas Database (CPAD), developed and managed by GreenInfo Network, is the most comprehensive collection of data on open space in California. CPAD data consist of Holdings (a single parcel or group of parcels), such that the spatial features of CPAD correspond to ownership boundaries.
    The California Conservation Easement Database (CCED), also managed by GreenInfo Network, aggregates data on lands with easements. Conservation Easements are legally recorded interests in land in which a landholder sells or relinquishes certain development rights to their land in perpetuity. Easements are often used to ensure that lands remain as open space, either as working farm or ranch lands, or areas for biodiversity protection. Easement restrictions typically remain with the land through changes in ownership.
    The Protected Areas Database of the United States (PAD-US), hosted by the United States Geological Survey (USGS), is developed in coordination with multiple federal, state, and non-governmental organization (NGO) partners. PAD-US, through the Gap Analysis Project (GAP), uses a numerical coding system in which GAP codes 1 and 2 correspond to management strategies with explicit emphasis on protection and enhancement of biodiversity. PAD-US is not specifically aligned to parcel boundaries and as such, boundaries represented within it may not align with other data sources.
    Numerous datasets representing designated boundaries for entities such as National Parks and Monuments, Wild and Scenic Rivers, Wilderness Areas, and others were downloaded from publicly available sources, typically hosted by the managing agency.

    Methodology
    1. CPAD and CCED represent the most accurate location and ownership information for parcels in California which contribute to the preservation of open space and cultural and biological resources.
    2. Superunits are collections of parcels (Holdings) within CPAD which share a name, manager, and access policy. Most Superunits are also managed with a generally consistent strategy for biodiversity conservation. Examples of Superunits include Yosemite National Park, Giant Sequoia National Monument, and Anza-Borrego Desert State Park.
    3. Some Superunits, such as those owned and managed by the Bureau of Land Management, U.S. Forest Service, or National Park Service, are intersected by one or more designations, each of which may have a distinct management emphasis with regard to biodiversity. Examples of such designations are Wilderness Areas, Wild and Scenic Rivers, or National Monuments.
    4. CPAD Superunits were intersected with all designation boundary files to create the operative spatial units for conservation analysis, henceforth 'Conservation Units,' which make up the Conserved Areas Map Layer. Each easement was functionally considered to be a Superunit.
    5. Each Conservation Unit was intersected with the PAD-US dataset in order to determine the management emphasis with respect to biodiversity, i.e., the GAP code. Because PAD-US is national in scope and not specifically parcel aligned with California assessors' surveys, a direct spatial extraction of GAP codes from PAD-US would leave tens of thousands of GAP code data slivers within the Conserved Areas Map. Consequently, a generalizing approach was adopted, such that any Conservation Unit with greater than 80% areal overlap with a single GAP code was uniformly assigned that code. Additionally, the total area of GAP codes 1 and 2 was summed for the remaining uncoded Conservation Units. If this sum was greater than 80% of the unit area, the Conservation Unit was coded as GAP 2.
    6. Subsequent to this stage of analysis, certain Conservation Units remained uncoded, either due to the lack of a single GAP code (or combined GAP codes 1&2) overlapping 80% of the area, or because the area was not sufficiently represented in the PAD-US dataset.
    7. These uncoded Conservation Units were then broken down into their constituent, finer resolution Holdings, which were then analyzed according to the above workflow.
    8. Areas remaining uncoded following the two-step process of coding at the Superunit and Holding levels were assigned a GAP code of 4. This is consistent with the definition of GAP Code 4: areas unknown to have a biodiversity management focus.
    9. Greater than 90% of all areas in the Conserved Areas Explorer were GAP coded at the level of Superunits intersected by designation boundaries, the coarsest unit of analysis. By adopting this coarser analytical unit, the Conserved Areas Explorer maintains a greater level of user responsiveness, avoiding the need to maintain and display hundreds of thousands of additional parcel records, which in most cases would only reflect the management scenario and GAP status of the umbrella Superunit and other spatially coincident designations.
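    Steps 5 through 8 of the methodology above amount to a small decision rule. A minimal sketch, with hypothetical function and argument names:

```python
def assign_gap_code(overlaps, threshold=0.80):
    """Assign a single GAP code to a Conservation Unit from its fractional
    areal overlap with each PAD-US GAP code.

    overlaps maps GAP code (1-4) -> fraction of the unit's area covered.
    Returns the assigned code, or None if the unit remains uncoded and must
    be broken down into its constituent Holdings (steps 6-7); units still
    uncoded after that are assigned GAP 4 (step 8)."""
    # Step 5a: any single GAP code covering more than the threshold wins.
    for code, frac in overlaps.items():
        if frac > threshold:
            return code
    # Step 5b: combined GAP 1 + 2 coverage above the threshold is coded GAP 2.
    if overlaps.get(1, 0.0) + overlaps.get(2, 0.0) > threshold:
        return 2
    return None

print(assign_gap_code({1: 0.85}))           # 1
print(assign_gap_code({1: 0.45, 2: 0.40}))  # 2 (combined GAP 1&2 exceed 80%)
print(assign_gap_code({3: 0.60, 4: 0.30}))  # None -> analyse Holdings, default GAP 4
```

The same rule is applied first at the Superunit level and then, for any remaining uncoded units, at the finer Holding level.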

    Marine Data
    The Conserved Areas Explorer displays the network of 124 Marine Protected Areas (MPAs) along the coastal waters and shoreline of California. There are several categories of MPAs: some permit varying levels of commercial and recreational fishing and waterfowl hunting, while roughly half do not permit any harvest. These data include all of California's MPAs as defined on January 1, 2019. This dataset reflects the Department of Fish and Wildlife's best representation of marine protected areas based upon the current California Code of Regulations, Title 14, Section 632: Natural Resources, Division 1: FGC- DFG. This dataset is not intended for navigational use or for defining legal boundaries.


    Tracking Conserved Areas
    The total acreage of conserved areas will increase as California works towards its 30x30 goal. Some changes will be due to shifts in legal protection designations or management status of specific lands and waters. However, shifts may also result from new data representing improvements in our understanding of existing biodiversity conservation efforts. The California Nature Conserved Areas Explorer is expected to generate a great deal of excitement regarding the state's trajectory towards achieving the 30x30 goal. We also expect it to spark discussion about how to shape that trajectory, and how to strategize and optimize outcomes. We encourage landowners, managers, and stakeholders to zoom into the locations they understand best and share their expertise with us to improve the data representing the status of conservation efforts at these sites. The Conserved Areas Explorer presents a tremendous opportunity to strengthen our existing data infrastructure and the channels of communication between land stewards and data curators, encouraging the transfer of knowledge and improving the quality of data.

    CPAD, CCED, and PAD-US are built from the ground up. These terrestrial data sources are derived from available parcel information and submissions from those who own and manage the land. So better data starts with you. Do boundary lines require updating? Is the GAP code inconsistent with a Holding’s conservation status? If land under your care can be better represented in the Conserved Areas Explorer, please use this link to initiate a review. The results of these reviews will inform updates to the California Protected Areas Database, California Conservation Easement Database, and PAD-US as appropriate for incorporation into future updates to CA Nature and tracking progress to 30x30.

  10. d

    Data from: Disentangling the relative contributions of factors determining...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Mar 12, 2024
    La-Mei Wu; Si-Chong Chen; Rui-chang Quan; Bo Wang (2024). Disentangling the relative contributions of factors determining seed physical defence: a global-scale data synthesis [Dataset]. http://doi.org/10.5061/dryad.0k6djhb51
    Explore at:
    Dataset updated
    Mar 12, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    La-Mei Wu; Si-Chong Chen; Rui-chang Quan; Bo Wang
    Time period covered
    Jun 12, 2023
    Description

    Physical defence investment in seeds varies greatly among plant species and is associated with many potential factors. Exploring the factors explaining the interspecific variation in physical defence has long attracted particular attention in both ecology and evolution studies. However, the relative importance of these factors has not yet been quantitatively evaluated, which may lead to misunderstanding of the main driver generating such interspecific variation. Here, by compiling a global database of the seed coat ratio (SCR), a proxy of seed physical defence, for 1,362 species, we provided the first quantification of the relative explanations of six factors that have been commonly considered to be associated with the interspecific variation in SCR: seed mass, seed desiccation response (desiccation-sensitive vs. desiccation-tolerant), seed dormancy (nondormant, physical dormant or other dormant types), growth form (herbaceous vs. woody), fruit type (dry vs. fleshy), and climate (19 b..., Data on the SCR and seed mass were collected from the literature published up until June 5, 2022. We used ((diaspore$ or “seed* coat$” or (seed$ and (kernel$ or reserve$)) or (seed$ and “embryo* endosperm$”)) and (ratio$ or proportion$ or fraction$ or tissue$ or percent* or weigh* or mass*)) or (seed$ and (defen* or protect*) and (physical* or mechanical*)) as search terms in the ISI Web of Science and further restricted our search to be consistent with the ‘Study Field Categories’ of Wu et al. (2019), with the additional category ‘anatomy morphology’. Google Scholar was also searched with the same keywords to expand our dataset. In total, 9,322 journal papers written in English were retrieved. We first screened all the papers based on their titles and abstracts and excluded 6,857 studies that were not relevant to our focal question.
    For the remaining 2,465 papers, we screened the full texts, finally yielding 85 papers containing available SCR data collected from 203 sampling sites. T...,
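The screening funnel reported above can be sanity-checked with a few lines of arithmetic:

```python
# Screening-funnel counts taken from the dataset description.
retrieved = 9_322              # journal papers retrieved from the searches
excluded_on_abstract = 6_857   # excluded at title/abstract screening
full_text_screened = retrieved - excluded_on_abstract
included = 85                  # papers with usable SCR data

assert full_text_screened == 2_465   # matches the reported full-text count
print(f"{included / retrieved:.2%} of retrieved papers yielded SCR data")
```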

    seed desiccation response: 1 = desiccation-tolerant, 0 = desiccation-sensitive; blanks mean lack of information.

    growth form: 1 = woody, 0 = herbaceous.

    fruit type: 1 = fleshy, 0 = dry; blanks mean lack of information.

    nondormant: 1 = nondormant, 0 = dormant; blanks mean lack of information.

    physical dormant: 1 = physically dormant, 0 = not physically dormant; blanks mean lack of information.

    other dormant: 1 = other dormancy type (including physiological, morphological, and morphophysiological dormancy), 0 = nondormant or physically dormant; blanks mean lack of information.
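The three dormancy dummy columns above can be collapsed into a single category label. A minimal sketch, assuming blank cells are read in as None:

```python
def dormancy_category(nondormant, physical, other):
    """Collapse the dataset's three 0/1 dormancy columns into one label.

    Per the coding scheme, exactly one of the three columns is 1 for a
    fully coded species; a blank (None) in any column means the dormancy
    information is missing."""
    if None in (nondormant, physical, other):
        return None                  # blank cell: lack of information
    if nondormant == 1:
        return "nondormant"
    if physical == 1:
        return "physical dormancy"
    if other == 1:
        # physiological, morphological, or morphophysiological dormancy
        return "other dormancy"
    return None

print(dormancy_category(0, 1, 0))  # physical dormancy
```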

    Title of Dataset:

    Data from: Disentangling the relative contributions of factors determining seed physical defence: a global-scale data synthesis

    Description of the Data and file structure

    Dataset of SCR.xlsx: an Excel spreadsheet that includes species information, the seed coat ratio (SCR), and the factors commonly considered as possible drivers of variation in the SCR: seed mass, seed desiccation response, growth form, fruit type, climate, and seed dormancy (nondormant, physical dormant, or other dormant types).

  11. f

    Data from: De Novo Synthesis of the DEF-Ring Stereotriad Core of the...

    • figshare.com
    • acs.figshare.com
    txt
    Updated Apr 30, 2020
    Matthew A. Horwitz; Jacob G. Robins; Jeffrey S. Johnson (2020). De Novo Synthesis of the DEF-Ring Stereotriad Core of the Veratrum Alkaloids [Dataset]. http://doi.org/10.1021/acs.joc.0c00685.s001
    Explore at:
    txtAvailable download formats
    Dataset updated
    Apr 30, 2020
    Dataset provided by
    ACS Publications
    Authors
    Matthew A. Horwitz; Jacob G. Robins; Jeffrey S. Johnson
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The synthesis of the stereotriad core in the eastern portion of the Veratrum alkaloids jervine (1), cyclopamine (2), and veratramine (3) is reported. Starting from a known β-methyltyrosine derivative (8), the route utilizes a diastereoselective substrate-controlled 1,2-reduction to establish the stereochemistry of the vicinal amino alcohol motif embedded within the targets. Oxidative dearomatization is demonstrated to be a viable approach for the synthesis of the spirocyclic DE ring junction found in jervine and cyclopamine.

  12. f

    What is meant by “multimodal therapy” for aphasia? (Pierce et al., 2019)

    • asha.figshare.com
    mp4
    Updated May 31, 2023
    John E. Pierce; Robyn O’Halloran; Leanne Togher; Miranda L. Rose (2023). What is meant by “multimodal therapy” for aphasia? (Pierce et al., 2019) [Dataset]. http://doi.org/10.23641/asha.7646717.v2
    Explore at:
    mp4Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    ASHA journals
    Authors
    John E. Pierce; Robyn O’Halloran; Leanne Togher; Miranda L. Rose
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Video abstract

    Purpose: Multimodal therapy is a frequent term in the aphasia literature, but it has no agreed-upon definition. Phrases such as “multimodal therapy” and “multimodal treatment” are applied to a range of aphasia interventions as if mutually understood, and yet the interventions reported in the literature differ significantly in methodology, approach, and aims. This inconsistency can be problematic for researchers, policy makers, and clinicians accessing the literature and potentially compromises data synthesis and meta-analysis. A literature review was conducted to examine what types of aphasia treatment are labeled multimodal and to determine whether any patterns are present.

    Method: A systematic search was conducted to identify literature pertaining to aphasia that included the term multimodal therapy (and variants). Sources included literature databases, dissertation databases, textbooks, professional association websites, and Google Scholar.

    Results: Thirty-three original review articles were identified, as well as another 31 sources referring to multimodal research, all of which used a variant of the term multimodal therapy. Treatments had heterogeneous aims, underlying theories, and methods. The rationale for using more than one modality was not always clear, nor was the reason each therapy was considered to be multimodal when similar treatments had not used the title. Treatments were noted to differ across two key features. The first was whether the ultimate aim of intervention was to improve total communication, as in augmentative and alternative communication approaches, or to improve one specific modality, as when gesture is used to improve word retrieval. The second was the point in the treatment at which the nonspeech modalities were employed.

    Discussion: Our review demonstrated that references to “multimodal” treatments represent very different therapies with little consistency. We propose a framework to define and categorize multimodal treatments, based both on our results and on current terminology in speech-language pathology.

    Supplemental Material S1. Secondary sources referring to multimodal treatments.
    Supplemental Material S2. Data extraction table for original research on “multimodal therapy.”

    Pierce, J. E., O'Halloran, R., Togher, L., & Rose, M. L. (2019). What is meant by “multimodal therapy” for aphasia? American Journal of Speech-Language Pathology, 28, 706–716. https://doi.org/10.1044/2018_AJSLP-18-0157

  13. d

    Dissolved inorganic carbon, alkalinity, pH, temperature, salinity, and other...

    • catalog.data.gov
    • data.wu.ac.at
    Updated Feb 1, 2025
    (Point of Contact) (2025). Dissolved inorganic carbon, alkalinity, pH, temperature, salinity, and other variables collected from profile observations using CTD, discrete bottles, and other instruments from October 7, 1977 to March 11, 2006, as synthesized in the CARbon dioxide IN the Atlantic Ocean (CARINA) Database (NCEI Accession 0113899) [Dataset]. https://catalog.data.gov/dataset/dissolved-inorganic-carbon-alkalinity-ph-temperature-salinity-and-other-variables-collected-fro
    Explore at:
    Dataset updated
    Feb 1, 2025
    Dataset provided by
    (Point of Contact)
    Area covered
    Atlantic Ocean
    Description

    The CARINA (CARbon dioxide IN the Atlantic Ocean) data synthesis project is an international collaborative effort of the EU IP CARBOOCEAN and US partners. It has produced a merged, internally consistent data set of open ocean subsurface measurements for biogeochemical investigations, in particular studies involving the carbon system. The original focus area was the North Atlantic Ocean, but over time the geographic extent expanded and CARINA now includes data from the entire Atlantic, the Arctic Ocean, and the Southern Ocean.

    Atlantic Ocean Data Synthesis
    The Atlantic Ocean subset of the CARINA data set (CARINA-ATL) consists of 98 cruises/entries, of which one is a time series including many cruises and two others are collections of multiple cruises within the framework of a common project. Additionally, six reference cruises were included in the secondary QC for CARINA-ATL to ensure consistency between CARINA and historical databases, in particular the Global Ocean Data Analysis Project (GLODAP, Key et al., 2004). Five Atlantic cruises are in common with the Southern Ocean region, and five others are in common with the Arctic Mediterranean Seas region. These overlapping cruises ensure consistency between the three regions of the CARINA data set. The Atlantic Ocean region of CARINA is loosely defined as the area between the Greenland-Scotland Ridge and 30°S, but as mentioned, ten cruises overlap with the surrounding regions, thus extending the area covered. Most of the data are from the subpolar North Atlantic, and there are particularly large data gaps in the Tropical and South-Eastern Atlantic Ocean. The CARINA-ATL database covers the time period from 1978 to 2006, with the majority of the data from the mid 1990s to the mid 2000s. Overall, oxygen measurements show the highest incidence, followed by TCO2, alkalinity, and CFC data, although CFC data are particularly abundant for some specific regions.

    Arctic Mediterranean Seas Data Synthesis
    The Arctic Mediterranean Seas subset of CARINA (CARINA-AMS) includes data from 62 cruises/campaigns in the Arctic Ocean and Nordic Seas. One of these is a time series and one is a collection of data from multiple cruises to the same area conducted within a year. Five of the CARINA-AMS entries are in common with the CARINA-ATL subset, ensuring consistency with the other CARINA subsets and thus GLODAP. While data coverage was quite dense in the Nordic Seas, it was sparse in the Arctic Ocean. This motivated the use of different methods for quality control in these two areas. The Arctic Ocean was defined as the region north of the Fram and Bering Straits, the Arctic Ocean shelf seas, and the Canadian Archipelago. The Nordic Seas was defined as the region enclosed by the Fram Strait to the north, Greenland to the west, the Greenland-Scotland Ridge to the south, and Norway, the Barents Sea Opening, and Spitsbergen to the east. The analyses of the Arctic Ocean data involved extended use of linear and multiple linear regressions and are described by Jutterström et al. (2009), while the analyses of the Nordic Seas data were mostly carried out using the crossover and inversion approach and are described per parameter in Falck and Olsen (2009), Olafsson and Olsen (2009), Olsen (2009), Olsen (2009) and Olsen et al. (2009). The analyses of the AMS CFC data are described by Jeansson et al. (2009).

    Southern Ocean Carbon Synthesis
    Compared to other regions within the CARINA data set, the Southern Ocean database consists of relatively few data: 37 cruises. Five cruises in the northern part of the Atlantic sector of the Southern Ocean are in common with CARINA-ATL, additionally warranting high internal data quality. The northern boundary of the CARINA Southern Ocean region is roughly at 30°S latitude. Considering all stations in the Southern Ocean CARINA dataset, there is a bias towards the north, indicating that data close to the Antarctic continent are still sparse. Besides the new CARINA cruises, 46 cruises from the GLODAP database were incorporated in the analysis as reference cruises. Nutrient and oxygen data have a clearly higher incidence than TCO2 and total alkalinity data. Chlorofluorocarbons (CFCs) are also included in the Southern Ocean dataset, but they have not been quality controlled. Not surprisingly, most of the CARINA Southern Ocean data originate from the post-GLODAP era, i.e., from 2000 or later. Region-specific quality control is described in three papers: for the Pacific sector by Sabine et al. (2009), the Indian sector by Lo Monaco et al. (2009), and the Atlantic sector by Hoppema et al. (2009).

  14. d

    Spatially explicit estimates of stock size, structure and biomass of North...

    • search.dataone.org
    • doi.pangaea.de
    Updated Jan 19, 2018
    Lehodey, Patrick; Senina, Inna; Dragon, Anne-Cécile; Arrizabalaga, Haritz; Collecte Localisation Satellites (2018). Spatially explicit estimates of stock size, structure and biomass of North Atlantic albacore tuna (Thunnus alalunga) in the North Atlantic for the period 1987-2005, compiled from statistics about ICCAT fishery region B13 [Dataset]. http://doi.org/10.1594/PANGAEA.828237
    Explore at:
    Dataset updated
    Jan 19, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Lehodey, Patrick; Senina, Inna; Dragon, Anne-Cécile; Arrizabalaga, Haritz; Collecte Localisation Satellites
    Time period covered
    Jul 15, 1987 - Oct 15, 2005
    Area covered
    Description

    The development of the ecosystem approach and of models for the management of ocean marine resources requires easy access to standard validated datasets of historical catch data for the main exploited species. These are used to measure the impact of biomass removal by fisheries and to evaluate model skill, while the use of a standard dataset facilitates model inter-comparison. North Atlantic albacore tuna is exploited all year round by longline and, in summer and autumn, by surface fisheries; fishery statistics are compiled by the International Commission for the Conservation of Atlantic Tunas (ICCAT). Catch and effort with geographical coordinates, at monthly temporal resolution and spatial resolution of 1° or 5° squares, were extracted for this species with a careful definition of fisheries and data screening. In total, thirteen fisheries were defined for the period 1956-2010, with fishing gears including longline, troll, mid-water trawl, and bait fishing. However, the spatialized catch-effort data available in the ICCAT database represent only a fraction of the total catch. Length frequencies of catch were also extracted according to the definition of fisheries above for the period 1956-2010, with a quarterly temporal resolution and spatial resolutions varying from 1° x 1° to 10° x 20°. The resolution used to measure the fish also varies, with size bins of 1, 2 or 5 cm (fork length). The screening of data detected inconsistencies: a relatively large number of samples were larger than 150 cm, while all studies on the growth of albacore suggest that fish rarely grow beyond 130 cm. Therefore, a threshold value of 130 cm was arbitrarily fixed and all length frequency data above this value were removed from the original data set.
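The 130 cm screening rule can be sketched as a simple filter over (length, count) records; the function and record layout here are illustrative, not the dataset's actual processing code:

```python
def screen_lengths(records, max_fork_length_cm=130.0):
    """Drop length-frequency records above the plausibility threshold.

    Each record is a (fork_length_cm, count) pair; the 130 cm cut-off
    follows the albacore growth studies cited in the dataset description.
    Returns the retained records and the number of records removed."""
    kept = [(length, n) for length, n in records if length <= max_fork_length_cm]
    removed = len(records) - len(kept)
    return kept, removed

data = [(95.0, 120), (128.0, 14), (155.0, 3)]
kept, removed = screen_lengths(data)
print(kept)     # [(95.0, 120), (128.0, 14)]
print(removed)  # 1
```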

  15. E

    Artificial Personality

    • find.data.gov.scot
    • dtechtive.com
    csv, pdf, txt, zip
    Updated Jun 4, 2015
    University of Edinburgh, School of Informatics, Centre for Speech Technology Research (2015). Artificial Personality [Dataset]. http://doi.org/10.7488/ds/254
    Explore at:
    zip(14.49 MB), csv(0.0064 MB), zip(16.45 MB), csv(0.0691 MB), pdf(0.1354 MB), txt(0.0326 MB), txt(0.0166 MB), zip(0.0015 MB), txt(0.0031 MB), txt(0.0023 MB), zip(16.57 MB)Available download formats
    Dataset updated
    Jun 4, 2015
    Dataset provided by
    University of Edinburgh, School of Informatics, Centre for Speech Technology Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is associated with the paper 'Artificial Personality and Disfluency' by Mirjam Wester, Matthew Aylett, Marcus Tomalin and Rasmus Dall, published at Interspeech 2015, Dresden. The focus of this paper is artificial voices with different personalities. Previous studies have shown links between an individual's use of disfluencies in their speech and their perceived personality. Here, filled pauses (uh and um) and discourse markers (like, you know, I mean) have been included in synthetic speech as a way of creating an artificial voice with different personalities. We discuss the automatic insertion of filled pauses and discourse markers (i.e., fillers) into otherwise fluent texts. The automatic system is compared to a ground truth of human 'acted' filler insertion. Perceived personality (as defined by the Big Five personality dimensions) of the synthetic speech is assessed by means of a standardised questionnaire. Synthesis without fillers is compared to synthesis with either spontaneous or synthetic fillers. Our findings explore how the inclusion of disfluencies influences the way in which subjects rate the perceived personality of an artificial voice.
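    As a toy illustration of filler insertion (not the paper's actual system, which chooses insertion points and filler types from data), one might insert fillers between words at a fixed rate before passing the text to a synthesiser:

```python
import random

FILLED_PAUSES = ["uh", "um"]
DISCOURSE_MARKERS = ["like", "you know", "I mean"]

def insert_fillers(text, rate=0.1, seed=0):
    """Naively insert a random filler before each word with probability `rate`.

    A stand-in sketch only: the automatic system described in the paper
    models where fillers occur and which type to use, rather than drawing
    uniformly at random."""
    rng = random.Random(seed)  # seeded for reproducible output
    out = []
    for word in text.split():
        if rng.random() < rate:
            out.append(rng.choice(FILLED_PAUSES + DISCOURSE_MARKERS))
        out.append(word)
    return " ".join(out)

print(insert_fillers("the focus of this paper is artificial voices", rate=0.3))
```

With rate=0.0 the text is returned unchanged, which gives the fluent baseline condition compared against filler-laden synthesis in the study.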

  16. d

    Protected Areas Database of the United States (PAD-US) 2.1 - World Database...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 2.1 - World Database on Protected Areas (WDPA) Submission (ver. 1.1, April 2021) [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-2-1-world-database-on-protected-areas
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Area covered
    United States
    Description

    The United States Geological Survey (USGS) - Science Analytics and Synthesis (SAS) - Gap Analysis Project (GAP) manages the Protected Areas Database of the United States (PAD-US), an Arc10x geodatabase that includes a full inventory of areas dedicated to the preservation of biological diversity and to other natural, recreation, historic, and cultural uses, managed for these purposes through legal or other effective means (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/protected-areas). The PAD-US is developed in partnership with many organizations, including coordination groups at the [U.S.] Federal level, lead organizations for each State, and a number of national and other non-governmental organizations whose work is closely related to the PAD-US. Learn more about the USGS PAD-US partners program here: www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards. The United Nations Environmental Program - World Conservation Monitoring Centre (UNEP-WCMC) tracks global progress toward biodiversity protection targets enacted by the Convention on Biological Diversity (CBD) through the World Database on Protected Areas (WDPA) and World Database on Other Effective Area-based Conservation Measures (WD-OECM) available at: www.protectedplanet.net. See the Aichi Target 11 dashboard (www.protectedplanet.net/en/thematic-areas/global-partnership-on-aichi-target-11) for official protection statistics recognized globally and developed for the CBD, or here for more information and statistics on the United States of America's protected areas: www.protectedplanet.net/country/USA.
It is important to note statistics published by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas (MPA) Center (www.marineprotectedareas.noaa.gov/dataanalysis/mpainventory/) and the USGS-GAP (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-statistics-and-reports) differ from statistics published by the UNEP-WCMC as methods to remove overlapping designations differ slightly and U.S. Territories are reported separately by the UNEP-WCMC (e.g. The largest MPA, "Pacific Remote Islands Marine Monument" is attributed to the United States Minor Outlying Islands statistics). At the time of PAD-US 2.1 publication (USGS-GAP, 2020), NOAA reported 26% of U.S. marine waters (including the Great Lakes) as protected in an MPA that meets the International Union for Conservation of Nature (IUCN) definition of biodiversity protection (www.iucn.org/theme/protected-areas/about). USGS-GAP plans to publish PAD-US 2.1 Statistics and Reports in the spring of 2021. The relationship between the USGS, the NOAA, and the UNEP-WCMC is as follows: - USGS manages and publishes the full inventory of U.S. marine and terrestrial protected areas data in the PAD-US representing many values, developed in collaboration with a partnership network in the U.S. and; - USGS is the primary source of U.S. marine and terrestrial protected areas data for the WDPA, developed from a subset of the PAD-US in collaboration with the NOAA, other agencies and non-governmental organizations in the U.S., and the UNEP-WCMC and; - UNEP-WCMC is the authoritative source of global protected area statistics from the WDPA and WD-OECM and; - NOAA is the authoritative source of MPA data in the PAD-US and MPA statistics in the U.S. and; - USGS is the authoritative source of PAD-US statistics (including areas primarily managed for biodiversity, multiple uses including natural resource extraction, and public access). 
The PAD-US 2.1 Combined Marine, Fee, Designation, Easement feature class (GAP Status Code 1 and 2 only) is the source of protected areas data in this WDPA update. Tribal areas and military lands represented in the PAD-US Proclamation feature class as GAP Status Code 4 (no known mandate for biodiversity protection) are not included as spatial data to represent internal protected areas are not available at this time. The USGS submitted more than 42,900 protected areas from PAD-US 2.1, including all 50 U.S. States and 6 U.S. Territories, to the UNEP-WCMC for inclusion in the May 2021 WDPA, available at www.protectedplanet.net. The NOAA is the sole source of MPAs in PAD-US and the National Conservation Easement Database (NCED, www.conservationeasement.us/) is the source of conservation easements. The USGS aggregates authoritative federal lands data directly from managing agencies for PAD-US (www.communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/), while a network of State data-stewards provide state, local government lands, and some land trust preserves. National nongovernmental organizations contribute spatial data directly (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards). The USGS translates the biodiversity focused subset of PAD-US into the WDPA schema (UNEP-WCMC, 2019) for efficient aggregation by the UNEP-WCMC. The USGS maintains WDPA Site Identifiers (WDPAID, WDPA_PID), a persistent identifier for each protected area, provided by UNEP-WCMC. Agency partners are encouraged to track WDPA Site Identifier values in source datasets to improve the efficiency and accuracy of PAD-US and WDPA updates. The IUCN protected areas in the U.S. 
are managed by thousands of agencies and organizations across the country and include over 42,900 designated sites such as National Parks, National Wildlife Refuges, National Monuments, Wilderness Areas, some State Parks, State Wildlife Management Areas, Local Nature Preserves, City Natural Areas, The Nature Conservancy and other Land Trust Preserves, and Conservation Easements. The boundaries of these protected places (some overlap) are represented as polygons in the PAD-US, along with informative descriptions such as Unit Name, Manager Name, and Designation Type. As the WDPA is a global dataset, their data standards (UNEP-WCMC 2019) require simplification to reduce the number of records included, focusing on the protected area site name and management authority as described in the Supplemental Information section in this metadata record. Given the numerous organizations involved, sites may be added or removed from the WDPA between PAD-US updates. These differences may reflect actual change in protected area status; however, they also reflect the dynamic nature of spatial data or Geographic Information Systems (GIS). Many agencies and non-governmental organizations are working to improve the accuracy of protected area boundaries, the consistency of attributes, and inventory completeness between PAD-US updates. In addition, USGS continually seeks partners to review and refine the assignment of conservation measures in the PAD-US.

  17. d

    An inventory of subsurface geologic data: structure contour and isopach...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). An inventory of subsurface geologic data: structure contour and isopach datasets, U.S. Geological Survey [Dataset]. https://catalog.data.gov/dataset/an-inventory-of-subsurface-geologic-data-structure-contour-and-isopach-datasets-u-s-geolog
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
    Description

    Under the direction and funding of the National Cooperative Mapping Program, with guidance and encouragement from the United States Geological Survey (USGS), a digital database of three-dimensional (3D) vector data was developed, displayed as two-dimensional (2D) data-extent bounding polygons. This geodatabase acts as a virtual and digital inventory of 3D structure contour and isopach vector data for the USGS National Geologic Synthesis (NGS) team. The data will be available visually through a USGS web application and can be queried using complementary nonspatial tables associated with each data-harboring polygon. This initial publication contains 60 datasets collected directly from USGS publications and federal repositories. Further dataset collections published in versioned releases will be annotated in additional appendices. These datasets can be identified by their specific version through their nonspatial tables. This digital dataset contains spatial extents of the 2D geologic vector data as polygon features attributed with unique identifiers that link the spatial data to nonspatial tables, which define the data sources used and describe various aspects of each published model. The nonspatial DataSources table includes the full citation and URL address for published model reports and any digital model data released as a separate publication, as well as the input type of vector data, using several classification schemes. A tabular glossary defines terms used in the dataset. A tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and the accompanying nonspatial tables.
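    The identifier-based link between the extent polygons and the nonspatial tables can be pictured as a simple keyed lookup. This is a minimal illustration only, not the published geodatabase schema; the field names (`dataset_id`, `citation`, `url`) and all values are assumptions made for the sketch:

    ```python
    # Minimal sketch of querying a data-extent polygon's nonspatial
    # DataSources record via a shared unique identifier. All field
    # names and values here are hypothetical, not the published schema.
    polygons = [
        {
            "dataset_id": "NGS-0001",  # hypothetical identifier
            "extent_wkt": "POLYGON((-110 40, -109 40, -109 41, -110 41, -110 40))",
        },
    ]
    data_sources = {
        "NGS-0001": {
            "citation": "Example USGS structure contour report",
            "url": "https://example.usgs.gov/NGS-0001",
        },
    }

    def lookup_source(dataset_id: str) -> dict:
        """Return the DataSources row linked to a polygon's identifier."""
        return data_sources[dataset_id]
    ```

    Joining on a persistent identifier like this keeps the spatial layer lightweight while the richer bibliographic and classification metadata lives in the tables.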

  18. f

    Types of adventitious sounds and its characteristics.

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Renard Xaviero Adhi Pramono; Stuart Bowyer; Esther Rodriguez-Villegas (2023). Types of adventitious sounds and its characteristics. [Dataset]. http://doi.org/10.1371/journal.pone.0177926.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Renard Xaviero Adhi Pramono; Stuart Bowyer; Esther Rodriguez-Villegas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Types of adventitious sounds and its characteristics.

  19. Datasets and codes for 'Elevating the importance of Risk of Bias assessment...

    • zenodo.org
    bin, csv, txt
    Updated Jun 10, 2025
    Cite
    Antica Culina; Antica Culina; Alfredo Sánchez-Tójar; Alfredo Sánchez-Tójar; Oliver Pescott; Oliver Pescott; Rose O'Dea; Rose O'Dea; Matthew Grainger; Matthew Grainger (2025). Datasets and codes for 'Elevating the importance of Risk of Bias assessment for ecology and evolution' [Dataset]. http://doi.org/10.5281/zenodo.15630232
    Explore at:
    bin, csv, txtAvailable download formats
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antica Culina; Antica Culina; Alfredo Sánchez-Tójar; Alfredo Sánchez-Tójar; Oliver Pescott; Oliver Pescott; Rose O'Dea; Rose O'Dea; Matthew Grainger; Matthew Grainger
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was collected as part of the manuscript entitled 'Elevating the importance of Risk of Bias assessment for ecology and evolution'

    Methods

    1) Survey on the RoB awareness and use

    The survey was approved by the Ethics Committee of the Ruđer Bošković Institute, Zagreb, Croatia, ref. ZV/3218/1-2023. The survey was intended for ecologists and evolutionary biologists who have published at least one meta-analysis. It included non-identifying questions on familiarity with the concept of the Risk of Bias, awareness and use of RoB assessment, and also included general questions on familiarity with meta-analysis, field of research, and career stage.

    The survey was created in Google Forms and sent on 11 September 2023 to the email addresses of corresponding authors of meta-analyses in ecology and evolution, via mailing lists (NC3 Collaborative Research Centre; Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology; German Zoological Society; Sociedad Española de Etología y Ecología Evolutiva; EuropeList@conbio.org), Slack channels (Big Team Science Conference Slack Channel, German Reproducibility Network Event Slack Channel, ESMARConf Slack Channel), Twitter posts, and the SORTEE newsletter. The survey was open to responses until 15 October 2023.

    To determine the corresponding authors of meta-analyses, AST (author) searched for meta-analyses published in 300 journals via Web of Science (databases: SCI-EXPANDED, SSCI, AHCI, ESCI) on 25 April 2023 (the search string, which also lists the codes for included journals, can be found in 'search_string.txt'). This search retrieved 3,289 results (potential meta-analyses; a BibTeX list of these is available in this data package) published between 1945 and 2023. AST then extracted a list of all corresponding author email addresses from these articles using the packages revtools 0.4.1 (Westgate 2019a, 2019b) and stringr 1.5.0 (Wickham 2023) in R 4.2.3 (R Core Team, 2021). Code is available as '001_email_compilation.R'. This resulted in 3,346 email addresses; however, 789 emails (~24%) bounced back.
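    The email-compilation step amounts to pattern-matching addresses out of bibliographic text and de-duplicating them. The authors did this in R with revtools and stringr; the Python sketch below is an assumed equivalent for illustration, not the published '001_email_compilation.R' code:

    ```python
    import re

    # Hypothetical sketch of the email-compilation step: scan a BibTeX
    # export for email addresses and de-duplicate them case-insensitively,
    # preserving first-seen order.
    EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

    def extract_unique_emails(bibtex_text: str) -> list[str]:
        """Return de-duplicated email addresses found in the text."""
        seen: dict[str, None] = {}
        for match in EMAIL_RE.findall(bibtex_text):
            seen.setdefault(match.lower(), None)
        return list(seen)
    ```

    De-duplicating case-insensitively matters here because the same corresponding author often appears with different capitalisation across records.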

    The survey received 232 valid responses (i.e., 9.1% response rate). Because some of the responses were prone to subjective judgment on what the answer exactly meant, four assessors (AC, AST, ROD, and MG; authors) went through the answers of 188 respondents who had answered ‘YES’ to the question ‘Prior to receiving this survey invite, were you familiar with the concept of Risk of Bias (RoB)?’ or who had answered ‘NO’ but their remaining answers indicated otherwise. Each assessor provided an answer ‘YES’, ‘NO’, ‘Unsure’, or ‘NA’ to the following six questions:

    1. Has the respondent heard of RoB?

    2. Does the respondent have a correct interpretation of RoB?

    3. Does the respondent claim to have conducted a RoB assessment?

    4. Have they truly conducted RoB assessment?

    5. Does the respondent think RoB is publication bias?

    6. Has the respondent conducted a publication bias assessment, rather than a RoB assessment?

    We then compared the interpretations of the responses among all four assessors. When three or four assessors had the same interpretation, we chose the most common answer as the final one. When there was disagreement, we discussed the interpretations and agreed on whether the final answer should be ‘YES’, ‘NO’, or ‘NA’ (i.e., either the question is irrelevant given the previous answers, we could not agree on the interpretation, or agreed that the answer was too vague to interpret). For the analyses, we used this final data table containing both the original responses and each evaluator's scores ('Survey_scores_per_evaluator.csv') and the post hoc agreed scores ('Survey_final_scores.csv').
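    The majority rule described above (take the answer shared by three or four of the four assessors; otherwise resolve by discussion) can be sketched as follows. This illustrates the rule only, not the authors' code, and 'DISCUSS' is a placeholder label for items that went to discussion:

    ```python
    from collections import Counter

    def consensus(scores: list[str]) -> str:
        """Apply the majority rule over four assessor answers: if three
        or four assessors agree, return that answer; otherwise flag the
        item for discussion."""
        answer, count = Counter(scores).most_common(1)[0]
        return answer if count >= 3 else "DISCUSS"
    ```

    A 2-2 split (or four different answers) falls through to discussion, matching the procedure in which disagreements were resolved to a final 'YES', 'NO', or 'NA'.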

    2) Journals and RoB assessment

    Between 19th April 2025 and 12th May 2025, AC, OP, and ROD (authors) checked the websites of 275 journals that publish ecology and evolutionary biology research. The list of journals was taken from Ivimey-Cook et al. (2025). We checked each journal’s Aims & Scope, Author instructions, and Editorial policy sections to search for whether a journal accepts evidence synthesis, and whether it mentions Risk of Bias or any related concepts for authors of evidence synthesis articles.

    We first piloted our data extraction on 10 journals to adjust the data extraction questions and align responses. We used Google Forms for data extraction. The main questions included:

    1. Does the journal explicitly solicit some form of evidence synthesis?

    2. Does the journal specifically mention RoB or related assessment of primary literature in guidelines to authors of evidence synthesis, or is RoB/related assessment specifically mentioned in linked other guidelines (e.g., journal states something like ‘follow PRISMA guidelines when reporting MA’ but no further detail on RoB or related is mentioned)?

    3. What concept related to RoB (including RoB itself) does the journal or a linked guideline exactly mention?

    4. If the journal links or refers to external guidelines that mention RoB or related assessment, what are these guidelines?

    5. What specific RoB or related tool/checklist is mentioned in journal guidelines, or in linked guidelines?

    6. What is the strength of the journal’s policy on the use of RoB or related assessment?

    The following decisions were made in light of the pilot extraction. First, we followed this definition of evidence synthesis: ‘Evidence syntheses are conducted in an unbiased, reproducible way to provide evidence for practice and policy-making, as well as to identify gaps in the research. Evidence syntheses may also include a meta-analysis, a more quantitative process of synthesising and visualising data retrieved from various studies’ (https://guides.library.cornell.edu/evidence-synthesis). Second, journals that explicitly solicit narrative reviews and similar article types (e.g., the Annual Review of Ecology, Evolution and Systematics solicits ‘essay reviews’ but not systematic or quantitative reviews) were scored as ‘NO’ for question (1) above, whereas journals that do not explicitly solicit evidence synthesis, but something more general (e.g., review articles, reviews, and comprehensive syntheses), were scored as ‘Unsure’ for the same question.

    We divided journals across reviewers, and one reviewer checked each journal. The data table containing reviewer initials and their scores is 'Journals_risk_of_bias'.

  20. d

    Compilation of North Atlantic albacore tuna (Thunnus alalunga) fork length...

    • dataone.org
    • doi.pangaea.de
    • +1more
    Updated Jan 6, 2018
    + more versions
    Cite
    Lehodey, Patrick; Senina, Inna; Dragon, Anne-Cécile; Arrizabalaga, Haritz; Collecte Localisation Satellites (2018). Compilation of North Atlantic albacore tuna (Thunnus alalunga) fork length frequencies in 1 centimeter intervals from catches in the North Atlantic for the period 1956-2010 [Dataset]. http://doi.org/10.1594/PANGAEA.828171
    Explore at:
    Dataset updated
    Jan 6, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Lehodey, Patrick; Senina, Inna; Dragon, Anne-Cécile; Arrizabalaga, Haritz; Collecte Localisation Satellites
    Time period covered
    Nov 15, 1956 - Dec 15, 2011
    Description

    The development of the ecosystem approach and models for the management of ocean marine resources requires easy access to standard, validated datasets of historical catch data for the main exploited species. These are used to measure the impact of biomass removal by fisheries and to evaluate model skill, while the use of standard datasets facilitates model inter-comparison. North Atlantic albacore tuna is exploited all year round by longline and, in summer and autumn, by surface fisheries, and fishery statistics are compiled by the International Commission for the Conservation of Atlantic Tunas (ICCAT). Catch and effort with geographical coordinates at monthly resolution on 1° or 5° squares were extracted for this species with a careful definition of fisheries and data screening. In total, thirteen fisheries were defined for the period 1956-2010, covering the fishing gears longline, troll, mid-water trawl and bait fishing. However, the spatialized catch-effort data available in the ICCAT database represent only a fraction of the total catch. Length frequencies of catch were also extracted according to the definition of fisheries above for the period 1956-2010, with a quarterly temporal resolution and spatial resolutions varying from 1° x 1° to 10° x 20°. The resolution used to measure the fish also varies, with size bins of 1, 2 or 5 cm (fork length). Screening of the data detected inconsistencies: a relatively large number of samples exceeded 150 cm, while all studies on the growth of albacore suggest that fish rarely grow beyond 130 cm. Therefore, a threshold value of 130 cm was arbitrarily fixed and all length frequency data above this value were removed from the original dataset.
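    The 130 cm screening step described above is a simple threshold filter over the length-frequency bins. A minimal sketch, assuming bins are stored as (fork length in cm, count) pairs — a representation chosen for illustration, not the PANGAEA file layout:

    ```python
    # Hypothetical sketch of the length-frequency screening: bins whose
    # fork length exceeds the arbitrarily fixed 130 cm threshold are
    # dropped; bins at or below the threshold are kept unchanged.
    LENGTH_THRESHOLD_CM = 130

    def screen_length_frequencies(bins: list[tuple[int, int]]) -> list[tuple[int, int]]:
        """Remove (fork_length_cm, count) bins above the 130 cm threshold."""
        return [(length, n) for length, n in bins if length <= LENGTH_THRESHOLD_CM]
    ```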
