15 datasets found

Naturalistic Neuroimaging Database
openneuro.org
Updated Apr 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v1.1.3
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds002837.v1.1.3
Dataset updated
Apr 20, 2021
Dataset provided by
OpenNeurohttps://openneuro.org/
Authors
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Overview

The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI).The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.

The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

v2.0 Changes

Overview

We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.

Normalization

Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:

# Generate a resting state (rs) timeseries (ts) # Install / load package to make fake fMRI ts # install.packages("neuRosim") library(neuRosim) # Generate a ts ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1) # 3dDetrend -normalize # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1" # Do for the full timeseries ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2)); # Do this again for a shorter version of the same timeseries ts.shorter.length <- length(ts.normalised.long)/4 ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2)); # By looking at the summaries, it can be seen that the median values become larger summary(ts.normalised.long) summary(ts.normalised.short) # Plot results for the long and short ts # Truncate the longer ts for plotting only ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length] # Give the plot a title title <- "3dDetrend -normalize for long (blue) and short (red) timeseries"; plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short))); # Add zero line lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey'); # 3dDetrend -normalize -polort 0 for long timeseries lines(ts.normalised.long.made.shorter, col='blue'); # 3dDetrend -normalize -polort 0 for short timeseries lines(ts.normalised.short, col='red');

Standardization/modernization

The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewers steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analyses practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.

New afni_proc.py command line

The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html): afni_proc.py \ -subj_id "$sub_id_name_1" \ -blocks despike tshift align tlrc volreg mask blur scale regress \ -radial_correlate_blocks tcat volreg \ -copy_anat anatomical_warped/anatSS.1.nii.gz \ -anat_has_skull no \ -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \ -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \ -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \ -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \ -anat_follower_erode fsvent fswm \ -dsets media_?.nii.gz \ -tcat_remove_first_trs 8 \ -tshift_opts_ts -tpattern alt+z2 \ -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \ -tlrc_base "$basedset" \ -tlrc_NL_warp \ -tlrc_NL_warped_dsets \ anatomical_warped/anatQQ.1.nii.gz \ anatomical_warped/anatQQ.1.aff12.1D \ anatomical_warped/anatQQ.1_WARP.nii.gz \ -volreg_align_to MIN_OUTLIER \ -volreg_post_vr_allin yes \ -volreg_pvra_base_index MIN_OUTLIER \ -volreg_align_e2a \ -volreg_tlrc_warp \ -mask_opts_automask -clfrac 0.10 \ -mask_epi_anat yes \ -blur_to_fwhm -blur_size $blur \ -regress_motion_per_run \ -regress_ROI_PC fsvent 3 \ -regress_ROI_PC_per_run fsvent \ -regress_make_corr_vols aeseg fsvent \ -regress_anaticor_fast \ -regress_anaticor_label fswm \ -regress_censor_motion 0.3 \ -regress_censor_outliers 0.1 \ -regress_apply_mot_types demean deriv \ -regress_est_blur_epits \ -regress_est_blur_errts \ -regress_run_clustsim no \ -regress_polort 2 \ -regress_bandpass 0.01 1 \ -html_review_style pythonic We used similar command lines to generate ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).

We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalise). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

Which timeseries file you use is up to you but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words: * Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice). * Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere). * For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data). * For censored data: * Performing ISC requires the users to unionize the censoring patterns during the correlation calculation. * If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data. In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.

Effect on results

From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an
d
WLCI - Important Agricultural Lands Assessment (Input Raster: Normalized...
catalog.data.gov
data.usgs.gov
+2more
Updated Sep 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). WLCI - Important Agricultural Lands Assessment (Input Raster: Normalized Antelope Damage Claims) [Dataset]. https://catalog.data.gov/dataset/wlci-important-agricultural-lands-assessment-input-raster-normalized-antelope-damage-claim
Explore at:
Dataset updated
Sep 24, 2025
Dataset provided by
U.S. Geological Survey
Description
The values in this raster are unit-less scores ranging from 0 to 1 that represent normalized dollars per acre damage claims from antelope on Wyoming lands. This raster is one of 9 inputs used to calculate the "Normalized Importance Index."
d
Residential Existing Homes (One to Four Units) Energy Efficiency Meter...
catalog.data.gov
datasets.ai
+2more
Updated Jul 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.ny.gov (2025). Residential Existing Homes (One to Four Units) Energy Efficiency Meter Evaluated Project Data: 2007 – 2012 [Dataset]. https://catalog.data.gov/dataset/residential-existing-homes-one-to-four-units-energy-efficiency-meter-evaluated-projec-2007
Explore at:
Dataset updated
Jul 26, 2025
Dataset provided by
data.ny.gov
Description
IMPORTANT! PLEASE READ DISCLAIMER BEFORE USING DATA. This dataset backcasts estimated modeled savings for a subset of 2007-2012 completed projects in the Home Performance with ENERGY STAR® Program against normalized savings calculated by an open source energy efficiency meter available at https://www.openee.io/. Open source code uses utility-grade metered consumption to weather-normalize the pre- and post-consumption data using standard methods with no discretionary independent variables. The open source energy efficiency meter allows private companies, utilities, and regulators to calculate energy savings from energy efficiency retrofits with increased confidence and replicability of results. This dataset is intended to lay a foundation for future innovation and deployment of the open source energy efficiency meter across the residential energy sector, and to help inform stakeholders interested in pay for performance programs, where providers are paid for realizing measurable weather-normalized results. To download the open source code, please visit the website at https://github.com/openeemeter/eemeter/releases D I S C L A I M E R: Normalized Savings using open source OEE meter. Several data elements, including, Evaluated Annual Elecric Savings (kWh), Evaluated Annual Gas Savings (MMBtu), Pre-retrofit Baseline Electric (kWh), Pre-retrofit Baseline Gas (MMBtu), Post-retrofit Usage Electric (kWh), and Post-retrofit Usage Gas (MMBtu) are direct outputs from the open source OEE meter. Home Performance with ENERGY STAR® Estimated Savings. Several data elements, including, Estimated Annual kWh Savings, Estimated Annual MMBtu Savings, and Estimated First Year Energy Savings represent contractor-reported savings derived from energy modeling software calculations and not actual realized energy savings. The accuracy of the Estimated Annual kWh Savings and Estimated Annual MMBtu Savings for projects has been evaluated by an independent third party. The results of the Home Performance with ENERGY STAR impact analysis indicate that, on average, actual savings amount to 35 percent of the Estimated Annual kWh Savings and 65 percent of the Estimated Annual MMBtu Savings. For more information, please refer to the Evaluation Report published on NYSERDA’s website at: http://www.nyserda.ny.gov/-/media/Files/Publications/PPSER/Program-Evaluation/2012ContractorReports/2012-HPwES-Impact-Report-with-Appendices.pdf. This dataset includes the following data points for a subset of projects completed in 2007-2012: Contractor ID, Project County, Project City, Project ZIP, Climate Zone, Weather Station, Weather Station-Normalization, Project Completion Date, Customer Type, Size of Home, Volume of Home, Number of Units, Year Home Built, Total Project Cost, Contractor Incentive, Total Incentives, Amount Financed through Program, Estimated Annual kWh Savings, Estimated Annual MMBtu Savings, Estimated First Year Energy Savings, Evaluated Annual Electric Savings (kWh), Evaluated Annual Gas Savings (MMBtu), Pre-retrofit Baseline Electric (kWh), Pre-retrofit Baseline Gas (MMBtu), Post-retrofit Usage Electric (kWh), Post-retrofit Usage Gas (MMBtu), Central Hudson, Consolidated Edison, LIPA, National Grid, National Fuel Gas, New York State Electric and Gas, Orange and Rockland, Rochester Gas and Electric. How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
Z
Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning...
data.niaid.nih.gov
Updated Jun 30, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ambra Demontis (2022). ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6568777
Explore at:
Dataset updated
Jun 30, 2022
Dataset provided by
Luca Demetrio
Maura Pintor
Fabio Roli
Angelo Sotgiu
Ambra Demontis
Daniele Angioni
Battista Biggio
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.

We release our dataset as a set of folders indicating the patch target label (e.g., banana), each containing 1000 subfolders as the ImageNet output classes.

An example showing how to use the dataset is shown below.

code for testing robustness of a model

import os.path

from torchvision import datasets, transforms, models import torch.utils.data

class ImageFolderWithEmptyDirs(datasets.ImageFolder): """ This is required for handling empty folders from the ImageFolder Class. """

def find_classes(self, directory): classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir()) if not classes: raise FileNotFoundError(f"Couldn't find any class folder in {directory}.") class_to_idx = {cls_name: i for i, cls_name in enumerate(classes) if len(os.listdir(os.path.join(directory, cls_name))) > 0} return classes, class_to_idx

extract and unzip the dataset, then write top folder here

dataset_folder = 'data/ImageNet-Patch'

available_labels = { 487: 'cellular telephone', 513: 'cornet', 546: 'electric guitar', 585: 'hair spray', 804: 'soap dispenser', 806: 'sock', 878: 'typewriter keyboard', 923: 'plate', 954: 'banana', 968: 'cup' }

select folder with specific target

target_label = 954

dataset_folder = os.path.join(dataset_folder, str(target_label)) normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) transforms = transforms.Compose([ transforms.ToTensor(), normalizer ])

dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=transforms) model = models.resnet50(pretrained=True) loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5) model.eval()

batches = 10 correct, attack_success, total = 0, 0, 0 for batch_idx, (images, labels) in enumerate(loader): if batch_idx == batches: break pred = model(images).argmax(dim=1) correct += (pred == labels).sum() attack_success += sum(pred == target_label) total += pred.shape[0]

accuracy = correct / total attack_sr = attack_success / total

print("Robust Accuracy: ", accuracy) print("Attack Success: ", attack_sr)
f
Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq...
figshare.com
datasetcatalog.nlm.nih.gov
+1more
xlsx
Updated Jun 3, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s001
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2020.00594.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven-fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well as did simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
Data from: Variation in trends of consumption based carbon accounts
data.europa.eu
unknown
Updated May 23, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2019). Variation in trends of consumption based carbon accounts [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3187310?locale=da
Explore at:
unknown(103992)Available download formats
Dataset updated
May 23, 2019
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this work we present results of all the major global models and normalise the model results by looking at changes over time relative to a common base year value. We give an analysis of the variability across the models, both before and after normalisation in order to give insights into variance at national and regional level. A dataset of harmonised results (based on means) and measures of dispersion is presented, providing a baseline dataset for CBCA validation and analysis. The dataset is intended as a goto dataset for country and regional results of consumption and production based accounts. The normalised mean for each country/region is the principle result that can be used to assess the magnitude and trend in the emission accounts. However, an additional key element of the dataset are the measures of robustness and spread of the results across the source models. These metrics give insight into the amount of trust should be placed in the individual country/region results. Code at https://doi.org/10.5281/zenodo.3181930
d
Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...
catalog.data.gov
data.usgs.gov
+1more
Updated Oct 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Total Inorganic Nitrogen [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize
Explore at:
Dataset updated
Oct 8, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Contiguous United States, United States
Description
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Total Inorganic Nitrogen for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of Total Inorganic Nitrogen deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,00-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and when available building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), and the U.S. Environmental Protection Agency (USEPA), and contractors, found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geologic Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
CYGNSS Level 1 Science Data Record Version 2.1 - Dataset - NASA Open Data...
data.nasa.gov
Updated Apr 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). CYGNSS Level 1 Science Data Record Version 2.1 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/cygnss-level-1-science-data-record-version-2-1-c4d25
Explore at:
Dataset updated
Apr 1, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
This Level 1 (L1) dataset contains the Version 2.1 geo-located Delay Doppler Maps (DDMs) calibrated into Power Received (Watts) and Bistatic Radar Cross Section (BRCS) expressed in units of meters squared from the Delay Doppler Mapping Instrument aboard the CYGNSS satellite constellation. This version supersedes Version 2.0. Other useful scientific and engineering measurement parameters include the DDM of Normalized Bistatic Radar Cross Section (NBRCS), the Delay Doppler Map Average (DDMA) of the NBRCS near the specular reflection point, and the Leading Edge Slope (LES) of the integrated delay waveform. The L1 dataset contains a number of other engineering and science measurement parameters, including sets of quality flags/indicators, error estimates, and bias estimates as well as a variety of orbital, spacecraft/sensor health, timekeeping, and geolocation parameters. At most, 8 netCDF data files (each file corresponding to a unique spacecraft in the CYGNSS constellation) are provided each day; under nominal conditions, there are typically 6-8 spacecraft retrieving data each day, but this can be maximized to 8 spacecraft under special circumstances in which higher than normal retrieval frequency is needed (i.e., during tropical storms and or hurricanes). Latency is approximately 6 days (or better) from the last recorded measurement time. The Version 2.1 release represents the second science-quality release. Here is a summary of improvements that reflect the quality of the Version 2.1 data release: 1) data is now available when the CYGNSS satellites are rolled away from nadir during orbital high beta-angle periods, resulting in a significant amount of additional data; 2) correction to coordinate frames result in more accurate estimates of receiver antenna gain at the specular point; 3) improved calibration for analog-to-digital conversion results in better consistency between CYGNSS satellites measurements at nearly the same location and time; 4) improved GPS EIRP and transmit antenna pattern calibration results in significantly reduced PRN-dependence in the observables; 5) improved estimation of the location of the specular point within the DDM; 6) an altitude-dependent scattering area is used to normalize the scattering cross section (v2.0 used a simpler scattering area model that varied with incidence and azimuth angles but not altitude); 7) corrections added for noise floor-dependent biases in scattering cross section and leading edge slope of delay waveform observed in the v2.0 data. Users should also note that the receiver antenna pattern calibration is not applied per-DDM-bin in this v2.1 release.
d
Data from: Attributes for NHDPlus Catchments (Version 1.1) for the...
catalog.data.gov
data.usgs.gov
+3more
Updated Sep 17, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Ammonium (NH4) [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize-dafbc
Explore at:
Dataset updated
Sep 17, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Contiguous United States, United States
Description
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Ammonium (NH4) for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NH4 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,00-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and when available building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), and the U.S. Environmental Protection Agency (USEPA), and contractors, found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geologic Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
g
Pittsburgh Regional Transit Scheduled Trip Counts
gimi9.com
data.wprdc.org
+1more
Updated Dec 10, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). Pittsburgh Regional Transit Scheduled Trip Counts [Dataset]. https://gimi9.com/dataset/data-gov_port-authority-scheduled-trip-counts/
Explore at:
Dataset updated
Dec 10, 2019
Description
This dataset lists Pittsburgh Regional Transit scheduled bus and rail trip counts and distances since November 2016. For convenience, this data is published in several formats: - Daily Scheduled Trips: Aggregated total trip count and trip distances for each route per day. - Monthly Scheduled Trips: Aggregated total trip count and trip distances for each route per month and day type (weekday, Saturday, and Sunday). This can be joined with the Monthly Ridership dataset on DateKey, Route, and DayType to normalize the average riders per trip. - Detailed Daily Scheduled Trips: This dataset lists the detailed daily schedule for each route, including the start and end times for each trip. This can be used to see the earliest and latest runs and study peak- and off-peak trip frequencies. Port Authority makes quarterly schedule adjustments that include adding and removing trips on certain routes to better serve riders. These schedule adjustments are called "picks" and correspond to the year and month of the schedule change, e.g. the November 2016 pick is known as 1611.
d
Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...
catalog.data.gov
data.usgs.gov
+3more
Updated Sep 16, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Nitrate (NO3) [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize-781ec
Explore at:
Dataset updated
Sep 16, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Contiguous United States, United States
Description
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Nitrate (NO3) for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NO3 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,00-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and when available building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), and the U.S. Environmental Protection Agency (USEPA), and contractors, found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geologic Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
CYGNSS Level 1 Science Data Record Version 2.1
catalog.data.gov
s.cnmilf.com
+4more
Updated Sep 19, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NASA/JPL/PODAAC;NASA/ESSP/UMICH/CYGNSS (2025). CYGNSS Level 1 Science Data Record Version 2.1 [Dataset]. https://catalog.data.gov/dataset/cygnss-level-1-science-data-record-version-2-1-c5c5e
Explore at:
Dataset updated
Sep 19, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
This Level 1 (L1) dataset contains the Version 2.1 geo-located Delay Doppler Maps (DDMs) calibrated into Power Received (Watts) and Bistatic Radar Cross Section (BRCS) expressed in units of meters squared from the Delay Doppler Mapping Instrument aboard the CYGNSS satellite constellation. This version supersedes Version 2.0. Other useful scientific and engineering measurement parameters include the DDM of Normalized Bistatic Radar Cross Section (NBRCS), the Delay Doppler Map Average (DDMA) of the NBRCS near the specular reflection point, and the Leading Edge Slope (LES) of the integrated delay waveform. The L1 dataset contains a number of other engineering and science measurement parameters, including sets of quality flags/indicators, error estimates, and bias estimates as well as a variety of orbital, spacecraft/sensor health, timekeeping, and geolocation parameters. At most, 8 netCDF data files (each file corresponding to a unique spacecraft in the CYGNSS constellation) are provided each day; under nominal conditions, there are typically 6-8 spacecraft retrieving data each day, but this can be maximized to 8 spacecraft under special circumstances in which higher than normal retrieval frequency is needed (i.e., during tropical storms and or hurricanes). Latency is approximately 6 days (or better) from the last recorded measurement time. The Version 2.1 release represents the second science-quality release. Here is a summary of improvements that reflect the quality of the Version 2.1 data release: 1) data is now available when the CYGNSS satellites are rolled away from nadir during orbital high beta-angle periods, resulting in a significant amount of additional data; 2) correction to coordinate frames result in more accurate estimates of receiver antenna gain at the specular point; 3) improved calibration for analog-to-digital conversion results in better consistency between CYGNSS satellites measurements at nearly the same location and time; 4) improved GPS EIRP and transmit antenna pattern calibration results in significantly reduced PRN-dependence in the observables; 5) improved estimation of the location of the specular point within the DDM; 6) an altitude-dependent scattering area is used to normalize the scattering cross section (v2.0 used a simpler scattering area model that varied with incidence and azimuth angles but not altitude); 7) corrections added for noise floor-dependent biases in scattering cross section and leading edge slope of delay waveform observed in the v2.0 data. Users should also note that the receiver antenna pattern calibration is not applied per-DDM-bin in this v2.1 release.
Data for A Systemic Framework for Assessing the Risk of Decarbonization to...
zenodo.org
txt
Updated Sep 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Soheil Shayegh; Soheil Shayegh; Giorgia Coppola; Giorgia Coppola (2025). Data for A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union [Dataset]. http://doi.org/10.5281/zenodo.17152310
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.17152310
Dataset updated
Sep 18, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Soheil Shayegh; Soheil Shayegh; Giorgia Coppola; Giorgia Coppola
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 18, 2025
Area covered
European Union
Description
README — Code and data
Project: LOCALISED

Work Package 7, Task 7.1

Paper: A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union

What this repo does
-------------------
Builds the Transition‑Risk Index (TRI) for EU manufacturing at NUTS‑2 × NACE Rev.2, and reproduces the article’s Figures 3–6:
• Exposure (emissions by region/sector)
• Vulnerability (composite index)
• Risk = Exposure ⊗ Vulnerability
Outputs include intermediate tables, the final analysis dataset, and publication figures.

Folder of interest
------------------
Code and data/
├─ Code/ # R scripts (run in order 1A → 5)
│ └─ Create Initial Data/ # scripts to (re)build Initial data/ from Eurostat API with imputation
├─ Initial data/ # Eurostat inputs imputed for missing values
├─ Derived data/ # intermediates
├─ Final data/ # final analysis-ready tables
└─ Figures/ # exported figures

Quick start
-----------
1) Open R (or RStudio) and set the working directory to “Code and data/Code”.
Example: setwd(".../Code and data/Code")
2) Initial data/ contains the required Eurostat inputs referenced by the scripts.
To reproduce the inputs in Initial data/, run the scripts in Code/Create Initial Data/.
These scripts download the required datasets from the respective API and impute missing values; outputs are written to ../Initial data/.
3) Run scripts sequentially (they use relative paths to ../Raw data, ../Derived data, etc.):
1A-non-sector-data.R → 1B-sector-data.R → 1C-all-data.R → 2-reshape-data.R → 3-normalize-data-by-n-enterpr.R → 4-risk-aggregation.R → 5A-results-maps.R, 5B-results-radar.R

What each script does
---------------------
Create Initial Data — Recreate inputs
• Download source tables from the Eurostat API or the Localised DSP, apply light cleaning, and impute missing values.
• Write the resulting inputs to Initial data/ for the analysis pipeline.

1A / 1B / 1C — Build the unified base
• Read individual Eurostat datasets (some sectoral, some only regional).
• Harmonize, aggregate, and align them into a single analysis-ready schema.
• Write aggregated outputs to Derived data/ (and/or Final data/ as needed).

2 — Reshape and enrich
• Reshapes the combined data and adds metadata.
• Output: Derived data/2_All_data_long_READY.xlsx (all raw indicators in tidy long format, with indicator names and values).

3 — Normalize (enterprises & min–max)
• Divide selected indicators by number of enterprises.
• Apply min–max normalization to [0.01, 0.99].
• Exposure keeps real zeros (zeros remain zero).
• Write normalized tables to Derived data/ or Final data/.

4 — Aggregate indices
• Vulnerability: build dimension scores (Energy, Labour, Finance, Supply Chain, Technology).
– Within each dimension: equal‑weight mean of directionally aligned, [0.01,0.99]‑scaled indicators.
– Dimension scores are re‑scaled to [0.01,0.99].
• Aggregate Vulnerability: equal‑weight mean of the five dimensions.
• TRI (Risk): combine Exposure (E) and Vulnerability (V) via a weighted geometric rule with α = 0.5 in the baseline.
– Policy‑intuitive properties: high E & high V → high risk; imbalances penalized (non‑compensatory).
• Output: Final data/ (main analysis tables).

5A / 5B — Visualize results
• 5A: maps and distribution plots for Exposure, Vulnerability, and Risk → Figures 3 & 4.
• 5B: comparative/radar profiles for selected countries/regions/subsectors → Figures 5 & 6.
• Outputs saved to Figures/.

Data flow (at a glance)
-----------------------
Initial data → (1A–1C) Aggregated base → (2) Tidy long file → (3) Normalized indicators → (4) Composite indices → (5) Figures
| | |
v v v
Derived data/ 2_All_data_long_READY.xlsx Final data/ & Figures/

Assumptions & conventions
-------------------------
• Geography: EU NUTS‑2 regions; Sector: NACE Rev.2 manufacturing subsectors.
• Equal weights by default where no evidence supports alternatives.
• All indicators directionally aligned so that higher = greater transition difficulty.
• Relative paths assume working directory = Code/.

Reproducing the article
-----------------------
• Optionally run the codes from the Code/Create Initial Data subfolder
• Run 1A → 5B without interruption to regenerate:
– Figure 3: Exposure, Vulnerability, Risk maps (total manufacturing).
– Figure 4: Vulnerability dimensions (Energy, Labour, Finance, Supply Chain, Technology).
– Figure 5: Drivers of risk—highest vs. lowest risk regions (example: Germany & Greece).
– Figure 6: Subsector case (e.g., basic metals) by selected regions.
• Final tables for the paper live in Final data/. Figures export to Figures/.

Requirements
------------
• R (version per your environment).
• Install any missing packages listed at the top of each script (e.g., install.packages("...")).

Troubleshooting
---------------
• “File not found”: check that the previous script finished and wrote its outputs to the expected folder.
• Paths: confirm getwd() ends with /Code so relative paths resolve to ../Raw data, ../Derived data, etc.
• Reruns: optionally clear Derived data/, Final data/, and Figures/ before a clean rebuild.

Provenance & citation
---------------------
• Inputs: Eurostat and related sources cited in the paper and headers of the scripts.
• Methods: OECD composite‑indicator guidance; IPCC AR6 risk framing (see paper references).
• If you use this code, please cite the article:
A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union.
CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thilde Terkelsen; Anders Krogh; Elena Papaleo (2023). CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for the analysis of quantitative biological data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007665
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007665
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Thilde Terkelsen; Anders Krogh; Elena Papaleo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline -CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, a renv .lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.
f
snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort)
figshare.com
bin
Updated Jun 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicholas Mikolajewicz (2024). snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort) [Dataset]. http://doi.org/10.6084/m9.figshare.25917628.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.25917628.v1
Dataset updated
Jun 4, 2024
Dataset provided by
figshare
Authors
Nicholas Mikolajewicz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary.10 primary GBM and 8 recurrent GBM samples (14/18 matched) profiled using single nucleus RNA- sequencing (sci-RNA-seq3 protocol).Data Format.Data is provided as preprocessed dataset, stored in Seurat Object.Sample processing, sci-RNA-seq3 library generation, and sequencingSnap-frozen patient pGBM and rGBM tissues were chopped with a razor blade or scissors before nucleus isolation. Nuclei extraction and fixation were performed as previously described (Cao 2019), except for the use of a modified CST lysis buffer50 plus 1% of SUPERase-In RNase Inhibitor (Invitrogen, #AM2696). Lysis time and washing steps were further optimized based on human GBM tissue. Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-seq3 libraries were generated as previously described49 using three-level combinatorial indexing. The final libraries were sequenced on Illumina NovaSeq as follows: read 1: 34bp, read 2: >=69bp, index 1: 10bp, index 2: 10bp.Demultiplexing and read alignments.Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-Seq3 pipeline. After barcodes and unique molecular identifiers (UMIs) were extracted from the read1 of FASTQ files, read alignment was performed using STAR short-read aligner (v2.5.2b) with the human genome (hg19) and Gencode v24 gene annotations. After removing duplicate reads based on UMI, barcode, chromosome and alignment position, reads were summarized into a count matrix of M genes × N nuclei.Filtering, normalization, integration, and dimensional reduction.Raw count matrices were loaded into a Seurat object (version 4.0.1) and filtered to retain cells with (i) 200 – 9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) unmatched rate within 3 median absolute deviations of the median. To normalize count matrix, we adopted the modeling framework previously described and implemented in sctransform (R Package, version 0.3.2). In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance stabilized biological signal for downstream analysis. Data from each patient were integrated with the reciprocal PCA method (Seurat) using the top 2000 variable features. PCA was performed on the integrated dataset, and the top N components that accounted for 90% of the observed variance were used for UMAP embedding, RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 01, metric = cosine).Contact.Contact Dr. Nicholas Mikolajewicz regarding any questions about the data or analysis (n.mikolajewicz@utoronto.ca)
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v1.1.3

Naturalistic Neuroimaging Database

Explore at:

Unique identifier

https://doi.org/10.18112/openneuro.ds002837.v1.1.3

Dataset updated

Apr 20, 2021

Dataset provided by

OpenNeurohttps://openneuro.org/

Authors

Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper

License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

Overview

The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI).The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.
The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

v2.0 Changes

Overview
- We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.

Normalization

Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:

# Generate a resting state (rs) timeseries (ts)
# Install / load package to make fake fMRI ts
# install.packages("neuRosim")
library(neuRosim)
# Generate a ts
ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
# 3dDetrend -normalize
# R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
# Do for the full timeseries
ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
# Do this again for a shorter version of the same timeseries
ts.shorter.length <- length(ts.normalised.long)/4
ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
# By looking at the summaries, it can be seen that the median values become  larger
summary(ts.normalised.long)
summary(ts.normalised.short)
# Plot results for the long and short ts
# Truncate the longer ts for plotting only
ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
# Give the plot a title
title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
# Add zero line
lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
# 3dDetrend -normalize -polort 0 for long timeseries
lines(ts.normalised.long.made.shorter, col='blue');
# 3dDetrend -normalize -polort 0 for short timeseries
lines(ts.normalised.short, col='red');

Standardization/modernization
- The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewers steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analyses practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.
New afni_proc.py command line
- The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html): afni_proc.py \ -subj_id "$sub_id_name_1" \ -blocks despike tshift align tlrc volreg mask blur scale regress \ -radial_correlate_blocks tcat volreg \ -copy_anat anatomical_warped/anatSS.1.nii.gz \ -anat_has_skull no \ -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \ -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \ -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \ -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \ -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \ -anat_follower_erode fsvent fswm \ -dsets media_?.nii.gz \ -tcat_remove_first_trs 8 \ -tshift_opts_ts -tpattern alt+z2 \ -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \ -tlrc_base "$basedset" \ -tlrc_NL_warp \ -tlrc_NL_warped_dsets \ anatomical_warped/anatQQ.1.nii.gz \ anatomical_warped/anatQQ.1.aff12.1D \ anatomical_warped/anatQQ.1_WARP.nii.gz \ -volreg_align_to MIN_OUTLIER \ -volreg_post_vr_allin yes \ -volreg_pvra_base_index MIN_OUTLIER \ -volreg_align_e2a \ -volreg_tlrc_warp \ -mask_opts_automask -clfrac 0.10 \ -mask_epi_anat yes \ -blur_to_fwhm -blur_size $blur \ -regress_motion_per_run \ -regress_ROI_PC fsvent 3 \ -regress_ROI_PC_per_run fsvent \ -regress_make_corr_vols aeseg fsvent \ -regress_anaticor_fast \ -regress_anaticor_label fswm \ -regress_censor_motion 0.3 \ -regress_censor_outliers 0.1 \ -regress_apply_mot_types demean deriv \ -regress_est_blur_epits \ -regress_est_blur_errts \ -regress_run_clustsim no \ -regress_polort 2 \ -regress_bandpass 0.01 1 \ -html_review_style pythonic We used similar command lines to generate ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).
We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalise). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

Which timeseries file you use is up to you but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words: * Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice). * Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere). * For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data). * For censored data: * Performing ISC requires the users to unionize the censoring patterns during the correlation calculation. * If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data. In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.
Effect on results
- From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an

Clear search

Close search

Google apps

Main menu

Naturalistic Neuroimaging Database

Overview

v2.0 Changes

WLCI - Important Agricultural Lands Assessment (Input Raster: Normalized...

Residential Existing Homes (One to Four Units) Energy Efficiency Meter...

Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning...

code for testing robustness of a model

extract and unzip the dataset, then write top folder here

select folder with specific target

Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq...

Data from: Variation in trends of consumption based carbon accounts

Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...

CYGNSS Level 1 Science Data Record Version 2.1 - Dataset - NASA Open Data...

Data from: Attributes for NHDPlus Catchments (Version 1.1) for the...

Pittsburgh Regional Transit Scheduled Trip Counts

Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...

CYGNSS Level 1 Science Data Record Version 2.1

Data for A Systemic Framework for Assessing the Risk of Decarbonization to...

CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...

snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort)

Naturalistic Neuroimaging Database

Overview

v2.0 Changes