Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset published by Luecken et al. (2021), containing data from human bone marrow measured through joint profiling of single-nucleus RNA and Antibody-Derived Tags (ADTs) using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq-B Universal Human Panel v1.0.

File Description
cite_quality_control.h5mu: Filtered cell-by-feature MuData object after quality control.
cite_normalization.h5mu: MuData object of normalized data using DSB (denoised and scaled by background) normalization.
cite_doublet_removal_xdbt.h5mu: MuData object after doublet removal based on known cell type markers. Cells were removed if they were double positive for mutually exclusive markers with a DSB value >2.5.
cite_dimensionality_reduction.h5mu: MuData object after dimensionality reduction.
cite_batch_correction.h5mu: MuData object after batch correction.

Citation
Luecken, M. D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2021).

Original data link: https://openproblems.bio/neurips_docs/data/dataset/
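The DSB normalization referenced above can be illustrated conceptually: ADT counts are log-transformed and then z-scored per protein against their distribution in empty (background) droplets, so values are expressed in background standard deviations. This is a sketch of the idea only, using simulated counts, not the dsb package or the processing applied to these files.

```python
import numpy as np

# Simulated stand-ins: cell-containing droplets have higher ADT counts than
# empty (ambient/background) droplets. Three hypothetical proteins.
rng = np.random.default_rng(1)
cells = rng.poisson(50, size=(200, 3)).astype(float)       # cell droplets
background = rng.poisson(5, size=(1000, 3)).astype(float)  # empty droplets

log_cells = np.log1p(cells)
log_bg = np.log1p(background)

# Z-score each protein against its background distribution (the core DSB idea)
mu, sd = log_bg.mean(axis=0), log_bg.std(axis=0)
dsb_like = (log_cells - mu) / sd   # units: background SDs above ambient
```

A cell is then called positive for a marker when its value sits well above the background, which is how the >2.5 DSB threshold in the doublet-removal step is applied.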
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Surface reflectance is a critical physical variable that affects the energy budget in land-atmosphere interactions, feature recognition and classification, and climate change research. This dataset uses the relative radiometric normalization method, taking Landsat-8 Operational Land Imager (OLI) surface reflectance products as the reference images to normalize cloud-free GF-1 satellite WFV sensor images of Shandong Province in 2018. Relative radiometric normalization processing mainly includes atmospheric correction, image resampling, image registration, masking, extraction of no-change pixels, and calculation of normalization coefficients. After relative radiometric normalization, for the no-change pixels of each GF-1 WFV image and its reference image, R² is above 0.7295 and RMSE is below 0.0172. The surface reflectance accuracy of the GF-1 WFV images is improved, so they can be used in combination with Landsat data to support quantitative remote sensing inversion. This dataset is in GeoTIFF format, and the spatial resolution of the imagery is 16 m.
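The normalization-coefficient step can be sketched as a least-squares gain/offset fit over no-change pixels, with R² and RMSE evaluated as reported above. The values below are simulated placeholders, not the actual GF-1/Landsat-8 data.

```python
import numpy as np

# Hypothetical reflectance samples at no-change pixels: "subject" plays the
# GF-1 WFV band, "reference" plays the Landsat-8 OLI surface reflectance band.
rng = np.random.default_rng(0)
subject = rng.uniform(0.05, 0.4, 500)
reference = 0.9 * subject + 0.02 + rng.normal(0, 0.005, 500)

# Least-squares normalization coefficients: normalized = gain * subject + offset
gain, offset = np.polyfit(subject, reference, 1)
normalized = gain * subject + offset

# Goodness of fit on the no-change pixels (the R²/RMSE reported per image)
resid = reference - normalized
rmse = np.sqrt(np.mean(resid ** 2))
r2 = 1 - np.sum(resid ** 2) / np.sum((reference - reference.mean()) ** 2)
```

In production the fitted gain and offset would be applied to the full GF-1 image, not just the no-change pixels used to estimate them.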
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Enriched GO terms for deconvolution. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to deconvolution. The identifier and name of each term is shown along with the total number of genes associated with the term, the number of associated genes that are also DE, the expected number under the null hypothesis, and the Fisher p value. (13 KB PDF)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Listeners use immediate contexts to efficiently normalize variable vocal streams into standard phonological units. However, researchers have debated whether non-speech contexts can also serve as valid cues for speech normalization. Supporters of the two sides have proposed a general-auditory hypothesis and a speech-specific hypothesis to explain the underlying mechanisms. A possible confounding factor in this inconsistency is listeners’ perceptual familiarity with the contexts, as the non-speech contexts used were perceptually unfamiliar to listeners. In this study, we examined this confounding factor by recruiting a group of native Cantonese speakers with substantial musical training and a control group with minimal musical training. Participants performed lexical tone judgment tasks in three contextual conditions: speech, non-speech, and music. Both groups were familiar with the speech context and unfamiliar with the non-speech context. The musician group was more familiar with the music context than the non-musician group. The results evidenced a lexical tone normalization process in the speech context but not in the non-speech or music contexts. More importantly, musicians did not outperform non-musicians in any contextual condition even though the musicians were experienced at pitch perception, indicating no noticeable transfer of pitch perception from the music domain to the linguistic domain for tonal-language speakers. The findings show that even high familiarity with a non-linguistic context cannot elicit an effective lexical tone normalization process, supporting the speech-specific basis of the perceptual normalization process.
IMPORTANT! PLEASE READ DISCLAIMER BEFORE USING DATA. This dataset backcasts estimated modeled savings for a subset of 2007-2012 completed projects in the Home Performance with ENERGY STAR® Program against normalized savings calculated by an open source energy efficiency meter available at https://www.openee.io/. The open source code uses utility-grade metered consumption to weather-normalize the pre- and post-consumption data using standard methods with no discretionary independent variables. The open source energy efficiency meter allows private companies, utilities, and regulators to calculate energy savings from energy efficiency retrofits with increased confidence and replicability of results. This dataset is intended to lay a foundation for future innovation and deployment of the open source energy efficiency meter across the residential energy sector, and to help inform stakeholders interested in pay-for-performance programs, where providers are paid for realizing measurable weather-normalized results. To download the open source code, please visit https://github.com/openeemeter/eemeter/releases. D I S C L A I M E R: Normalized savings using the open source OEE meter. Several data elements, including Evaluated Annual Electric Savings (kWh), Evaluated Annual Gas Savings (MMBtu), Pre-retrofit Baseline Electric (kWh), Pre-retrofit Baseline Gas (MMBtu), Post-retrofit Usage Electric (kWh), and Post-retrofit Usage Gas (MMBtu), are direct outputs from the open source OEE meter. Home Performance with ENERGY STAR® estimated savings. Several data elements, including Estimated Annual kWh Savings, Estimated Annual MMBtu Savings, and Estimated First Year Energy Savings, represent contractor-reported savings derived from energy modeling software calculations and not actual realized energy savings. The accuracy of the Estimated Annual kWh Savings and Estimated Annual MMBtu Savings for projects has been evaluated by an independent third party.
The results of the Home Performance with ENERGY STAR impact analysis indicate that, on average, actual savings amount to 35 percent of the Estimated Annual kWh Savings and 65 percent of the Estimated Annual MMBtu Savings. For more information, please refer to the Evaluation Report published on NYSERDA’s website at: http://www.nyserda.ny.gov/-/media/Files/Publications/PPSER/Program-Evaluation/2012ContractorReports/2012-HPwES-Impact-Report-with-Appendices.pdf. This dataset includes the following data points for a subset of projects completed in 2007-2012: Contractor ID, Project County, Project City, Project ZIP, Climate Zone, Weather Station, Weather Station-Normalization, Project Completion Date, Customer Type, Size of Home, Volume of Home, Number of Units, Year Home Built, Total Project Cost, Contractor Incentive, Total Incentives, Amount Financed through Program, Estimated Annual kWh Savings, Estimated Annual MMBtu Savings, Estimated First Year Energy Savings, Evaluated Annual Electric Savings (kWh), Evaluated Annual Gas Savings (MMBtu), Pre-retrofit Baseline Electric (kWh), Pre-retrofit Baseline Gas (MMBtu), Post-retrofit Usage Electric (kWh), Post-retrofit Usage Gas (MMBtu), Central Hudson, Consolidated Edison, LIPA, National Grid, National Fuel Gas, New York State Electric and Gas, Orange and Rockland, Rochester Gas and Electric. How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
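The realization rates quoted above (actual savings as a fraction of contractor-estimated savings) reduce to a simple ratio of evaluated to estimated savings; a sketch with hypothetical project figures:

```python
# Hypothetical per-project figures (not from the dataset); the realization
# rate is total evaluated savings divided by total estimated savings.
projects = [
    {"estimated_kwh": 2000.0, "evaluated_kwh": 700.0},
    {"estimated_kwh": 1500.0, "evaluated_kwh": 525.0},
]

total_estimated = sum(p["estimated_kwh"] for p in projects)
total_evaluated = sum(p["evaluated_kwh"] for p in projects)
realization_rate = total_evaluated / total_estimated  # report cites ~35% for kWh
```

The same calculation with the MMBtu columns would yield the gas-side realization rate (~65% per the evaluation report).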
This training dataset was calculated using the mechanistic modeling approach. See "Big data training data for artificial intelligence-based Li-ion diagnosis and prognosis" (Journal of Power Sources, Volume 479, 15 December 2020, 228806) and "Analysis of Synthetic Voltage vs. Capacity Datasets for Big Data Diagnosis and Prognosis" (Energies, under review) for more details.
The V vs. Q dataset was compiled with a resolution of 0.01 for the triplets and C/25 charges. This accounts for more than 5,000 different paths. Each path was simulated with increases of at most 0.85% for each step. The training dataset therefore contains more than 700,000 unique voltage vs. capacity curves.
Four variables are included; see the read-me file for details and an example of how to use them.
Cell info: information on the setup of the mechanistic model.
Qnorm: normalized capacity scale for all voltage curves.
pathinfo: index of simulated conditions for all voltage curves.
volt: voltage data. Each column corresponds to the voltage simulated under the conditions of the corresponding line in pathinfo.
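A sketch of how these variables fit together, using small hypothetical stand-ins (the real arrays are far larger and come from the mechanistic model):

```python
import numpy as np

# Hypothetical stand-ins: pathinfo rows describe the simulated degradation
# conditions; each column of volt is the voltage curve for the matching
# pathinfo row, sampled on the shared Qnorm axis.
Qnorm = np.linspace(0.0, 1.0, 101)             # normalized capacity scale
pathinfo = np.array([[0.0, 0.0], [5.0, 2.5]])  # placeholder condition indices
volt = np.column_stack([
    4.2 - 1.2 * Qnorm,         # placeholder curve for path 0
    4.2 - 1.2 * Qnorm - 0.05,  # placeholder curve for path 1
])

# Pair each simulated condition with its voltage vs. capacity curve
curves = {tuple(cond): curve for cond, curve in zip(pathinfo, volt.T)}
```

Column-to-row pairing is the key convention: `volt[:, k]` was simulated under the conditions listed in `pathinfo[k]`.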
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq, and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated and untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84, and sensitivity was greater than 0.90 except for the no-change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite assuming that the majority of genes are unchanged, the DESeq2 scaling-factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers in normalizing TempO-Seq gene expression data for more reliable results.
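Upper Quartile normalization, which performed best here, scales each sample by the 75th percentile of its nonzero gene counts; a minimal sketch on a toy count matrix (not the study's pipeline):

```python
import numpy as np

# Toy genes x samples count matrix; sample 2 is sequenced at exactly twice
# the depth of sample 1, which UQ normalization should remove.
counts = np.array([
    [10, 20],
    [ 0,  0],
    [30, 60],
    [ 5, 10],
], dtype=float)

# Per-sample upper quartile of the nonzero counts
uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])

# Scale each sample to a common reference factor (here the mean UQ)
norm = counts / uq * uq.mean()
```

After normalization the two columns are identical, since the only difference between the samples was sequencing depth.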
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Nitrate (NO3) for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NO3 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006).
MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Total Inorganic Nitrogen for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of Total Inorganic Nitrogen deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations.
The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States.
The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
The UFPR-Periocular dataset has 16,830 images of both eyes (33,660 cropped images of each eye) from 1,122 subjects (2,244 classes).
All images were captured by the participants using their own smartphones through a mobile application (app) developed by the authors. There are 15 samples of each subject's eye, obtained in 3 sessions (5 images per session) with a minimum interval of 8 hours between sessions.
The images were collected from June 2019 to January 2020 and have several resolutions varying from 360×160 to 1862×1008 pixels – depending on the mobile device used to capture the image. In total, the dataset has images captured from 196 different mobile devices.
Each subject captured their images using the same device model. This dataset's main intra- and inter-class variability is caused by lighting variation, occlusion, specular reflection, blur, motion blur, eyeglasses, off-angle views, eye gaze, makeup, and facial expression.
The authors manually annotated the eye corners of all images with 4 points (inner and outer eye corners) and used them to normalize the periocular region with respect to scale and rotation. All the original and cropped periocular images, eye-corner annotations, and experimental protocol files are publicly available to the research community (upon request).
The paper contains information about the distribution of images by gender, age, and resolution, as well as further experimental details and benchmarks.
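The scale-and-rotation normalization from two eye-corner points can be sketched as a similarity transform; the coordinates and the 100 px target distance below are hypothetical, not the dataset's protocol values:

```python
import numpy as np

# Hypothetical inner/outer eye-corner annotations in pixel coordinates
inner = np.array([120.0, 200.0])
outer = np.array([220.0, 300.0])

v = outer - inner
angle = np.arctan2(v[1], v[0])       # rotation needed to level the eye axis
scale = 100.0 / np.linalg.norm(v)    # rescale corner distance to 100 px

# Similarity transform: rotate by -angle, then scale
c, s = np.cos(-angle), np.sin(-angle)
R = scale * np.array([[c, -s],
                      [s,  c]])

# Applying R maps the corner axis onto a horizontal 100 px segment
mapped = R @ v
```

Applying the same transform to the whole image (plus a translation fixing the inner corner) yields crops that are aligned in scale and rotation across subjects.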
This deep learning model is used to transform incorrect and non-standard addresses into standardized addresses. Address standardization is the process of formatting and correcting addresses in accordance with global standards. A standardized address includes all required address elements (i.e., street number, apartment number, street name, city, state, and postal code) and is used by the standard postal service.
An address can be termed non-standard because of incomplete details (missing street name or ZIP code), invalid information (incorrect address), incorrect information (typos, misspellings, formatting of abbreviations), or inaccurate information (wrong house number or street name). These errors make it difficult to locate a destination. A standardized address does not guarantee address validity; standardization simply converts an address into the correct format. This deep learning model is trained on the address dataset provided by openaddresses.io and can be used to standardize addresses from 10 different countries.
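One small facet of standardization, expanding common street-type abbreviations, can be illustrated with a toy rule-based sketch; the deep learning model learns such corrections end to end, and the abbreviation table below is illustrative only:

```python
# Toy abbreviation-expansion rules (illustrative, not the model's vocabulary)
ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road", "blvd": "Boulevard"}

def standardize(address: str) -> str:
    """Expand known street-type abbreviations, leaving other tokens alone."""
    tokens = address.replace(",", " ,").split()
    out = []
    for tok in tokens:
        key = tok.rstrip(".").lower()
        out.append(ABBREVIATIONS.get(key, tok))
    return " ".join(out).replace(" ,", ",")
```

A learned model generalizes far beyond such a lookup table: it can also fix misspellings, reorder elements, and fill in formatting the rules never anticipated.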
Using the model
Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.
Fine-tuning the model
This model can be fine-tuned using the Train Deep Learning Model tool. Follow the guide to fine-tune this model.

Input
Text (non-standard address) on which address standardization will be performed.
Output
Text (standard address)
Supported countries
This model supports addresses from the following countries:
AT – Austria
AU – Australia
CA – Canada
CH – Switzerland
DK – Denmark
ES – Spain
FR – France
LU – Luxembourg
SI – Slovenia
US – United States
Model architecture
This model uses the T5-base architecture implemented in Hugging Face Transformers.
Accuracy metrics
This model has an accuracy of 90.18 percent.
Training data
The model has been trained on openly licensed data from openaddresses.io.

Sample results
Here are a few results from the model.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dr. Kevin Bronson provides this dataset representing the first of three consecutive years of cotton and nitrogen management experimentation in Field 113. Included is an intermediate-analysis mega-table of correlated and calculated parameters, laboratory analysis results generated during the experimentation, high-resolution plot-level intermediate data analysis tables of SAS process output, and the complete raw sensor-recorded logger outputs.
Experimental design and operational details of the research conducted are contained in related published articles; however, a further description of the measured data signals and commentary is offered here.
Nitrogen fertilizer was typically delivered as liquid UAN 32-0-0 with a density of 11.1 pounds per gallon, containing 3.5 pounds of nitrogen per gallon. Notably, the subsequent 2017 and 2018 experimentation years include a large volume of depleted nitrogen-15 isotope recovery tracing.
GeoScoutX logging of CropCircle active optical reflectance sensing data -
The primary component of this dataset is the Holland Scientific (HS) CropCircle ACS-470-generated reflectance numbers, which, as derived here, consist of raw active optical band-pass values digitized onboard the sensor. Data were delivered as sequential serialized text output including the associated GPS information. This product typifies a production-agriculture support technology, enabling efficient precision application of nitrogen fertilizer. However, we used this optical reflectance sensor technology to investigate plant agronomic biology, as the ACS-470 is a uniquely capable product: rugged, reliable, illumination-active, and filter-customizable.
Individualized ACS-470 sensor detector behavior, and its subsequent influence on index calculation, can be understood through analysis of white-panel and other known-target measurements. When a sensor is held 120 cm from, and flush facing, a titanium dioxide white-painted panel, a normalized unity value of 1.0 can be set for each detector. To generate this dataset, we used a Holland Scientific SC-1 device and set the 1.0 unity value (field normalization) on each sensor individually, before each data collection, and without the use of any channel gain boost. The SC-1 field normalization device allows a communications connection to a Windows PC, where company-provided sensor control software enables the necessary sensor normalization routine and a real-time view of streaming sensor data.
This type of raw-value active proximal multi-spectral reflectance data may be perceived as inherently "noisy"; however, basic analytical description consistently resolves a biological patterning, and more advanced statistical analysis is suggested to achieve discovery. Sources of polychromatic reflectance are inherent in the environment and can be influenced by surface features like wax or water, or the presence of crystal mineralization. Varying bi-directional reflectance in the proximal space is a model reality, and directed energy emission-reflection sampling is expected to support physical understanding of the underlying passive environmental system. We consider these CropCircle raw detector returns to be more "instant" in generation, and less filtered electronically while onboard the "black-box" device, than other reflectance products that output vegetation indices averaged from multiple detector samples.
Soil in view of the sensor does decrease the raw detection amplitude of the target color returned and can add a soil reflection signal component. Yet that return accurately represents a largely two-dimensional cover-and-intensity signal of the target material present within the field of view. It does not represent a reflection of the plant material alone, because it can contain additional features in the view.
Expect NDVI values greater than 0.1 when sensing plants, saturating around 0.8 rather than the typical 0.9 of passive NDVI, because the active light source does not transmit enough energy to penetrate much past LAI 2.1, which is less than what is expected with a solar-induced passive reflectance sensor. However, the active sensor scan is oriented on the uppermost expanded canopy leaves, and those leaves are normally positioned to intercept the majority of incoming solar energy. Active energy sensors are easier to direct; this capture method targets a consistent sensor height of 1 m above the average canopy height and a roaming travel speed maintained around 1.5 mph, with the sensors parallel to earth in a nadir view.
The Holland Scientific 5 Hz CropCircle ACS-470 active optical reflectance sensors, measured on the GeoScoutX proprietary digital serial data logger, have a stable output format as defined by firmware version.
Different numbers of CSV data files were generated based on field operations. Raw data files include the inserted null-value placeholder -9999. CropCircle sensors supplied data in a lined format, where variables were repeated for each sensor, creating a discrete data row for each individual sensor measurement instance.
Hamby rig active optical reflectance data were generated by Holland Scientific CropCircle ACS-470 sensors numbered 1, 2, 4 and 5, where sensors 1 and 5 had band-pass filters centered at wavelengths of 550, 670, and 530 nm, while sensors 2 and 4 had filters at 590, 800, and 730 nm, each for their respective R1, R2, and R3 raw detector data channels. The placement of the filters was determined as a generic optimization: the longer-wave filter was put in the middle detector position, and the tandem sensor setup was optimized for the favored NDRE on one sensor and a green-frequency test configuration on the other. Although, when facing forward, there is a left and a right side for the two cotton rows measured, and data were tracked and processed accordingly, the two cotton rows were not considered experimentally different; therefore possible row-to-row variability was not considered.
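Given the filter assignment above, the standard NDVI and NDRE indices would combine the 670/800 nm and 730/800 nm channels respectively; a sketch with hypothetical reflectance values:

```python
# Hypothetical reflectance values for a healthy canopy (not from the dataset):
# sensors 1/5 provide red at 670 nm (R2); sensors 2/4 provide NIR at 800 nm (R2)
# and red edge at 730 nm (R3).
red = 0.08
nir = 0.45
red_edge = 0.30

# Standard normalized-difference formulations
ndvi = (nir - red) / (nir + red)
ndre = (nir - red_edge) / (nir + red_edge)
```

With an active sensor, per the text above, NDVI would be expected to saturate nearer 0.8 than the 0.9 typical of passive sensing.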
CropCircle raw data adjustment approaches -
Three previously undescribed adjustment-value test calculation data columns are included, appended to the original raw data tables. For each CropCircle sensor detector, the white-panel observed amplitude delta of the raw reflectance channel was used to create minor data adjustments. These calculated test values were appended to the raw data table as variables R1_adj, R2_adj, and R3_adj, and exemplify a possible minor raw-data adjustment.
This was the beginning of a method advancement in testing control-based normalization adjustments to raw active optical detector data values. Generic and coarse post-process raw data adjustments can be made by first measuring a white panel reference at 120 cm distance, before and/or after a data collection period, beyond using only the SC-1 device to normalize individual sensor detectors. A deviation from the typical 1.0 unity value for flat white reflectance was recorded and used to offset the detector raw radiance values.
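The panel-based adjustment can be sketched as follows; the numbers are hypothetical, and applying the deviation as a subtractive offset (as described) versus a ratio is a processing choice:

```python
# Sketch of the white-panel adjustment (hypothetical numbers): after field
# normalization a detector should read 1.0 on the white panel; the recorded
# deviation from unity is used to offset the raw channel values.
panel_reading = 1.015              # detector reads 1.5% high on the panel
deviation = panel_reading - 1.0
raw = [0.82, 0.84, 0.79]           # raw reflectance channel values in the field

adjusted = [r - deviation for r in raw]
```

Per-detector adjustments of this kind are what the appended R1_adj, R2_adj, and R3_adj columns represent.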
The raw data adjustment test approach was developed as an extension of the manufacturer’s normalization routine recommendation, which uses the SC-1 device or a titanium dioxide ultra-white painted custom panel. Normally, the ACS-470 detector channels would be set to read 1.0, after 30 minutes of warmup time and when connected to the SC-1 illumination reflector, or when held 120 cm away from and facing an optically flat white panel of sufficient size to fully reflect the active light signal footprint (about 30 x 100 cm). This recommended approach does work well.
We normalized multiple sensors in field conditions using the typical two-tailed white-panel field-normalization approach. One by one, each sensor was connected to the SC-1 box for communications with a PC, where the sensor's real-time information was viewed and a sensor normalization command given. Once placed at an appropriate height and position relative to the white panel, a sensor zero point was ascribed to the sensor configuration by first covering the active optical LED source and detectors to create a black-out condition, then immediately afterward revealing the illuminated white panel in full detector view, where a second full-signal measurement was made and the unity 1.0 set-point value instructed to the sensor.
Values streaming through the active optical sensor detectors typically range 0-2% around the unity value after field normalization, while measuring the coarse-surface white control panel in natural conditions. Therefore, successful normalization was deemed to have occurred, or was not needed, when all detectors were within 2% of the 1.0 value on the white-panel setup. It was difficult to achieve a 1% data range for all detectors at all times; multiple iterations of the normalization routine would not consistently yield results improved to the 1% magnitude.
Therefore, we simply measured the typical 0-2% raw data value difference for each detector, with the idea that a subsequent adjustment might be possible. We found that we could measure longer time periods with sensors over a white panel reference and determine optical signal features, as well as elucidate minor individual sensor behavior. Temperature change apparently induced effects on the raw detector data stream. We also recorded the sensors when connected optically to the SC-1 device reflector, in dark conditions, at various distances and angles from a target, and with many different types of target reflectors, in a temperature-controlled room, laboratory, shop, and outdoors.
We noted that each detector of each sensor can exhibit unique behaviors, which underlie the customizable band-pass color filter’s effect. Some detector channels
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
t-test analysis of the differences between the correlation coefficients observed among the normalization categories.
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Ammonium (NH4) for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NH4 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006).
MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
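The MRB-to-production-unit groupings above can be captured in a small lookup table; this sketch simply encodes the membership listed in the description and provides a reverse lookup (the function name is illustrative, not part of NHDPlus).

```python
# Production-unit membership for each Major River Basin (MRB),
# as listed in the description above.
mrb_production_units = {
    "MRB1": ["1", "2"],
    "MRB2": ["3", "6"],
    "MRB3": ["4", "5", "7", "9"],
    "MRB4": ["10-lower", "10-upper"],
    "MRB5": ["8", "11", "12"],
    "MRB6": ["13", "14", "15", "16"],
    "MRB7": ["17"],
    "MRB8": ["18"],
}

def mrb_for_unit(unit):
    """Return the MRB grouping that contains a given NHDPlus production unit."""
    for mrb, units in mrb_production_units.items():
        if unit in units:
            return mrb
    raise KeyError(unit)

print(mrb_for_unit("17"))  # MRB7 (Pacific Northwest)
```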
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
aLink is an IT application for file fusion that combines a series of techniques, applied in successive stages, to merge files containing large volumes of data. In addition to linking files through probabilistic processes based on common variables, it also normalizes variables that contain postal addresses, people's names and surnames, and DNI, NIF, or NIE (Foreigner Identification Number) identifiers.
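As a flavor of the kind of identifier normalization described, here is a minimal sketch (not aLink's actual code) that cleans a Spanish DNI and checks its control letter using the standard modulo-23 rule; the function names are hypothetical.

```python
# Standard DNI control-letter table (index = number mod 23).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"

def normalize_dni(raw):
    """Strip separators, uppercase, and zero-pad the numeric part to 8 digits."""
    cleaned = raw.replace("-", "").replace(" ", "").replace(".", "").upper()
    number, letter = cleaned[:-1], cleaned[-1]
    return number.zfill(8) + letter

def dni_is_valid(dni):
    """Check that the trailing letter matches the modulo-23 control rule."""
    number, letter = int(dni[:-1]), dni[-1]
    return DNI_LETTERS[number % 23] == letter

dni = normalize_dni("12345678-z")
print(dni, dni_is_valid(dni))  # 12345678Z True
```

Real record-linkage tools apply far richer normalization (address parsing, name tokenization, phonetic codes) before probabilistic matching; this only shows the identifier-cleaning step.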
enoubi/twitter-normalize-and-clean-repeat-masking dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset presents information on historical central government revenues for 31 countries in Europe and the Americas for the period from 1800 (or independence) to 2012. The countries included are: Argentina, Australia, Austria, Belgium, Bolivia, Brazil, Canada, Chile, Colombia, Denmark, Ecuador, Finland, France, Germany (West Germany between 1949 and 1990), Ireland, Italy, Japan, Mexico, New Zealand, Norway, Paraguay, Peru, Portugal, Spain, Sweden, Switzerland, the Netherlands, the United Kingdom, the United States, Uruguay, and Venezuela. In other words, the dataset includes all South American, North American, and Western European countries with a population of more than one million, plus Australia, New Zealand, Japan, and Mexico. The dataset contains information on the public finances of central governments. To make such information comparable cross-nationally, we have chosen to normalize nominal revenue figures in two ways: (i) as a share of the total budget, and (ii) as a share of total gross domestic product. The total tax revenue of the central state is disaggregated following the Government Finance Statistics Manual 2001 of the International Monetary Fund (IMF), which provides a classification of types of revenue and describes in detail the contents of each classification category. Given the paucity of detailed historical data and the needs of our project, we combined some subcategories. First, we are interested in total tax revenue (centaxtot), as well as the shares of total revenue coming from direct (centaxdirectsh) and indirect (centaxindirectsh) taxes. Further, we measure two sub-categories of direct taxation, namely taxes on property (centaxpropertysh) and income (centaxincomesh). For indirect taxes, we separate excises (centaxexcisesh), consumption (centaxconssh), and customs (centaxcustomssh).
For a more detailed description of the dataset and the coding process, see the codebook available in the .zip-file.
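The two normalizations described above (revenue as a share of the total budget and as a share of GDP) amount to simple column-wise ratios. This sketch uses made-up numbers; only `centaxtot` follows the dataset's naming, while `total_budget` and `gdp` are hypothetical column names.

```python
# Illustrative sketch of the two normalizations: central tax revenue
# as a share of the total budget and as a share of GDP.
import pandas as pd

df = pd.DataFrame({
    "country": ["Sweden", "Sweden"],
    "year": [1900, 1901],
    "centaxtot": [120.0, 130.0],     # nominal central tax revenue (made up)
    "total_budget": [150.0, 160.0],  # hypothetical column name
    "gdp": [2400.0, 2500.0],         # hypothetical column name
})

df["tax_share_of_budget"] = df["centaxtot"] / df["total_budget"]
df["tax_share_of_gdp"] = df["centaxtot"] / df["gdp"]
print(df[["year", "tax_share_of_budget", "tax_share_of_gdp"]])
```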
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This metadata record and its attached files make statements about the kinds of data collected as part of this research, and set out policies for governance of that data, now and in the future. Description: The Kiwifruit Ingestion to Normalise Gut Symptoms (KINGS) study was launched to understand more about the clinical, psychological, biological, and dietary changes in individuals with Functional Constipation (FC) or Constipation-Predominant Irritable Bowel Syndrome (IBS-C). We believe that the approach used in this study can help generate new knowledge beyond that already reported for the consumption of green kiwifruit. The Auckland KINGS Gastric study is a second locality for the KINGS study in the Christchurch IBS cohort (ACTRN12621000621819), with site-specific study design and secondary outcomes. In this single-arm, open-label, intervention study, we aim to assess the impact of habitual consumption of two green kiwifruit daily for 4 weeks on abdominal pain, bowel habits, and other gastrointestinal symptoms in individuals with FC/IBS-C, and to evaluate the effects of habitual consumption of green kiwifruit on digestive function and biomarkers of these physiological effects. Additionally, we hope to investigate the influence of the two-green-kiwifruit intervention on the microbiome and metabolome, and to compare differences between individuals with constipation and healthy controls. The trial will last a maximum of 8 weeks in total.
This Level 1 (L1) dataset contains the Version 2.1 geo-located Delay Doppler Maps (DDMs) calibrated into Power Received (Watts) and Bistatic Radar Cross Section (BRCS) expressed in units of meters squared from the Delay Doppler Mapping Instrument aboard the CYGNSS satellite constellation. This version supersedes Version 2.0. Other useful scientific and engineering measurement parameters include the DDM of Normalized Bistatic Radar Cross Section (NBRCS), the Delay Doppler Map Average (DDMA) of the NBRCS near the specular reflection point, and the Leading Edge Slope (LES) of the integrated delay waveform. The L1 dataset contains a number of other engineering and science measurement parameters, including sets of quality flags/indicators, error estimates, and bias estimates, as well as a variety of orbital, spacecraft/sensor health, timekeeping, and geolocation parameters. At most, 8 netCDF data files (each file corresponding to a unique spacecraft in the CYGNSS constellation) are provided each day; under nominal conditions, there are typically 6-8 spacecraft retrieving data each day, but this can be maximized to 8 spacecraft under special circumstances in which higher than normal retrieval frequency is needed (e.g., during tropical storms or hurricanes). Latency is approximately 6 days (or better) from the last recorded measurement time. The Version 2.1 release represents the second science-quality release.
Here is a summary of improvements that reflect the quality of the Version 2.1 data release: 1) data is now available when the CYGNSS satellites are rolled away from nadir during orbital high beta-angle periods, resulting in a significant amount of additional data; 2) corrections to coordinate frames result in more accurate estimates of receiver antenna gain at the specular point; 3) improved calibration for analog-to-digital conversion results in better consistency between CYGNSS satellite measurements taken at nearly the same location and time; 4) improved GPS EIRP and transmit antenna pattern calibration results in significantly reduced PRN-dependence in the observables; 5) improved estimation of the location of the specular point within the DDM; 6) an altitude-dependent scattering area is used to normalize the scattering cross section (v2.0 used a simpler scattering area model that varied with incidence and azimuth angles but not altitude); 7) corrections added for noise floor-dependent biases in scattering cross section and leading edge slope of delay waveform observed in the v2.0 data. Users should also note that the receiver antenna pattern calibration is not applied per-DDM-bin in this v2.1 release.
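Conceptually, the normalization in item 6 divides the BRCS by an effective scattering area to yield the dimensionless NBRCS. The sketch below illustrates only that ratio; the helper name and all numeric values are made up, and the actual v2.1 processing computes the scattering area from incidence angle, azimuth angle, and altitude.

```python
# Hedged sketch of the NBRCS normalization step described above:
# NBRCS = BRCS / effective scattering area. Values are illustrative.
import numpy as np

def nbrcs(brcs, effective_scatter_area):
    """Normalize bistatic radar cross section (m^2) by scattering area (m^2)."""
    return brcs / effective_scatter_area

brcs_ddma = np.array([1.2e6, 9.5e5, 1.1e6])     # m^2, made-up DDMA values
scatter_area = np.array([5.0e6, 5.0e6, 5.2e6])  # m^2, made-up areas
print(nbrcs(brcs_ddma, scatter_area))           # dimensionless NBRCS
```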