36 datasets found

‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 1, 2013
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2013). ‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-the-bronson-files-dataset-4-field-105-2013-7c96/latest
Dataset updated
Aug 1, 2013
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘The Bronson Files, Dataset 4, Field 105, 2013’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/392f69f2-aa43-4e90-970d-33c36e011c19 on 11 February 2022.

--- Dataset description provided by original source is as follows ---

Dr. Kevin Bronson provides this unique nitrogen and water management in wheat agricultural research dataset for compute. Ten irrigation treatments from a linear sprinkler were combined with nitrogen treatments. This dataset includes notation of field events and operations, an intermediate analysis mega-table of correlated and calculated parameters, including laboratory analysis results generated during the experimentation, plus high resolution plot level intermediate data tables of SAS process output, as well as the complete raw sensors records and logger outputs.

This data was collected during the beginning time period of our USDA Maricopa terrestrial proximal high-throughput plant phenotyping tri-metric method generation, where a 5Hz crop canopy height, temperature and spectral signature are recorded coincident to indicate a plant health status. In this early development period, our Proximal Sensing Cart Mark1 (PSCM1) platform supplants people carrying the CropCircle (CC) sensors, and with an improved view mechanical performance result.

Experimental design and operational details of research conducted are contained in related published articles, however further description of the measured data signals as well as germane commentary is herein offered.

The primary component of this dataset is the Holland Scientific (HS) CropCircle ACS-470 reflectance numbers, which, as derived here, consist of raw active optical band-pass values digitized onboard the sensor product. Data is delivered as sequential serialized text output including the associated GPS information. Typically, this is a production agriculture support technology, enabling an efficient precision application of nitrogen fertilizer. We used this optical reflectance sensor technology to investigate plant agronomic biology, as the ACS-470 is a unique performance product, being not only rugged and reliable but also illumination-active and filter-customizable.

Individualized ACS-470 sensor detector behavior and subsequent index calculation influence can be understood through analysis of white-panel and other known target measurements. When a sensor is held 120cm from a titanium dioxide white painted panel, a normalized unity value of 1.0 is set for each detector. To generate this dataset we used a Holland Scientific SC-1 device and set the 1.0 unity value (field normalize) on each sensor individually, before each data collection, and without using any channel gain boost. The SC-1 field normalization device allows a communications connection to a Windows machine, where company provided sensor control software enables the necessary sensor normalization routine, and a real-time view of streaming sensor data.
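
One way to look at per-detector behavior from white-panel measurements is to compare the mean panel reading of each band with the 1.0 unity target. The following is only a minimal sketch: the band names, the sample readings, and the function `panel_offsets` are hypothetical and are not part of the dataset or of any Holland Scientific software.

```python
from statistics import mean

def panel_offsets(panel_rows):
    """Return, per band, the mean white-panel reading minus the 1.0 unity target."""
    bands = panel_rows[0].keys()
    return {band: mean(row[band] for row in panel_rows) - 1.0 for band in bands}

# Hypothetical white-panel readings for one sensor (not real data).
rows = [
    {"r670": 1.02, "r730": 0.98, "r800": 1.00},
    {"r670": 1.00, "r730": 0.98, "r800": 1.00},
]
offsets = panel_offsets(rows)
for band, offset in offsets.items():
    print(f"{band}: {offset:+.3f}")
```

A band whose offset stays consistently away from zero across collections would be a candidate for closer inspection before indices are calculated.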

This type of active proximal multi-spectral reflectance data may be perceived as inherently “noisy”; however, basic analytical description consistently resolves a biological patterning, and more advanced statistical analysis is suggested to achieve discovery. Sources of polychromatic reflectance are inherent in the environment and can be influenced by surface features like wax or water, or the presence of crystal mineralization. Varying bi-directional reflectance in the proximal space is a model reality, and directed energy emission reflection sampling is expected to support physical understanding of the underlying passive environmental system.

Soil in view of the sensor does decrease the raw detection amplitude of the target color returned and can add a soil reflection signal component. Yet that return accurately represents a largely two-dimensional cover and intensity signal of the target material present within each view. It does not, however, represent a reflection of the plant material solely, because it can contain additional features in view. Expect NDVI values greater than 0.1 when sensing plants, saturating at around 0.8 rather than the typical 0.9 of passive NDVI.

The active signal does not transmit energy to penetrate, perhaps past LAI 2.1 or less, compared to what a solar induced passive reflectance sensor would encounter. However, the focus of our active sensor scan is on the uppermost expanded canopy leaves, and they are positioned to intercept the major solar energy. Active energy sensors are easier to direct, and in our capture method we target a consistent sensor height that is 1 m above the average canopy height and maintain a rig travel speed target around 1.5 mph, with sensors parallel to earth ground in a nadir view.
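
As a rough worked example, the travel speed target and the 5 Hz recording rate together imply the along-row distance between consecutive readings. The sketch below assumes the rig holds exactly 1.5 mph, which the description gives only as a target; the function name is illustrative.

```python
MPH_TO_M_PER_S = 0.44704  # exact: 1 mile = 1609.344 m, 1 hour = 3600 s

def sample_spacing_m(speed_mph, sample_rate_hz):
    """Distance travelled along the row between consecutive sensor readings."""
    return speed_mph * MPH_TO_M_PER_S / sample_rate_hz

spacing = sample_spacing_m(1.5, 5.0)
print(f"{spacing:.3f} m between readings")
```

Under that assumption, readings fall roughly 13 cm apart along the row.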

We consider these CropCircle raw detector returns to be more “instant” in generation, and “less-filtered” electronically, while onboard the “black-box” device, than are other reflectance products which produce vegetation indices as averages of multiple detector samples in time.

It is known through internal sensor performance tracking across our entire location inventory, that sensor body temperature change affects sensor raw detector returns in minor and undescribed yet apparently consistent ways.

Holland Scientific 5 Hz CropCircle active optical reflectance ACS-470 sensors, which were measured on the GeoScout digital proprietary serial data logger, have a stable output format as defined by firmware version.

Different numbers of csv data files were generated based on field operations, and there were a few short duration instances where GPS signal was lost. Multiple raw data files, when present, including white panel measurements before or after field collections, were combined into one file, with the inclusion of the null value placeholder -9999. Two CropCircle sensors, numbered 2 and 3, were used, supplying data in a lined format, where variables are repeated for each sensor, creating a discrete data row for each individual sensor measurement instance.

We offer six high-throughput single pixel spectral colors, recorded at 530, 590, 670, 730, 780, and 800 nm. The filtered band-pass was 10 nm, except for the NIR, which was set to 20 nm and supplied an increased signal (including increased noise).

Dual, or tandem, CropCircle sensor paired usage empowers additional vegetation index calculations such as:
DATT = (r800-r730)/(r800-r670)
DATTA = (r800-r730)/(r800-r590)
MTCI = (r800-r730)/(r730-r670)
CIRE = (r800/r730)-1
CI = (r800/r590)-1
CCCI = NDRE/NDVIR800
PRI = (r590-r530)/(r590+r530)
CI800 = ((r800/r590)-1)
CI780 = ((r780/r590)-1)
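
The index formulas above can be sketched in code as follows. NDRE and NDVIR800 are not defined in the description, so the standard normalized-difference forms using r800 with r730 and r670 are assumed here; the band values in the example are made up, and no guard against zero denominators is included.

```python
def vegetation_indices(r):
    """Compute the listed indices from band reflectances keyed 'r530' ... 'r800'."""
    r530, r590, r670 = r["r530"], r["r590"], r["r670"]
    r730, r780, r800 = r["r730"], r["r780"], r["r800"]
    ndvi_r800 = (r800 - r670) / (r800 + r670)  # assumed standard NDVI form
    ndre = (r800 - r730) / (r800 + r730)       # assumed standard NDRE form
    return {
        "DATT": (r800 - r730) / (r800 - r670),
        "DATTA": (r800 - r730) / (r800 - r590),
        "MTCI": (r800 - r730) / (r730 - r670),
        "CIRE": (r800 / r730) - 1,
        "CI": (r800 / r590) - 1,
        "CCCI": ndre / ndvi_r800,
        "PRI": (r590 - r530) / (r590 + r530),
        "CI800": (r800 / r590) - 1,
        "CI780": (r780 / r590) - 1,
    }

# Hypothetical reflectance values for a single reading (not real data).
bands = {"r530": 0.08, "r590": 0.10, "r670": 0.05,
         "r730": 0.30, "r780": 0.48, "r800": 0.50}
idx = vegetation_indices(bands)
for name, value in idx.items():
    print(f"{name}: {value:.3f}")
```

Because each sensor records only some of the bands, the two sensors' rows would have to be paired by time or position before a function like this could be applied; that pairing step is not shown.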

The Campbell Scientific (CS) environmental data recording of small range (0 to 5 v) voltage sensor signals are accurate and largely shielded from electronic thermal induced influence, or other such factors by design. They were used as was descriptively recommended by the company. A high precision clock timing, and a recorded confluence of custom metrics, allow the Campbell Scientific raw data signal acquisitions a high research value generally, and have delivered baseline metrics in our plant phenotyping program. Raw electrical sensor signal captures were recorded at the maximum digital resolution, and could be re-processed in whole, while the subsequent onboard calculated metrics were often data typed at a lower memory precision and served our research analysis.

Improved Campbell Scientific data at 5 Hz is presented for nine collection events, where thermal, ultrasonic displacement, and additional GPS metrics were recorded. Ultrasonic height metrics generated by the Honeywell sensor and present in this dataset represent successful phenotypic recordings. The Honeywell ultrasonic displacement sensor has worked well in this application because of its 180 kHz signal frequency, which ranges over a 2 m space. Air temperature is still a developing metric; a thermocouple wire junction (TC) placed in free air with a solar shade produced a low-confidence passive ambient air temperature.

Campbell Scientific logger derived data output is structured in a column format, with multiple sensor data values present in each data row. One data row represents one program output cycle recording across the sensing array, as there was no onboard logger data averaging or down sampling. Campbell Scientific data is first recorded in binary format onboard the data logger, and then upon data retrieval, converted to ASCII text via the PC based LoggerNet CardConvert application. Here, our full CS raw data output, that includes a four-line header structure, was truncated to a typical single row header of variable names. The -9999 placeholder value was inserted for null instances.
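
A file with a single header row and -9999 placeholders can be read so that the placeholder does not leak into later calculations. This is a minimal sketch using only the Python standard library; the column names and sample rows are hypothetical, not taken from the dataset.

```python
import csv
import io

NULL_PLACEHOLDER = -9999.0

def read_rows(text):
    """Parse CSV text with one header row; map -9999 to None and numbers to float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        clean = {}
        for name, value in record.items():
            value = (value or "").strip()
            try:
                number = float(value)
            except ValueError:
                clean[name] = value  # non-numeric fields (e.g. timestamps) stay as text
                continue
            clean[name] = None if number == NULL_PLACEHOLDER else number
        rows.append(clean)
    return rows

# Hypothetical column names and values, for illustration only.
sample = (
    "TIMESTAMP,IRT_nadir,Height\n"
    "2013-03-01 10:00:00.2,21.4,-9999\n"
    "2013-03-01 10:00:00.4,-9999,0.62\n"
)
rows = read_rows(sample)
print(rows[0])
```

Treating the placeholder as missing rather than as a number matters for any mean or index computed downstream, since a single -9999 would dominate the result.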

There is canopy thermal data from three view vantages: a nadir sensor view, and views looking forward and backward down the plant row at a 30 degree angle off nadir. The high confidence Apogee Instruments SI-111 type infrared radiometer (non-contact thermometer) with serial number 1052 was in a front position looking forward away from the platform, number 1023 with a nadir view was in the middle position, and sensor number 1022 was in a rear position looking back toward the platform frame, until after 4/10/2013 when the order was reversed. We have a long and successful history of testing and benchmarking performance, and of deploying Apogee Instruments infrared radiometers in field experimentation. They are biologically spectral window relevant sensors and return a fast update 0.2 °C accurate average surface temperature, derived from what is (geometrically weighted) in their field of view.

Data gaps do exist beyond null value -9999 designations; there are some instances when GPS signal was lost or, rarely, when an HS GeoScout logger error occurred. GPS information may be missing at the start of data recording.
Identification of Novel Reference Genes Suitable for qRT-PCR Normalization...
plos.figshare.com
tiff
Updated May 31, 2023
Cite
Yu Hu; Shuying Xie; Jihua Yao (2023). Identification of Novel Reference Genes Suitable for qRT-PCR Normalization with Respect to the Zebrafish Developmental Stage [Dataset]. http://doi.org/10.1371/journal.pone.0149277
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0149277
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Yu Hu; Shuying Xie; Jihua Yao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reference genes used in normalizing qRT-PCR data are critical for the accuracy of gene expression analysis. However, many traditional reference genes used in zebrafish early development are not appropriate because of their variable expression levels during embryogenesis. In the present study, we used our previous RNA-Seq dataset to identify novel reference genes suitable for gene expression analysis during zebrafish early developmental stages. We first selected 197 most stably expressed genes from an RNA-Seq dataset (29,291 genes in total), according to the ratio of their maximum to minimum RPKM values. Among the 197 genes, 4 genes with moderate expression levels and the least variation throughout 9 developmental stages were identified as candidate reference genes. Using four independent statistical algorithms (delta-CT, geNorm, BestKeeper and NormFinder), the stability of qRT-PCR expression of these candidates was then evaluated and compared to that of actb1 and actb2, two commonly used zebrafish reference genes. Stability rankings showed that two genes, namely mobk13 (mob4) and lsm12b, were more stable than actb1 and actb2 in most cases. To further test the suitability of mobk13 and lsm12b as novel reference genes, they were used to normalize three well-studied target genes. The results showed that mobk13 and lsm12b were more suitable than actb1 and actb2 with respect to zebrafish early development. We recommend mobk13 and lsm12b as new optimal reference genes for zebrafish qRT-PCR analysis during embryogenesis and early larval stages.
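
The first selection step described above, ranking genes by the ratio of their maximum to minimum RPKM, can be sketched as follows. The gene names and values are invented, and skipping genes with a zero minimum is an assumption of this sketch rather than something the abstract states.

```python
def most_stable_genes(rpkm_by_gene, top_n):
    """Rank genes by max/min RPKM across stages (smaller ratio = more stable)."""
    ratios = {}
    for gene, values in rpkm_by_gene.items():
        if min(values) <= 0:
            continue  # ratio undefined; skipping is an assumption of this sketch
        ratios[gene] = max(values) / min(values)
    return sorted(ratios, key=ratios.get)[:top_n]

# Hypothetical RPKM values across three developmental stages.
rpkm = {
    "geneA": [10, 12, 11],
    "geneB": [5, 50, 20],
    "geneC": [0, 3, 4],
    "geneD": [30, 33, 35],
}
stable = most_stable_genes(rpkm, 2)
print(stable)  # ['geneD', 'geneA']
```

The later steps in the abstract (filtering for moderate expression and evaluating candidates with delta-CT, geNorm, BestKeeper and NormFinder) are not covered by this sketch.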
Data from: The Bronson Files, Dataset 6, Field 13, 2014
data.amerigeoss.org
agdatacommons.nal.usda.gov
csv, jpeg, pdf, xls +1
Updated Aug 24, 2022
Cite
United States (2022). The Bronson Files, Dataset 6, Field 13, 2014 [Dataset]. https://data.amerigeoss.org/fi/dataset/groups/the-bronson-files-dataset-6-field-13-2014-9dd3f
Available download formats: zip, csv, xls, jpeg, pdf
Dataset updated
Aug 24, 2022
Dataset provided by
United States
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Dr. Kevin Bronson provides a unique nitrogen and water management in cotton agricultural research dataset for compute, including notation of field events and operations, an intermediate analysis mega-table of correlated and calculated parameters, and laboratory analysis results generated during the experimentation, plus high-resolution plot level intermediate data analysis tables of SAS process output, as well as the complete raw data sensor recorded logger outputs.

This data was collected using a Hamby rig as a high-throughput proximal plant phenotyping platform.

The Hamby 6000 rig
Ellis W. Chenault, & Allen F. Wiese. (1989). Construction of a High-Clearance Plot Sprayer. Weed Technology, 3(4), 659–662. http://www.jstor.org/stable/3987560

Dr. Bronson modified an old high-clearance Hamby 6000 rig, adding a tank and pump with a rear boom, to perform precision liquid N applications. A Raven control unit with GPS supplied variable rate delivery options.

The 12 volt Holland Scientific GeoScoutX data recorder and associated CropCircle ACS-470 sensors with GPS signal were easy to mount and run on the vehicle as an attached rugged data acquisition module, and allowed the measuring of plants using custom proximal active optical reflectance sensing. The HS data logger was positioned near the operator, and sensors were positioned in front of the rig, on a forward protruding armature attached to a hydraulic front boom assembly, facing downward in nadir view 1 m above the average canopy height. A 34-size class AGM battery sat under the operator and provided the data system electrical power supply.

Data suffered reduced input from Conley. Although every effort was afforded to capture adequate quality across all metrics, experiment exterior considerations were such that canopy temperature data is absent, and canopy height is weak due to technical underperformance. Thankfully, reflectance data quality was maintained or improved through the implementation of new hardware by Bronson.

Experimental design and operational details of research conducted are contained in related published articles, however a further description of the measured data signals and commentary is herein offered.

The primary component of this dataset is the Holland Scientific (HS) CropCircle ACS-470 reflectance numbers, which, as derived here, consist of raw active optical band-pass values digitized onboard the sensor product. Data is delivered as sequential serialized text output including the associated GPS information. Typically, this is a production agriculture support technology, enabling an efficient precision application of nitrogen fertilizer. We used this optical reflectance sensor technology to investigate plant agronomic biology, as the ACS-470 is a unique performance product, being not only rugged and reliable but also illumination-active and filter-customizable.

Individualized ACS-470 sensor detector behavior and subsequent index calculation influence can be understood through analysis of white-panel and other known target measurements. When a sensor is held 120 cm from and flush facing a titanium dioxide white painted panel, a normalized unity value of 1.0 can be set for each detector. To generate this dataset, we used a Holland Scientific SC-1 device and set the 1.0 unity value (field normalize) on each sensor individually, before each data collection, and without using any channel gain boost. The SC-1 field normalization device allows a communications connection to a Windows machine, where company provided sensor control software enables the necessary sensor normalization routine, and a real-time view of streaming sensor data.

This type of active proximal multi-spectral reflectance data may be perceived as inherently “noisy”; however, basic analytical description consistently resolves a biological patterning, and more advanced statistical analysis is suggested to achieve discovery. Sources of polychromatic reflectance are inherent in the environment and can be influenced by surface features like wax or water, or the presence of crystal mineralization. Varying bi-directional reflectance in the proximal space is a model reality, and directed energy emission reflection sampling is expected to support physical understanding of the underlying passive environmental system.

Soil in view of the sensor does decrease the raw detection amplitude of the target color returned and can add a soil reflection signal component. Yet that return accurately represents a largely two-dimensional cover and intensity signal of the target material present within each view. It does not represent a reflection of the plant material solely, because it can contain additional features in view. Expect NDVI values greater than 0.1 when sensing plants and saturating more around 0.8, rather than the typical 0.9 of passive NDVI.

The active signal does not transmit energy to penetrate, perhaps past LAI 2.1 or less, compared to what a solar induced passive reflectance sensor would encounter. However, the focus of our active sensor scan is on the uppermost expanded canopy leaves, and they are positioned to intercept the major solar energy. Active energy sensors are easier to direct, and in our capture method we target a consistent sensor height that is 1 m above the average canopy height and maintain a rig travel speed target around 1.5 mph, with sensors parallel to earth ground in a nadir view.

We consider these CropCircle raw detector returns to be more “instant” in generation, and “less-filtered” electronically, while onboard the “black-box” device, than are other reflectance products which produce vegetation indices as averages of multiple detector samples in time.

It is known through internal sensor performance tracking across our entire location inventory, that sensor body temperature change affects sensor raw detector returns in minor and undescribed yet apparently consistent ways.

Holland Scientific 5 Hz CropCircle active optical reflectance ACS-470 sensors, which were measured on the GeoScout digital proprietary serial data logger, have a stable output format as defined by firmware version.

Different numbers of csv data files were generated based on field operations, and there were a few short duration instances where GPS signal was lost. Multiple raw data files, when present, including white panel measurements before or after field collections, were combined into one file, with the inclusion of the null value placeholder -9999. Two CropCircle sensors, numbered 2 and 3, were used, supplying data in a lined format, where variables are repeated for each sensor, creating a discrete data row for each individual sensor measurement instance.

We offer six high-throughput single pixel spectral colors, recorded at 530, 550, 590, 670, 730, and 800 nm (NIR). The filtered band-pass was 10 nm, except for the NIR, which was set to 20 nm and supplied an increased signal (including increased noise). Importantly, two green frequencies are available in this study, which is different from the alternate focus on the other side of the spectrum in the first two Bronson Files datasets measuring cotton.

Dual, or tandem, CropCircle sensor paired usage empowers additional vegetation index calculations such as:
DATT = (r800-r730)/(r800-r670)
DATTA = (r800-r730)/(r800-r590)
MTCI = (r800-r730)/(r730-r670)
CIRE = (r800/r730)-1
CI = (r800/r590)-1
CCCI = NDRE/NDVIR800
PRI = (r590-r530)/(r590+r530)
CI800 = ((r800/r590)-1)
CI780 = ((r780/r590)-1)

On collection 7/28/2014 and thereafter, a new HS logger, the GeoScoutX, or GSX, was initiated. The upgraded data recorder increased operational reliability by eliminating recording stops and subsequent multiple data files. The new data outputs were defined by the operating system configuration version where the data variables column headers were changed to be named SF00, SF01, SF02, SF03 and SF04. The raw reflectance columns are the first three, SF00-02, and the last two columns are the onboard calculated VIs, which we did not consider.
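
When reading GSX-era files, the raw reflectance columns can be kept and the onboard VI columns dropped, in line with the description above. The sketch below is illustrative only: the row values are made up, and which wavelength each SF column carries is not stated here, so no mapping to band names is attempted.

```python
RAW_COLUMNS = ("SF00", "SF01", "SF02")  # raw reflectance, per the description
VI_COLUMNS = ("SF03", "SF04")           # onboard calculated VIs, not used

def raw_reflectance(row):
    """Keep only the raw reflectance columns from one GSX data row."""
    return {name: row[name] for name in RAW_COLUMNS}

# Hypothetical values for one data row.
row = {"SF00": 0.21, "SF01": 0.35, "SF02": 0.52, "SF03": 0.42, "SF04": 0.19}
raw = raw_reflectance(row)
print(raw)
```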

The Campbell Scientific (CS) environmental data recording of small range (0 to 5 v) voltage sensor signals are accurate and largely shielded from electronic thermal induced influence, or other such factors by design. They were used as was descriptively recommended by the company. A high precision clock timing, and a recorded confluence of custom metrics, allow the Campbell Scientific raw data signal acquisitions a high research value generally, and have delivered baseline metrics in our plant phenotyping program. Raw electrical sensor signal captures were recorded at the maximum digital resolution, and could be re-processed in whole, while the subsequent onboard calculated metrics were often data typed at a lower memory precision and served our research analysis.

Campbell Scientific logger derived data output is structured in a column format, with multiple sensor data values present in each data row. One data row represents one program output cycle recording across the sensing array, as there was no onboard logger data averaging or down sampling. Campbell Scientific data is first recorded in binary format onboard the data logger, and then upon data retrieval, converted to ASCII text via the PC based LoggerNet CardConvert application. Here, our full CS raw data output, that includes a four-line header structure, was truncated to a typical single row header of variable names. The -9999 placeholder value was inserted for null instances.

This second data component, expanding measurement using Campbell Scientific records and additional sensors, was added during the season. Unfortunately, the CS data of this dataset is of poor quality. The IRT sensors that were
Lego Road image for Drone Dataset
paperswithcode.com
gts.ai
Updated May 30, 2025
Cite
(2025). Lego Road image for Drone Dataset [Dataset]. https://paperswithcode.com/dataset/lego-road-image-for-drone
Dataset updated
May 30, 2025
Description

This folder contains three distinct types of images representing various movements: forward, left, and right. Our task involves reading these images into a digital array, which will enable us to analyze and predict the next action of the drone.

Here’s a detailed breakdown of the process:

Image Categorization:

Forward Movement: These images depict scenarios where the drone needs to move straight ahead.

Left Movement: These images illustrate situations where the drone needs to turn or move to the left.

Right Movement: These images show instances where the drone should turn or move to the right.

Image Preprocessing:

Reading Images: Use an appropriate library to read and load the images from the folder into a digital format.

Resizing and Normalizing: Standardize the size of the images and normalize the pixel values to ensure consistent input for the analysis model.

Labeling: Assign labels to each image based on its movement category (forward, left, right).

Data Storage:

Store the preprocessed images in a digital array, with each image associated with its respective label. This array will serve as the dataset for further analysis.

Analysis and Prediction:

Feature Extraction: Extract relevant features from the images, such as edges, shapes, and textures, which are indicative of the movement type.

Model Training: Use machine learning algorithms to train a predictive model on the labeled dataset. This model will learn to recognize patterns corresponding to each movement type.

Prediction: Utilize the trained model to analyze new images and predict the drone’s next action based on the detected patterns.

Applications:

Real-Time Navigation: Implement the predictive model in the drone’s navigation system to enable real-time decision-making based on visual inputs.

Autonomous Operations: Enhance the drone’s autonomous capabilities by integrating the model with other sensors and control systems for seamless movement.

This dataset is sourced from Kaggle.
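
The labeling and normalizing steps in the breakdown above can be sketched without any imaging library. The folder layout (one subfolder per movement), the label numbering, and the tiny 2x2 "image" are all assumptions made for illustration; reading real image files would need an image library, which is not shown.

```python
from pathlib import PurePosixPath

LABELS = {"forward": 0, "left": 1, "right": 2}  # assumed label numbering

def label_for(path):
    """Return the integer label taken from the image's parent folder name."""
    return LABELS[PurePosixPath(path).parent.name]

def normalize(pixels):
    """Scale 8-bit pixel values (0..255) to floats in the range 0..1."""
    return [[value / 255.0 for value in row] for row in pixels]

# A made-up 2x2 grayscale image and a hypothetical file path.
image = [[0, 128], [255, 64]]
dataset = [(normalize(image), label_for("lego_road/left/img_001.png"))]
print(dataset[0][1])  # 1
```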
Residential Existing Homes (One to Four Units) Energy Efficiency Meter...
catalog.data.gov
datasets.ai
Updated Sep 15, 2023
Cite
data.ny.gov (2023). Residential Existing Homes (One to Four Units) Energy Efficiency Meter Evaluated Project Data: 2007 – 2012 [Dataset]. https://catalog.data.gov/dataset/residential-existing-homes-one-to-four-units-energy-efficiency-meter-evaluated-projec-2007
Dataset updated
Sep 15, 2023
Dataset provided by
data.ny.gov
Description
IMPORTANT! PLEASE READ DISCLAIMER BEFORE USING DATA. This dataset backcasts estimated modeled savings for a subset of 2007-2012 completed projects in the Home Performance with ENERGY STAR® Program against normalized savings calculated by an open source energy efficiency meter available at https://www.openee.io/. Open source code uses utility-grade metered consumption to weather-normalize the pre- and post-consumption data using standard methods with no discretionary independent variables. The open source energy efficiency meter allows private companies, utilities, and regulators to calculate energy savings from energy efficiency retrofits with increased confidence and replicability of results. This dataset is intended to lay a foundation for future innovation and deployment of the open source energy efficiency meter across the residential energy sector, and to help inform stakeholders interested in pay for performance programs, where providers are paid for realizing measurable weather-normalized results. To download the open source code, please visit the website at https://github.com/openeemeter/eemeter/releases
DISCLAIMER: Normalized Savings using open source OEE meter. Several data elements, including Evaluated Annual Electric Savings (kWh), Evaluated Annual Gas Savings (MMBtu), Pre-retrofit Baseline Electric (kWh), Pre-retrofit Baseline Gas (MMBtu), Post-retrofit Usage Electric (kWh), and Post-retrofit Usage Gas (MMBtu), are direct outputs from the open source OEE meter. Home Performance with ENERGY STAR® Estimated Savings. Several data elements, including Estimated Annual kWh Savings, Estimated Annual MMBtu Savings, and Estimated First Year Energy Savings, represent contractor-reported savings derived from energy modeling software calculations and not actual realized energy savings. The accuracy of the Estimated Annual kWh Savings and Estimated Annual MMBtu Savings for projects has been evaluated by an independent third party.
The results of the Home Performance with ENERGY STAR impact analysis indicate that, on average, actual savings amount to 35 percent of the Estimated Annual kWh Savings and 65 percent of the Estimated Annual MMBtu Savings. For more information, please refer to the Evaluation Report published on NYSERDA’s website at: http://www.nyserda.ny.gov/-/media/Files/Publications/PPSER/Program-Evaluation/2012ContractorReports/2012-HPwES-Impact-Report-with-Appendices.pdf. This dataset includes the following data points for a subset of projects completed in 2007-2012: Contractor ID, Project County, Project City, Project ZIP, Climate Zone, Weather Station, Weather Station-Normalization, Project Completion Date, Customer Type, Size of Home, Volume of Home, Number of Units, Year Home Built, Total Project Cost, Contractor Incentive, Total Incentives, Amount Financed through Program, Estimated Annual kWh Savings, Estimated Annual MMBtu Savings, Estimated First Year Energy Savings, Evaluated Annual Electric Savings (kWh), Evaluated Annual Gas Savings (MMBtu), Pre-retrofit Baseline Electric (kWh), Pre-retrofit Baseline Gas (MMBtu), Post-retrofit Usage Electric (kWh), Post-retrofit Usage Gas (MMBtu), Central Hudson, Consolidated Edison, LIPA, National Grid, National Fuel Gas, New York State Electric and Gas, Orange and Rockland, Rochester Gas and Electric. How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
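
The average realization figures quoted above (35 percent for kWh, 65 percent for MMBtu) can be applied to a contractor estimate to get a rough expectation of realized savings. This is a sketch with made-up project numbers; the rates are program-wide averages, so applying them to a single project is an approximation, and the function names are illustrative.

```python
# Average share of estimated savings actually realized, per the impact analysis.
REALIZATION_RATE = {"kWh": 0.35, "MMBtu": 0.65}

def expected_realized(estimated, unit):
    """Scale a contractor-estimated saving by the average realization rate."""
    return estimated * REALIZATION_RATE[unit]

def realization_rate(evaluated, estimated):
    """Per-project ratio of meter-evaluated savings to estimated savings."""
    return evaluated / estimated

# Hypothetical project estimates.
print(f"{expected_realized(1000, 'kWh'):.1f} kWh")
print(f"{expected_realized(20, 'MMBtu'):.1f} MMBtu")
```

The second function shows the per-project comparison that the Estimated and Evaluated columns of the dataset make possible.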
LEMming: A Linear Error Model to Normalize Parallel Quantitative Real-Time...
plos.figshare.com
pdf
Updated Jun 2, 2023
Cite
Ronny Feuer; Sebastian Vlaic; Janine Arlt; Oliver Sawodny; Uta Dahmen; Ulrich M. Zanger; Maria Thomas (2023). LEMming: A Linear Error Model to Normalize Parallel Quantitative Real-Time PCR (qPCR) Data as an Alternative to Reference Gene Based Methods [Dataset]. http://doi.org/10.1371/journal.pone.0135852
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0135852
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Ronny Feuer; Sebastian Vlaic; Janine Arlt; Oliver Sawodny; Uta Dahmen; Ulrich M. Zanger; Maria Thomas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Gene expression analysis is an essential part of biological and medical investigations. Quantitative real-time PCR (qPCR) is characterized with excellent sensitivity, dynamic range, reproducibility and is still regarded to be the gold standard for quantifying transcripts abundance. Parallelization of qPCR such as by microfluidic Taqman Fluidigm Biomark Platform enables evaluation of multiple transcripts in samples treated under various conditions. Despite advanced technologies, correct evaluation of the measurements remains challenging. Most widely used methods for evaluating or calculating gene expression data include geNorm and ΔΔCt, respectively. They rely on one or several stable reference genes (RGs) for normalization, thus potentially causing biased results. We therefore applied multivariable regression with a tailored error model to overcome the necessity of stable RGs.

Results: We developed a RG independent data normalization approach based on a tailored linear error model for parallel qPCR data, called LEMming. It uses the assumption that the mean Ct values within samples of similarly treated groups are equal. Performance of LEMming was evaluated in three data sets with different stability patterns of RGs and compared to the results of geNorm normalization. Data set 1 showed that both methods gave similar results if stable RGs are available. Data set 2 included RGs which are stable according to geNorm criteria, but became differentially expressed in normalized data evaluated by a t-test. geNorm-normalized data showed an effect of a shifted mean per gene per condition whereas LEMming-normalized data did not. Comparing the decrease of standard deviation from raw data to geNorm and to LEMming, the latter was superior. In data set 3 according to geNorm calculated average expression stability and pairwise variation, stable RGs were available, but t-tests of raw data contradicted this. Normalization with RGs resulted in distorted data contradicting literature, while LEMming normalized data did not.

Conclusions: If RGs are coexpressed but are not independent of the experimental conditions the stability criteria based on inter- and intragroup variation fail. The linear error model developed, LEMming, overcomes the dependency of using RGs for parallel qPCR measurements, besides resolving biases of both technical and biological nature in qPCR. However, to distinguish systematic errors per treated group from a global treatment effect an additional measurement is needed. Quantification of total cDNA content per sample helps to identify systematic errors.
County Hurr Risk
hub.arcgis.com
gis-fema.hub.arcgis.com
Updated Jun 1, 2020
Cite
FEMA AGOL (2020). County Hurr Risk [Dataset]. https://hub.arcgis.com/maps/FEMA::county-hurr-risk
Explore at:
Dataset updated
Jun 1, 2020
Dataset authored and provided by
FEMA AGOL
Area covered

Description
The hurricane risk index is simply the product of the cumulative hurricane strikes per coastal county and the CDC Overall Social Vulnerability Index (SVI) for the given county. We normalize the hurricane strikes data to match the SVI data classification scheme (i.e., max value at 1); however, using the raw or normalized values of hurricane strikes has no impact on the spatial pattern of the risk index. Therefore, a risk index of value 1 indicates the county has the highest hurricane strikes of all the counties and is the most vulnerable county in the nation according to the SVI index. Because the analysis is over multiple states, we use the ‘United States’ SVI dataset at the county level. Values of the index are unevenly distributed so we classify intervals using the Jenks method and the first break at 0.08 is roughly equal to the median index value. For counties north of North Carolina, the low hurricane risk is most dependent on the low number of hurricane strikes. The vast majority of the counties fall in the lowest risk category and any in the second lowest category are there because of high social vulnerability.
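The index described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `hurricane_risk_index` and the sample strike counts and SVI scores are hypothetical, not values from the FEMA dataset, and the Jenks classification step is not shown.

```python
import numpy as np

def hurricane_risk_index(strikes, svi):
    """Risk index per county: normalized cumulative strikes times SVI."""
    strikes = np.asarray(strikes, dtype=float)
    svi = np.asarray(svi, dtype=float)
    # Scale strikes so the maximum is 1, matching the 0-to-1 SVI range.
    normalized = strikes / strikes.max()
    return normalized * svi

# Hypothetical counties (made-up strike counts and SVI scores).
strikes = [20, 10, 5, 0]
svi = [1.0, 0.5, 0.8, 0.9]
risk = hurricane_risk_index(strikes, svi)
```

Because the normalization divides every county by the same constant, it rescales the index without changing the ranking of counties, which is consistent with the description's remark that raw or normalized strikes give the same spatial pattern.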
HMS HBAC Train Spectrograms 2
kaggle.com
Updated Apr 3, 2024
Cite
Vishal (2024). HMS HBAC Train Spectrograms 2 [Dataset]. https://www.kaggle.com/datasets/vishalbakshi/hms-hbac-train-spectrograms-2/code
Explore at:
Dataset updated
Apr 3, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
Vishal
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This is a dataset of spectrogram images created from the train_spectrograms parquet data from the Harvard Medical School Harmful Brain Activity Classification competition. The parquet files have been transformed with the following code, referencing the HMS-HBAC: KerasCV Starter Notebook

def process_spec(spec_id, split="train"):
    # read the data
    data = pd.read_parquet(path/f'{split}_spectrograms'/f'{spec_id}.parquet')
    # read the label
    label = unique_df[unique_df.spectrogram_id == spec_id]["target"].item()
    # replace NA with 0
    data = data.fillna(0)
    # convert DataFrame to array
    data = data.values[:, 1:]
    # transpose
    data = data.T
    data = data.astype("float32")
    # clip data to avoid 0s
    data = np.clip(data, math.exp(-4), math.exp(8))
    # take log data to magnify differences
    data = np.log(data)
    # normalize data
    data = (data - data.mean()) / data.std() + 1e-6
    # convert to 3 channels
    data = np.tile(data[..., None], (1, 1, 3))
    # convert array to PILImage
    im = PILImage.create(Image.fromarray((data * 255).astype(np.uint8)))
    im.save(f"{SPEC_DIR}/{split}_spectrograms/{label}/{spec_id}.png")
Data from: Attributes for NHDPlus Catchments (Version 1.1) for the...
catalog.data.gov
data.usgs.gov
Updated Nov 1, 2024
Cite
U.S. Geological Survey (2024). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Ammonium (NH4) [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize-57d70
Explore at:
Dataset updated
Nov 1, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Area covered
United States, Contiguous United States
Description
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Ammonium (NH4) for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NH4 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). 
MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
Criteria for evaluating and qualifying public datasets obtained from the...
data.mendeley.com
Updated May 19, 2025
Cite
Gyslla Vasconcelos (2025). Criteria for evaluating and qualifying public datasets obtained from the Brazilian Federal Government's Open Data Portal - dados.gov [Dataset]. http://doi.org/10.17632/x8sgcykthn.2
Explore at:
Unique identifier
https://doi.org/10.17632/x8sgcykthn.2
Dataset updated
May 19, 2025
Authors
Gyslla Vasconcelos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These criteria (file 1) were drawn up empirically, based on the practical challenges faced during the development of the thesis research and on tests carried out with various datasets applied to process mining tools. The criteria were prepared with the aim of ranking the datasets selected and published (https://doi.org/10.6084/m9.figshare.25514884.v3), in order to classify them according to their score. The criteria are divided into informative (In), importance (I), difficulty (D) and ease (F) of handling (file 2). The datasets were selected (file 3) and, for ranking, calculations were made (file 5) to normalize the values for standardization (file 4). This data is part of a study on the application of process mining techniques to Brazilian public service data, available on the open data portal dados.gov.
Luecken Cite-seq human bone marrow 2021 preprocessing
figshare.com
hdf
Updated Oct 5, 2023
Cite
Single-cell best practices (2023). Luecken Cite-seq human bone marrow 2021 preprocessing [Dataset]. http://doi.org/10.6084/m9.figshare.23623950.v2
Explore at:
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.23623950.v2
Dataset updated
Oct 5, 2023
Dataset provided by
figshare
Authors
Single-cell best practices
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset published by Luecken et al. 2021 which contains data from human bone marrow measured through joint profiling of single-nucleus RNA and Antibody-Derived Tags (ADTs) using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0.

File Description
cite_quality_control.h5mu: Filtered cell by feature MuData object after quality control.
cite_normalization.h5mu: MuData object of normalized data using DSB (denoised and scaled by background) normalization.
cite_doublet_removal_xdbt.h5mu: MuData of data after doublet removal based on known cell type markers. Cells were removed if they were double positive for mutually exclusive markers with a DSB value >2.5.
cite_dimensionality_reduction.h5mu: MuData of data after dimensionality reduction.
cite_batch_correction.h5mu: MuData of data after batch correction.

Citation
Luecken, M. D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2021).

Original data link
https://openproblems.bio/neurips_docs/data/dataset/
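The doublet-removal rule in the file description (drop cells that are double positive for mutually exclusive markers at a DSB value above 2.5) can be illustrated roughly as follows. This is a sketch, not the authors' pipeline: the helper `double_positive_mask`, the two-column matrix, and its values are hypothetical, and a real workflow would operate on the MuData object rather than a bare array.

```python
import numpy as np

DSB_THRESHOLD = 2.5  # threshold quoted in the file description

def double_positive_mask(dsb, marker_a, marker_b, threshold=DSB_THRESHOLD):
    """True for cells that exceed the threshold on both markers."""
    return (dsb[:, marker_a] > threshold) & (dsb[:, marker_b] > threshold)

# Hypothetical DSB-normalized values: rows are cells, columns are two
# markers assumed to be mutually exclusive.
dsb = np.array([[3.0, 0.1],
                [2.9, 3.4],
                [0.2, 4.0]])

keep = ~double_positive_mask(dsb, marker_a=0, marker_b=1)
filtered = dsb[keep]  # the second cell is dropped as a likely doublet
```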
Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...
catalog.data.gov
data.usgs.gov
Updated Sep 18, 2024
Cite
U.S. Geological Survey (2024). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Total Inorganic Nitrogen [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize
Explore at:
Dataset updated
Sep 18, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Area covered
United States, Contiguous United States
Description
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Total Inorganic Nitrogen for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of Total Inorganic Nitrogen deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). 
MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
Brain tumor MRI and CT scan
kaggle.com
Updated Oct 4, 2022
Cite
chenghan pu (2022). Brain tumor MRI and CT scan [Dataset]. https://www.kaggle.com/datasets/chenghanpu/brain-tumor-mri-and-ct-scan/data
Explore at:
Dataset updated
Oct 4, 2022
Dataset provided by
Kaggle
Authors
chenghan pu
Description
A novel brain tumor dataset containing 4500 2D MRI-CT slices. The original MRI and CT scans are also contained in this dataset.

Pre-processing strategy: The pre-processing data pipeline includes pairing MRI and CT scans according to a specific time interval between CT and MRI scans of the same patient, MRI image registration to a standard template, MRI-CT image registration, intensity normalization, and extracting 2D slices from 3D volumes. The pipeline can be used to obtain classic 2D MRI-CT images from 3D DICOM-format MRI and CT scans, which can be used directly as training data for end-to-end synthetic CT deep learning networks.

Detail:

Pairing MRI and CT scans: If the time interval between MRI and CT scans is too long, the information in the MRI and CT images will not match. Therefore, we pair MRI and CT scans of the same patient only when the interval between them does not exceed half a year.

MRI image registration: Because brains differ between individuals and the spatial coordinates of the images differ between scans, the dataset must remove individual differences and unify the coordinates, which means all CT and MRI images should be registered to a standard template. The generated images can be more accurate after registration. The template proposed by the Montreal Neurological Institute is called the MNI ICBM 152 non-linear 6th Generation Symmetric Average Brain Stereotaxic Registration Model (MNI 152) (Grabner et al., 2006). Affine registration is first applied to register MRI scans to the MNI152 template.

Intensity normalization: The registered scans contain some extreme values, which introduce errors that would affect the generation accuracy. We normalize the image data and eliminate these extreme values by selecting the pixels ranked in the top 1% and bottom 1% and replacing their values with the 99th- and 1st-percentile pixel values, respectively.

Extracting 2D slices from 3D volumes: After registration, the 3D MRI and CT scans can be represented as 237×197×189 matrices. To ensure compatibility between training models and inputs, each 3D image is sliced, and 4500 2D MRI-CT image pairs are selected as the final training data.
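The intensity-normalization step described above amounts to percentile clipping. A minimal sketch follows, assuming NumPy and a synthetic volume; the function name `clip_extremes` and the random test data are illustrative and not part of the dataset's own code.

```python
import numpy as np

def clip_extremes(volume, lower_pct=1.0, upper_pct=99.0):
    """Replace values below/above the given percentiles with those percentiles."""
    lo, hi = np.percentile(volume, [lower_pct, upper_pct])
    return np.clip(volume, lo, hi)

# Synthetic stand-in for a registered scan, with one artificial outlier.
rng = np.random.default_rng(0)
volume = rng.normal(size=(8, 8, 8))
volume[0, 0, 0] = 1000.0

clipped = clip_extremes(volume)
```

Clipping at percentiles keeps the array shape and the ordering of the remaining voxels, so it can be applied before any further rescaling without affecting registration.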

Source database: 1. https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=33948305 2. https://wiki.cancerimagingarchive.net/display/Public/CPTAC-GBM 3. https://wiki.cancerimagingarchive.net/display/Public/TCGA-GBM

Patient information: Number of patients: 41

Introduction of each file: Dicom: contains the source files collected from the three websites above. data(processed): contains the processed data, saved as .npy files. You can use train_input.npy and train_output.npy as the input and output of an encoder-decoder structure to train the model. The test and val inputs and outputs can be used as test and validation datasets.
Identification of parameters in normal error component logit-mixture (NECLM)...
journaldata.zbw.eu
jda-test.zbw.eu
pdf, txt, zip
Updated Dec 8, 2022
Cite
Joan L. Walker; Moshe Ben-Akiva; Denis Bolduc (2022). Identification of parameters in normal error component logit-mixture (NECLM) models (replication data) [Dataset]. http://doi.org/10.15456/jae.2022319.0717541002
Explore at:
Available download formats: zip(162861), zip(100325), txt(952), pdf(22305)
Unique identifier
https://doi.org/10.15456/jae.2022319.0717541002
Dataset updated
Dec 8, 2022
Dataset provided by
ZBW - Leibniz Informationszentrum Wirtschaft
Authors
Joan L. Walker; Moshe Ben-Akiva; Denis Bolduc
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Although the basic structure of logit-mixture models is well understood, important identification and normalization issues often get overlooked. This paper addresses issues related to the identification of parameters in logit-mixture models containing normally distributed error components associated with alternatives or nests of alternatives (normal error component logit mixture, or NECLM, models). NECLM models include special cases such as unrestricted, fixed covariance matrices; alternative-specific variances; nesting and cross-nesting structures; and some applications to panel data. A general framework is presented for determining which parameters are identified as well as what normalization to impose when specifying NECLM models. It is generally necessary to specify and estimate NECLM models at the levels, or structural, form. This precludes working with utility differences, which would otherwise greatly simplify the identification and normalization process. Our results show that identification is not always intuitive; for example, normalization issues present in logit-mixture models are not present in analogous probit models. To identify and properly normalize the NECLM, we introduce the equality condition, an addition to the standard order and rank conditions. The identifying conditions are worked through for a number of special cases, and our findings are demonstrated with empirical examples using both synthetic and real data.
Open Loop synchronization techniques for distributed energy sources:...
data.mendeley.com
Updated Dec 14, 2020
Cite
Ahmed Safa (2020). Open Loop synchronization techniques for distributed energy sources: Overview, benchmark and comparative analysis Data Set [Dataset]. http://doi.org/10.17632/y2yc5kjmkf.1
Explore at:
Unique identifier
https://doi.org/10.17632/y2yc5kjmkf.1
Dataset updated
Dec 14, 2020
Authors
Ahmed Safa
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the raw performance data of three open-loop synchronization techniques evaluated with a proposed benchmark. The RMSE folder contains the data of Table 2 of the article. In the main script, the data has been normalized. The folder contains the original data for each case of the benchmark. The radar chart file contains the original data of the radar chart (Fig. 14). The data has been manipulated to present it better in a radar chart format. First, the values are inverted. Then, we normalize the data according to the highest value. The method with the highest value (best performance) takes a score of 10, and the other methods fall below that value.
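The radar chart transformation described above can be sketched as follows. Two points are assumptions rather than facts from the dataset: "inverted" is read here as taking the reciprocal, and the RMSE values are made up for illustration.

```python
import numpy as np

def radar_scores(errors, top=10.0):
    """Invert error values, then scale so the best method scores `top`."""
    errors = np.asarray(errors, dtype=float)
    inverted = 1.0 / errors  # assumption: inversion means the reciprocal
    return top * inverted / inverted.max()

# Hypothetical RMSE values for three methods (lower error is better).
rmse = [0.5, 1.0, 2.0]
scores = radar_scores(rmse)
```

With this reading, the method with the smallest error gets a score of 10 and the others are placed proportionally below it.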
CYGNSS Level 1 Science Data Record Version 2.1 - Dataset - NASA Open Data...
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Apr 1, 2025
Cite
nasa.gov (2025). CYGNSS Level 1 Science Data Record Version 2.1 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/cygnss-level-1-science-data-record-version-2-1-c4d25
Explore at:
Dataset updated
Apr 1, 2025
Dataset provided by
NASA http://nasa.gov/
Description
This Level 1 (L1) dataset contains the Version 2.1 geo-located Delay Doppler Maps (DDMs) calibrated into Power Received (Watts) and Bistatic Radar Cross Section (BRCS) expressed in units of meters squared from the Delay Doppler Mapping Instrument aboard the CYGNSS satellite constellation. This version supersedes Version 2.0. Other useful scientific and engineering measurement parameters include the DDM of Normalized Bistatic Radar Cross Section (NBRCS), the Delay Doppler Map Average (DDMA) of the NBRCS near the specular reflection point, and the Leading Edge Slope (LES) of the integrated delay waveform. The L1 dataset contains a number of other engineering and science measurement parameters, including sets of quality flags/indicators, error estimates, and bias estimates as well as a variety of orbital, spacecraft/sensor health, timekeeping, and geolocation parameters. At most, 8 netCDF data files (each file corresponding to a unique spacecraft in the CYGNSS constellation) are provided each day; under nominal conditions, there are typically 6-8 spacecraft retrieving data each day, but this can be maximized to 8 spacecraft under special circumstances in which higher than normal retrieval frequency is needed (i.e., during tropical storms and/or hurricanes). Latency is approximately 6 days (or better) from the last recorded measurement time. The Version 2.1 release represents the second science-quality release.

Here is a summary of improvements that reflect the quality of the Version 2.1 data release:
1) data is now available when the CYGNSS satellites are rolled away from nadir during orbital high beta-angle periods, resulting in a significant amount of additional data;
2) corrections to coordinate frames result in more accurate estimates of receiver antenna gain at the specular point;
3) improved calibration for analog-to-digital conversion results in better consistency between CYGNSS satellite measurements at nearly the same location and time;
4) improved GPS EIRP and transmit antenna pattern calibration results in significantly reduced PRN-dependence in the observables;
5) improved estimation of the location of the specular point within the DDM;
6) an altitude-dependent scattering area is used to normalize the scattering cross section (v2.0 used a simpler scattering area model that varied with incidence and azimuth angles but not altitude);
7) corrections added for noise floor-dependent biases in scattering cross section and leading edge slope of delay waveform observed in the v2.0 data.
Users should also note that the receiver antenna pattern calibration is not applied per-DDM-bin in this v2.1 release.
Data from: Multimodal classification of molecular subtypes in pediatric...
figshare.scilifelab.se
Updated Jan 15, 2025
Cite
Olga Krali; Yanara Marincevic-Zuniga; Gustav Arvidsson; Anna Pia Enblad; Anders Lundmark; Shumaila Sayyab; Vasilios Zachariadis; Merja Heinäniemi; Janne Suhonen; Laura Oksa; Kaisa Vepsäläinen; Ingegerd Öfverholm; Gisela Barbany; Ann Nordgren; Henrik Lilljebjörn; Thoas Fioretos; Hans O. Madsen; Hanne Vibeke Marquart; Trond Flaegstad; Erik Forestier; Ólafur G. Jónsson; Jukka Kanerva; Olli Lohi; Ulrika Norén-Nyström; Kjeld Schmiegelow; Arja Harila; Mats Heyman; Gudmar Lönnerholm; Ann-Christine Syvänen; Jessica Nordlund (2025). Multimodal classification of molecular subtypes in pediatric acute lymphoblastic leukemia [Dataset]. http://doi.org/10.17044/scilifelab.22303531.v3
Explore at:
Unique identifier
https://doi.org/10.17044/scilifelab.22303531.v3
Dataset updated
Jan 15, 2025
Dataset provided by
Uppsala University
Authors
Olga Krali; Yanara Marincevic-Zuniga; Gustav Arvidsson; Anna Pia Enblad; Anders Lundmark; Shumaila Sayyab; Vasilios Zachariadis; Merja Heinäniemi; Janne Suhonen; Laura Oksa; Kaisa Vepsäläinen; Ingegerd Öfverholm; Gisela Barbany; Ann Nordgren; Henrik Lilljebjörn; Thoas Fioretos; Hans O. Madsen; Hanne Vibeke Marquart; Trond Flaegstad; Erik Forestier; Ólafur G. Jónsson; Jukka Kanerva; Olli Lohi; Ulrika Norén-Nyström; Kjeld Schmiegelow; Arja Harila; Mats Heyman; Gudmar Lönnerholm; Ann-Christine Syvänen; Jessica Nordlund
License
https://www.scilifelab.se/data/restricted-access/
Description
This dataset contains genome-wide DNA methylation data generated from 384 pediatric acute lymphoblastic leukemia (ALL) samples originating from bone marrow or peripheral blood samples taken at ALL diagnosis (n = 384). Further details regarding the samples are available in Supplementary Table S2 from Krali et al., 2023 (https://doi.org/10.1038/s41698-023-00479-5).

Genome-wide DNA methylation was analyzed at the SNP&SEQ Technology Platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. 250 ng of bisulfite converted DNA was amplified, fragmented and hybridised to Illumina Infinium Human Methylation450k Beadchip using the standard protocol from Illumina (iScan SQ instrument).

This metadata record contains information about the raw idat files generated from the Infinium DNA methylation arrays. The raw idat files were processed with Methylation Module (1.8.5) software in Genome Studio (V2010.3). Peak-based correction was used to normalize the beta-value matrix.

The raw idat files along with a samplesheet, processed beta-value matrix, annotation file for CpG annotation will be made available upon request. Limited phenotype information is available in the Supplemental Table S2 of the manuscript. All scripts that give a walk-through to our project, including the modelling process with Machine Learning can be found in our GitHub repository.

Terms for access
The DNA methylation dataset is only to be used for research that is seeking to advance the understanding of the influence of epigenetic factors on leukemia etiology and biology. The data should not be used for other purposes, i.e. investigating the epigenetic signatures that may lead to identification of a person. For retrieving the data used for the scope of this publication, please contact datacentre@scilifelab.se.
Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...
catalog.data.gov
data.usgs.gov
Updated Nov 28, 2024
Cite
U.S. Geological Survey (2024). Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United States: Normalized Atmospheric Deposition for 2002, Nitrate (NO3) [Dataset]. https://catalog.data.gov/dataset/attributes-for-nhdplus-catchments-version-1-1-for-the-conterminous-united-states-normalize-79a86
Explore at:
Dataset updated
Nov 28, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Area covered
United States, Contiguous United States
Description
This data set represents the average normalized atmospheric (wet) deposition, in kilograms, of Nitrate (NO3) for the year 2002 compiled for every catchment of NHDPlus for the conterminous United States. Estimates of NO3 deposition are based on National Atmospheric Deposition Program (NADP) measurements (B. Larsen, U.S. Geological Survey, written commun., 2007). De-trending methods applied to the year 2002 are described in Alexander and others, 2001. NADP site selection met the following criteria: stations must have records from 1995 to 2002 and have a minimum of 30 observations. The NHDPlus Version 1.1 is an integrated suite of application-ready geospatial datasets that incorporates many of the best features of the National Hydrography Dataset (NHD) and the National Elevation Dataset (NED). The NHDPlus includes a stream network (based on the 1:100,000-scale NHD), improved networking, naming, and value-added attributes (VAAs). NHDPlus also includes elevation-derived catchments (drainage areas) produced using a drainage enforcement technique first widely used in New England, and thus referred to as "the New England Method." This technique involves "burning in" the 1:100,000-scale NHD and, when available, building "walls" using the National Watershed Boundary Dataset (WBD). The resulting modified digital elevation model (HydroDEM) is used to produce hydrologic derivatives that agree with the NHD and WBD. Over the past two years, an interdisciplinary team from the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and contractors found that this method produces the best quality NHD catchments using an automated process (USEPA, 2007). The NHDPlus dataset is organized by 18 Production Units that cover the conterminous United States. The NHDPlus version 1.1 data are grouped by the U.S. Geological Survey's Major River Basins (MRBs, Crawford and others, 2006). 
MRB1, covering the New England and Mid-Atlantic River basins, contains NHDPlus Production Units 1 and 2. MRB2, covering the South Atlantic-Gulf and Tennessee River basins, contains NHDPlus Production Units 3 and 6. MRB3, covering the Great Lakes, Ohio, Upper Mississippi, and Souris-Red-Rainy River basins, contains NHDPlus Production Units 4, 5, 7 and 9. MRB4, covering the Missouri River basins, contains NHDPlus Production Units 10-lower and 10-upper. MRB5, covering the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf River basins, contains NHDPlus Production Units 8, 11 and 12. MRB6, covering the Rio Grande, Colorado and Great Basin River basins, contains NHDPlus Production Units 13, 14, 15 and 16. MRB7, covering the Pacific Northwest River basins, contains NHDPlus Production Unit 17. MRB8, covering California River basins, contains NHDPlus Production Unit 18.
Global heat map of probable importance of terrestrial ecosystems on meeting...
zenodo.org
zip
Updated Jan 24, 2020
Cite
Kashif Shaad (2020). Global heat map of probable importance of terrestrial ecosystems on meeting local demand of freshwater services [Dataset]. http://doi.org/10.5281/zenodo.3360641
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3360641
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kashif Shaad
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This map (raster dataset, single layer) uses existing datasets to map globally “How important point x is likely to be for meeting the demand of a reliable & useable source of water on a scale of 0 to 1?” This relatively simple approach uses estimated water demand in a given basin as weight to identify pressure for flow regulation and water provisioning services. Precipitation and land cover estimates are then combined with it to give some insight into the hydrologic attributes of “location” and “timing” of flow that the ecosystems may influence. The underlying assumption here is that undisturbed ecosystems everywhere are performing the ecohydrological functions leading to freshwater services. The question is more (at the global scale): how dependent are the populations in the basin on the continued functioning of these services.

Input datasets:

Annual surface & groundwater (“blue”) water consumption estimates. URL: http://waterfootprint.org/en/resources/water-footprint-statistics/

HydroBasins watershed outline.

European Space Agency (ESA) global land cover 2015.

WorldClim annual average precipitation (Version 2.0).

Process:

Step 1: Calculate average annual water consumption estimates over HydroBasin outlines. This step spreads the demand laterally (in case of small basins) and upstream to the headwaters from (typically) downstream consumer concentration.

Step 2: Normalize the demand globally and map the normalized values on to “natural” land cover classes from the land cover dataset [forests, grasslands, etc].

Step 3: Normalize the annual precipitation layer within basins on a scale of 0-1, where 1 is the maximum annual precipitation in that basin. This is also mapped on the “natural” land cover. Precipitation thus acts as a ‘weight’ for importance within the basin. For example, upland headwaters will typically receive more rainfall and can be argued to be important for the flow regulation in the basin.

Step 4: Combine the layers from 2 and 3.
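
Steps 2 to 4 can be sketched on flattened rasters as below. This is a minimal illustration, not the author's processing chain: it assumes Step 1 has already produced a basin-averaged demand value per cell, and it assumes the layers are combined by multiplication, which the description does not specify. The function name `importance_map` and its arguments are hypothetical.

```python
from collections import defaultdict


def importance_map(demand, precip, basin_id, is_natural):
    """Sketch of Steps 2-4; all inputs are equal-length lists with one entry per raster cell."""
    # Step 2: normalize basin-averaged demand globally to the range 0-1.
    max_demand = max(demand)
    demand_norm = [d / max_demand if max_demand > 0 else 0.0 for d in demand]

    # Step 3: normalize precipitation within each basin (1 = wettest cell of that basin).
    basin_max = defaultdict(float)
    for b, p in zip(basin_id, precip):
        basin_max[b] = max(basin_max[b], p)
    precip_norm = [
        p / basin_max[b] if basin_max[b] > 0 else 0.0
        for b, p in zip(basin_id, precip)
    ]

    # Step 4: combine the two layers (ASSUMPTION: product) on "natural" land cover only.
    return [
        dn * pn if natural else 0.0
        for dn, pn, natural in zip(demand_norm, precip_norm, is_natural)
    ]


# Two basins with two cells each; the last cell is not "natural" land cover.
scores = importance_map(
    demand=[10.0, 10.0, 40.0, 40.0],
    precip=[500.0, 1000.0, 200.0, 800.0],
    basin_id=[1, 1, 2, 2],
    is_natural=[True, True, True, False],
)
```

A product keeps the result on the 0 to 1 scale; a weighted sum would be an equally plausible reading of "combine" and would give different rankings.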

Caveats:

Identification of what constitutes a “natural” land cover is not trivial, especially from global land cover maps. Example: Forests and plantations are hard to distinguish from these products.

Improvement of quality of water is assumed to be implicit for functioning ecosystems.
Chocolate Sales Data
kaggle.com
zip
Updated Mar 19, 2025
Cite
Milos Zubac (2025). Chocolate Sales Data [Dataset]. https://www.kaggle.com/datasets/miloszubac/chocolate-sales-data
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 19, 2025
Authors
Milos Zubac
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset uses the "Chocolate Sales Data" dataset, but I have also added a nutrition dataset that I randomly generated.

The sales_data dataset contains chocolate sales data from six different countries, covering the period from January 2022 to August 2022. The nutrition dataset includes seven different nutritional components; however, the serving sizes are not standardized. If you wish to compare nutritional values, you will need to normalize them first.
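
One common way to make nutritional values comparable is to rescale each product to a fixed 100 g basis. The sketch below assumes that is the kind of normalization intended; the function name `per_100g` and the field names `sugar_g` and `fat_g` are hypothetical and are not taken from the dataset.

```python
def per_100g(nutrients, serving_size_g):
    """Rescale nutrient amounts given per serving to amounts per 100 g."""
    if serving_size_g <= 0:
        raise ValueError("serving size must be positive")
    factor = 100.0 / serving_size_g
    return {name: value * factor for name, value in nutrients.items()}


# Two hypothetical products with different serving sizes.
bar_a = per_100g({"sugar_g": 12.0, "fat_g": 8.0}, serving_size_g=40.0)
bar_b = per_100g({"sugar_g": 10.0, "fat_g": 5.0}, serving_size_g=25.0)
```

After rescaling, `bar_a` and `bar_b` can be compared directly even though their original serving sizes differ.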

Cite

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2013). ‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-the-bronson-files-dataset-4-field-105-2013-7c96/latest

‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2

Explore at:

Dataset updated

Aug 1, 2013

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Analysis of ‘The Bronson Files, Dataset 4, Field 105, 2013’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/392f69f2-aa43-4e90-970d-33c36e011c19 on 11 February 2022.

--- Dataset description provided by original source is as follows ---

Dr. Kevin Bronson provides this unique nitrogen and water management in wheat agricultural research dataset for compute. Ten irrigation treatments from a linear sprinkler were combined with nitrogen treatments. This dataset includes notation of field events and operations, an intermediate analysis mega-table of correlated and calculated parameters, including laboratory analysis results generated during the experimentation, plus high resolution plot level intermediate data tables of SAS process output, as well as the complete raw sensors records and logger outputs.

This data was collected during the beginning time period of our USDA Maricopa terrestrial proximal high-throughput plant phenotyping tri-metric method generation, where a 5Hz crop canopy height, temperature and spectral signature are recorded coincident to indicate a plant health status. In this early development period, our Proximal Sensing Cart Mark1 (PSCM1) platform supplants people carrying the CropCircle (CC) sensors, and with an improved view mechanical performance result.

Experimental design and operational details of the research conducted are contained in related published articles; however, further description of the measured data signals, as well as germane commentary, is offered herein.

The primary component of this dataset is the Holland Scientific (HS) CropCircle ACS-470 reflectance numbers, which, as derived here, consist of raw active optical band-pass values digitized onboard the sensor product. Data is delivered as sequential serialized text output including the associated GPS information. Typically this is a production agriculture support technology, enabling an efficient precision application of nitrogen fertilizer. We used this optical reflectance sensor technology to investigate plant agronomic biology, as the ACS-470 is a unique performance product, being not only rugged and reliable but also illumination active and filter customizable.

Individualized ACS-470 sensor detector behavior and its subsequent influence on index calculation can be understood through analysis of white-panel and other known target measurements. When a sensor is held 120 cm from a titanium dioxide white painted panel, a normalized unity value of 1.0 is set for each detector. To generate this dataset we used a Holland Scientific SC-1 device and set the 1.0 unity value (field normalize) on each sensor individually, before each data collection, and without using any channel gain boost. The SC-1 field normalization device allows a communications connection to a Windows machine, where company-provided sensor control software enables the necessary sensor normalization routine and a real-time view of streaming sensor data.

This type of active proximal multi-spectral reflectance data may be perceived as inherently “noisy”; however, basic analytical description consistently resolves a biological patterning, and more advanced statistical analysis is suggested to achieve discovery. Sources of polychromatic reflectance are inherent in the environment and can be influenced by surface features like wax or water, or the presence of crystal mineralization. Varying bi-directional reflectance in the proximal space is a model reality, and directed energy emission reflection sampling is expected to support physical understanding of the underlying passive environmental system.

Soil in view of the sensor does decrease the raw detection amplitude of the target color returned and can add a soil reflection signal component. Yet that return accurately represents a largely two-dimensional cover and intensity signal of the target material present within each view. It does not, however, represent a reflection of the plant material solely, because it can contain additional features in view. Expect NDVI values greater than 0.1 when sensing plants, saturating around 0.8 rather than the typical 0.9 of passive NDVI.

Compared to what a solar-induced passive reflectance sensor would encounter, the active signal does not transmit enough energy to penetrate the canopy deeply, perhaps not past an LAI of 2.1 or less. However, the focus of our active sensor scan is on the uppermost expanded canopy leaves, and they are positioned to intercept the major solar energy. Active energy sensors are easier to direct, and in our capture method we target a consistent sensor height of 1 m above the average canopy height and maintain a rig travel speed target of around 1.5 mph, with sensors parallel to earth ground in a nadir view.

We consider these CropCircle raw detector returns to be more “instant” in generation, and “less-filtered” electronically, while onboard the “black-box” device, than are other reflectance products which produce vegetation indices as averages of multiple detector samples in time.

It is known through internal sensor performance tracking across our entire location inventory, that sensor body temperature change affects sensor raw detector returns in minor and undescribed yet apparently consistent ways.

Holland Scientific 5 Hz CropCircle ACS-470 active optical reflectance sensors, which were recorded on the GeoScout digital proprietary serial data logger, have a stable output format as defined by firmware version.
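
The 5 Hz sample rate and the rig travel speed target of around 1.5 mph together imply an approximate along-track distance between consecutive samples. The short calculation below is a sketch under the assumption that speed is constant and exactly on target; the function name `along_track_spacing_m` is illustrative.

```python
# 1 mile = 1609.344 m exactly, 1 hour = 3600 s.
MPH_TO_M_PER_S = 1609.344 / 3600.0


def along_track_spacing_m(speed_mph, sample_rate_hz):
    """Distance travelled between two consecutive samples, in metres."""
    return speed_mph * MPH_TO_M_PER_S / sample_rate_hz


spacing = along_track_spacing_m(1.5, 5.0)  # roughly 0.134 m
```

At the stated targets each sensor therefore records a sample roughly every 13 cm of travel; actual spacing will vary with the real rig speed.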

Different numbers of CSV data files were generated depending on field operations, and there were a few short-duration instances where the GPS signal was lost. When multiple raw data files were present, including white panel measurements before or after field collections, they were combined into one file, with the null value placeholder -9999 included. Two CropCircle sensors, numbered 2 and 3, were used, supplying data in a lined format, where variables are repeated for each sensor, creating a discrete data row for each individual sensor measurement instance.
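
A reader loading these files will typically want to turn the -9999 placeholder into a real missing value and separate the rows of sensors 2 and 3. The sketch below shows one way to do that with the Python standard library. The column names (`sensor`, `latitude`, `r670`, `r800`) and the sample rows are hypothetical; the actual header depends on the logger firmware and should be checked against the files.

```python
import csv
import io

NULL_PLACEHOLDER = "-9999"  # null marker described in the dataset notes


def read_rows_by_sensor(csv_text, sensor_column="sensor"):
    """Group CSV rows by sensor number, replacing the -9999 placeholder with None."""
    rows_by_sensor = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = {}
        for key, value in row.items():
            value = (value or "").strip()
            cleaned[key] = None if value == NULL_PLACEHOLDER else value
        rows_by_sensor.setdefault(cleaned[sensor_column], []).append(cleaned)
    return rows_by_sensor


# Invented sample in a "lined" layout: one row per sensor measurement instance.
sample = """sensor,latitude,r670,r800
2,33.07,0.11,0.52
3,-9999,0.12,0.49
2,33.07,0.10,0.55
"""
grouped = read_rows_by_sensor(sample)
```

Values are left as strings here so that no assumption is made about numeric formats; convert them to floats once the real column layout is known.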

We offer six high-throughput single pixel spectral colors, recorded at 530, 590, 670, 730, 780, and 800nm. The filtered band-pass was 10nm, except for the NIR, which was set to 20 and supplied an increased signal (including increased noise).

Dual, or tandem, CropCircle sensor paired usage empowers additional vegetation index calculations such as:
DATT = (r800-r730)/(r800-r670)
DATTA = (r800-r730)/(r800-r590)
MTCI = (r800-r730)/(r730-r670)
CIRE = (r800/r730)-1
CI = (r800/r590)-1
CCCI = NDRE/NDVIR800
PRI = (r590-r530)/(r590+r530)
CI800 = ((r800/r590)-1)
CI780 = ((r780/r590)-1)
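
The index formulas listed above translate directly into code. The sketch below follows them term by term. NDRE and NDVIR800 are not defined in the description, so their commonly used normalized-difference forms are assumed here and marked as such; the function name `vegetation_indices` and the sample reflectance values are illustrative only.

```python
def vegetation_indices(r530, r590, r670, r730, r780, r800):
    """Compute the listed indices from band reflectances (no guard against zero denominators)."""
    # ASSUMPTION: standard normalized-difference definitions, not stated in the source.
    ndre = (r800 - r730) / (r800 + r730)
    ndvi_r800 = (r800 - r670) / (r800 + r670)
    return {
        "DATT": (r800 - r730) / (r800 - r670),
        "DATTA": (r800 - r730) / (r800 - r590),
        "MTCI": (r800 - r730) / (r730 - r670),
        "CIRE": (r800 / r730) - 1,
        "CI": (r800 / r590) - 1,
        "CCCI": ndre / ndvi_r800,
        "PRI": (r590 - r530) / (r590 + r530),
        "CI800": (r800 / r590) - 1,
        "CI780": (r780 / r590) - 1,
    }


# Invented reflectance values, for illustration only.
idx = vegetation_indices(r530=0.1, r590=0.2, r670=0.1, r730=0.3, r780=0.4, r800=0.5)
```

As listed in the source, CI and CI800 use the same expression, so they return the same value. Rows containing the -9999 placeholder should be filtered out before calling this function.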

The Campbell Scientific (CS) environmental data recordings of small-range (0 to 5 V) voltage sensor signals are accurate and, by design, largely shielded from thermally induced electronic influence and other such factors. They were used as recommended by the company. High-precision clock timing and a recorded confluence of custom metrics give the Campbell Scientific raw data signal acquisitions a high research value generally, and they have delivered baseline metrics in our plant phenotyping program. Raw electrical sensor signal captures were recorded at the maximum digital resolution and could be re-processed in whole, while the subsequent onboard calculated metrics were often data typed at a lower memory precision and served our research analysis.

Improved Campbell Scientific data at 5 Hz is presented for nine collection events, where thermal, ultrasonic displacement, and additional GPS metrics were recorded. Ultrasonic height metrics generated by the Honeywell sensor and present in this dataset represent successful phenotypic recordings. The Honeywell ultrasonic displacement sensor has worked well in this application because of its 180 kHz signal frequency, which ranges over a 2 m space. Air temperature is still a developing metric; a thermocouple wire junction (TC) placed in free air with a solar shade produced a low-confidence passive ambient air temperature.

Campbell Scientific logger derived data output is structured in a column format, with multiple sensor data values present in each data row. One data row represents one program output cycle recording across the sensing array, as there was no onboard logger data averaging or down sampling. Campbell Scientific data is first recorded in binary format onboard the data logger, and then upon data retrieval, converted to ASCII text via the PC based LoggerNet CardConvert application. Here, our full CS raw data output, that includes a four-line header structure, was truncated to a typical single row header of variable names. The -9999 placeholder value was inserted for null instances.

There is canopy thermal data from three view vantages: a nadir sensor view, and views looking forward and backward down the plant row at a 30 degree angle off nadir. The sensors were high-confidence Apogee Instruments SI-111 type infrared radiometers (non-contact thermometers). Serial number 1052 was in a front position looking forward away from the platform, number 1023 was in the middle position with a nadir view, and number 1022 was in a rear position looking back toward the platform frame, until after 4/10/2013 when the order was reversed. We have a long and successful history of testing and benchmarking performance, and of deploying Apogee Instruments infrared radiometers in field experimentation. They are biologically spectral window relevant sensors and return a fast-update, 0.2 °C accurate average surface temperature, derived from what is (geometrically weighted) in their field of view.

Data gaps do exist beyond the null value -9999 designations: there are some instances when the GPS signal was lost or, rarely, when an HS GeoScout logger error occurred. GPS information may be missing at the start of data recording.

‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2

Identification of Novel Reference Genes Suitable for qRT-PCR Normalization...

Data from: The Bronson Files, Dataset 6, Field 13, 2014

Lego Road image for Drone Dataset

Residential Existing Homes (One to Four Units) Energy Efficiency Meter...

LEMming: A Linear Error Model to Normalize Parallel Quantitative Real-Time...

County Hurr Risk

HMS HBAC Train Spectrograms 2

Data from: Attributes for NHDPlus Catchments (Version 1.1) for the...

Criteria for evaluating and qualifying public datasets obtained from the...

Luecken Cite-seq human bone marrow 2021 preprocessing

Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...

Brain tumor MRI and CT scan

Identification of parameters in normal error component logit-mixture (NECLM)...

Open Loop synchronization techniques for distributed energy sources:...

CYGNSS Level 1 Science Data Record Version 2.1 - Dataset - NASA Open Data...

Data from: Multimodal classification of molecular subtypes in pediatric...

Attributes for NHDPlus Catchments (Version 1.1) for the Conterminous United...

Global heat map of probable importance of terrestrial ecosystems on meeting...

Chocolate Sales Data

‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2