20 datasets found
  1. Data from: LVMED: Dataset of Latvian text normalisation samples for the medical domain

    • repository.clarin.lv
    Updated May 30, 2023
    Cite
    Viesturs Jūlijs Lasmanis; Normunds Grūzītis (2023). LVMED: Dataset of Latvian text normalisation samples for the medical domain [Dataset]. https://repository.clarin.lv/repository/xmlui/handle/20.500.12574/85
    Dataset updated
    May 30, 2023
    Authors
    Viesturs Jūlijs Lasmanis; Normunds Grūzītis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The CSV dataset contains sentence pairs for a text-to-text transformation task: given a sentence that contains 0..n abbreviations, rewrite (normalise) the sentence using full word forms.

    Training dataset: 64,665 sentence pairs. Validation dataset: 7,185 sentence pairs. Testing dataset: 7,984 sentence pairs.

    All sentences are extracted from a public web corpus (https://korpuss.lv/id/Tīmeklis2020) and contain at least one medical term.
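    A minimal sketch of how such sentence pairs might be loaded for a text-to-text model, using pandas; the column names and the sample pair below are invented for illustration, so the actual CSV schema should be checked against the repository files:

```python
import io
import pandas as pd

# Illustrative only: the column names ("abbreviated", "normalised") and the
# example pair are invented; check the real CSV headers in the repository.
toy_csv = io.StringIO(
    "abbreviated,normalised\n"
    "pac. elpo bez trauksmes,paciente elpo bez trauksmes\n"
)
pairs = pd.read_csv(toy_csv)

# Each row is one example: an input sentence with 0..n abbreviations and
# the target sentence rewritten using full word forms.
for source, target in pairs.itertuples(index=False):
    print(source, "->", target)
```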

  2. Luecken Cite-seq human bone marrow 2021 preprocessing

    • figshare.com
    hdf
    Updated Oct 5, 2023
    Cite
    Single-cell best practices (2023). Luecken Cite-seq human bone marrow 2021 preprocessing [Dataset]. http://doi.org/10.6084/m9.figshare.23623950.v2
    Available download formats: hdf
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Single-cell best practices
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset published by Luecken et al. 2021 which contains data from human bone marrow measured through joint profiling of single-nucleus RNA and Antibody-Derived Tags (ADTs) using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0.

    File Description
    cite_quality_control.h5mu: Filtered cell-by-feature MuData object after quality control.
    cite_normalization.h5mu: MuData object of normalized data using DSB (denoised and scaled by background) normalization.
    cite_doublet_removal_xdbt.h5mu: MuData of data after doublet removal based on known cell type markers. Cells were removed if they were double positive for mutually exclusive markers with a DSB value >2.5.
    cite_dimensionality_reduction.h5mu: MuData of data after dimensionality reduction.
    cite_batch_correction.h5mu: MuData of data after batch correction.

    Citation
    Luecken, M. D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2021).

    Original data link
    https://openproblems.bio/neurips_docs/data/dataset/
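    As a rough illustration of the scaling idea behind DSB (denoised and scaled by background) normalization, the numpy sketch below z-scores log-transformed ADT counts against an empty-droplet background; the toy matrices are simulated rather than taken from the h5mu files, and the published DSB method additionally removes a per-cell technical component, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy ADT count matrices (cells x proteins); real data would come from
# the h5mu files described above.
background = rng.poisson(5.0, size=(200, 3))   # empty droplets
cells = rng.poisson(50.0, size=(100, 3))       # cell-containing droplets

# DSB-style scaling: log-transform, then z-score each protein against
# the empty-droplet background distribution.
log_bg = np.log1p(background)
log_cells = np.log1p(cells)
dsb = (log_cells - log_bg.mean(axis=0)) / log_bg.std(axis=0)
```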

  3. Data from: The Bronson Files, Dataset 5, Field 105, 2014

    • data.amerigeoss.org
    csv, jpeg, pdf, qt +2
    Updated Aug 24, 2022
    Cite
    United States (2022). The Bronson Files, Dataset 5, Field 105, 2014 [Dataset]. https://data.amerigeoss.org/dataset/the-bronson-files-dataset-5-field-105-2014-14f0b
    Available download formats: csv, zip, pdf, xls, qt, jpeg
    Dataset updated
    Aug 24, 2022
    Dataset provided by
    United States
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Dr. Kevin Bronson provides a second year of this nitrogen and water management in wheat agricultural research dataset for computational use. Ten irrigation treatments from a linear sprinkler were combined with nitrogen treatments. This dataset includes notation of field events and operations, an intermediate analysis mega-table of correlated and calculated parameters (including laboratory analysis results generated during the experimentation), plus high-resolution plot-level intermediate data tables of SAS process output, as well as the complete raw sensor records and logger outputs.

    This proximal terrestrial high-throughput plant phenotyping data exemplifies our early tri-metric field method, where geo-referenced 5Hz crop canopy height, temperature, and spectral signature are recorded coincidently to indicate plant health status. In this development period, our Proximal Sensing Cart Mark1 (PSCM1) platform suspends a single cluster of sensors on a dual sliding vertical placement armature.

    Experimental design and operational details of research conducted are contained in related published articles, however further description of the measured data signals as well as germane commentary is herein offered.

    The primary component of this dataset is the Holland Scientific (HS) CropCircle ACS-470 reflectance values, which, as derived here, consist of raw active optical band-pass values digitized onboard the sensor. Data is delivered as sequential serialized text output including the associated GPS information. Typically this is a production agriculture support technology, enabling efficient precision application of nitrogen fertilizer. We used this optical reflectance sensor technology to investigate plant agronomic biology, as the ACS-470 is a unique product: not only rugged and reliable but also illumination-active and filter-customizable.

    Individualized ACS-470 sensor detector behavior and subsequent index calculation influence can be understood through analysis of white-panel and other known target measurements. When a sensor is held 120cm from a titanium dioxide white painted panel, a normalized unity value of 1.0 is set for each detector. To generate this dataset we used a Holland Scientific SC-1 device and set the 1.0 unity value (field normalize) on each sensor individually, before each data collection, and without using any channel gain boost. The SC-1 field normalization device allows a communications connection to a Windows machine, where company provided sensor control software enables the necessary sensor normalization routine, and a real-time view of streaming sensor data.

    This type of active proximal multi-spectral reflectance data may be perceived as inherently “noisy”; however, basic analytical description consistently resolves a biological patterning, and more advanced statistical analysis is suggested to achieve discovery. Sources of polychromatic reflectance are inherent in the environment and can be influenced by surface features like wax or water, or the presence of crystal mineralization; varying bi-directional reflectance in the proximal space is a model reality, and directed energy emission reflection sampling is expected to support physical understanding of the underlying passive environmental system.

    Soil in view of the sensor does decrease the raw detection amplitude of the target color returned and can add a soil reflection signal component. Yet that return accurately represents a largely two-dimensional cover and intensity signal of the target material present within each view. It does not, however, represent a reflection of the plant material alone, because additional features can be in view. Expect NDVI values greater than 0.1 when sensing plants, saturating around 0.8 rather than the typical 0.9 of passive NDVI.

    The active signal does not transmit enough energy to penetrate past perhaps LAI 2.1 or less, compared to what a solar-induced passive reflectance sensor would encounter. However, the focus of our active sensor scan is on the uppermost expanded canopy leaves, which are positioned to intercept the major solar energy. Active energy sensors are easier to direct; in our capture method we target a consistent sensor height of 1m above the average canopy height, maintaining a rig travel speed target around 1.5 mph, with sensors parallel to earth ground in a nadir view.

    We consider these CropCircle raw detector returns to be more “instant” in generation, and “less-filtered” electronically, while onboard the “black-box” device, than are other reflectance products which produce vegetation indices as averages of multiple detector samples in time.

    It is known through internal sensor performance tracking across our entire location inventory, that sensor body temperature change affects sensor raw detector returns in minor and undescribed yet apparently consistent ways.

    Holland Scientific 5Hz CropCircle active optical reflectance ACS-470 sensors, measured on the GeoScout digital proprietary serial data logger, have a stable output format as defined by firmware version. Fifteen collection events are presented.

    Different numbers of csv data files were generated based on field operations, and there were a few short-duration instances where GPS signal was lost. Multiple raw data files, when present (including white panel measurements before or after field collections), were combined into one file, with the null value placeholder -9999 inserted. Two CropCircle sensors, numbered 2 and 3, were used, supplying data in a line-interleaved format where variables are repeated for each sensor. This created a discrete data row for each individual sensor measurement instance.
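    When loading such combined csv files, the -9999 placeholder can be mapped back to nulls at read time, for example with pandas; the column names in this toy excerpt are invented, as the real files carry the logger's own headers:

```python
import io
import pandas as pd

# Toy excerpt with invented column names; -9999 marks null instances,
# per the dataset description.
toy = io.StringIO(
    "sensor,r670,r800,lat,lon\n"
    "2,0.21,0.55,33.07,-111.97\n"
    "3,-9999,-9999,33.07,-111.97\n"
)
df = pd.read_csv(toy, na_values=[-9999])
# df now has NaN wherever the logger wrote the -9999 placeholder.
```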

    We offer six high-throughput single-pixel spectral colors, recorded at 530, 590, 670, 730, 780, and 800nm. The filtered band-pass was 10nm, except for the NIR, which was set to 20nm and supplied an increased signal (including increased noise).

    Dual, or tandem approach, CropCircle paired sensor usage empowers additional vegetation index calculations, such as:
    DATT = (r800-r730)/(r800-r670)
    DATTA = (r800-r730)/(r800-r590)
    MTCI = (r800-r730)/(r730-r670)
    CIRE = (r800/r730)-1
    CI = (r800/r590)-1
    CCCI = NDRE/NDVIR800
    PRI = (r590-r530)/(r590+r530)
    CI800 = ((r800/r590)-1)
    CI780 = ((r780/r590)-1)
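    The tandem-sensor indices listed above translate directly into code; the reflectance values below are invented for illustration, and CCCI is omitted because it depends on the chosen NDRE and NDVI band definitions:

```python
# Reflectance values r530..r800 are invented for illustration; real values
# come from the field-normalized ACS-470 detector returns.
r530, r590, r670, r730, r780, r800 = 0.06, 0.09, 0.05, 0.30, 0.48, 0.50

DATT = (r800 - r730) / (r800 - r670)
DATTA = (r800 - r730) / (r800 - r590)
MTCI = (r800 - r730) / (r730 - r670)
CIRE = (r800 / r730) - 1
CI = (r800 / r590) - 1
PRI = (r590 - r530) / (r590 + r530)
CI800 = (r800 / r590) - 1
CI780 = (r780 / r590) - 1
```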

    The Campbell Scientific (CS) environmental data recordings of small-range (0 to 5 V) voltage sensor signals are accurate and largely shielded by design from electronic thermally induced influence and other such factors. The sensors were used as descriptively recommended by the company. High-precision clock timing and a recorded confluence of custom metrics give the Campbell Scientific raw data signal acquisitions a high research value generally, and they have delivered baseline metrics in our plant phenotyping program. Raw electrical sensor signal captures were recorded at the maximum digital resolution and could be re-processed in whole, while the subsequent onboard calculated metrics were often data-typed at a lower memory precision and served our research analysis.

    Improved Campbell Scientific data at 5Hz is presented for nine collection events, where thermal, ultrasonic displacement, and additional GPS metrics were recorded. Ultrasonic height metrics generated by the Honeywell sensor and present in this dataset represent successful phenotypic recordings. The Honeywell ultrasonic displacement sensor has worked well in this application because of its 180kHz signal frequency, which ranges over a 2m space. Air temperature is still a developing metric; a thermocouple wire junction (TC) placed in free air with a solar shade produced a low-confidence passive ambient air temperature.

    Campbell Scientific logger derived data output is structured in a column format, with multiple sensor data values present in each data row. One data row represents one program output cycle recording across the sensing array, as there was no onboard logger data averaging or down sampling. Campbell Scientific data is first recorded in binary format onboard the data logger, and then upon data retrieval, converted to ASCII text via the PC based LoggerNet CardConvert application. Here, our full CS raw data output, which includes a four-line header structure, was truncated to a typical single-row header of variable names. The -9999 placeholder value was inserted for null instances.

    There is canopy thermal data from three view vantages: a nadir sensor view, and views looking forward and backward down the plant row at a 30 degree angle off nadir. The high-confidence Apogee Instruments SI-111 type infrared radiometer (a non-contact thermometer) with serial number 1022 was in a front position looking forward away from the platform, number 1023 with a nadir view was in the middle position, and sensor number 1052 was in a rear position looking back toward the platform frame. We have a long and successful history of testing, benchmarking, and deploying Apogee Instruments infrared radiometers in field experimentation. They are biologically relevant spectral-window sensors and return a fast-update, 0.2C-accurate average surface temperature derived from what is (geometrically weighted) in their field of view.

    Data gaps exist beyond the -9999 null value designations: there are some instances when GPS signal was lost, or, rarely, an HS GeoScout logger error occurred. GPS information may be missing at the start of data recording; however, once the receiver supplies a signal the values populate. Likewise, there may be missing information at the end of a data collection, where the GPS signal was lost but sensors continued to record along with the data logger timestamping.

    In the raw CS data, collections 1 through 7 are represented by only one table file, where the UTC from the GPS

  4. Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX

    • figshare.com
    • frontiersin.figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s001
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84, and sensitivity was greater than 0.90 except for the no-change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite its assumption that the majority of genes are unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
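    To show the shape of the simpler normalizations compared above, this numpy sketch applies CPM and a simplified Upper Quartile scaling to a toy count matrix; published UQ implementations (e.g. in edgeR) differ in details such as how scaling factors are rescaled:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy count matrix (genes x samples); real TempO-Seq counts differ.
counts = rng.poisson(20.0, size=(500, 4)) + 1

# Counts per million (CPM): scale each sample's counts by its library size.
cpm = counts / counts.sum(axis=0) * 1e6

# Upper Quartile (UQ), simplified: scale each sample by its 75th
# percentile, then rescale so counts stay on a comparable magnitude.
uq = np.percentile(counts, 75, axis=0)
uq_norm = counts / uq * uq.mean()
```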

  5. Benchmark data set for MSPypeline, a python package for streamlined mass...

    • data.niaid.nih.gov
    xml
    Updated Jul 22, 2021
    Cite
    Alexander Held; Ursula Klingmüller (2021). Benchmark data set for MSPypeline, a python package for streamlined mass spectrometry-based proteomics data analysis [Dataset]. https://data.niaid.nih.gov/resources?id=pxd025792
    Available download formats: xml
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
    DKFZ Heidelberg
    Authors
    Alexander Held; Ursula Klingmüller
    Variables measured
    Proteomics
    Description

    Mass spectrometry-based proteomics is increasingly employed in biology and medicine. To generate reliable information from large data sets and ensure comparability of results it is crucial to implement and standardize the quality control of the raw data, the data processing steps and the statistical analyses. The MSPypeline provides a platform for the import of MaxQuant output tables, the generation of quality control reports, the preprocessing of data including normalization and exploratory analyses by statistical inference plots. These standardized steps assess data quality, provide customizable figures and enable the identification of differentially expressed proteins to reach biologically relevant conclusions.

  6. ‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 1, 2013
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2013). ‘The Bronson Files, Dataset 4, Field 105, 2013’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-the-bronson-files-dataset-4-field-105-2013-7c96/e98343bf/?iid=003-110&v=presentation
    Dataset updated
    Aug 1, 2013
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘The Bronson Files, Dataset 4, Field 105, 2013’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/392f69f2-aa43-4e90-970d-33c36e011c19 on 11 February 2022.

    --- Dataset description provided by original source is as follows ---

    Dr. Kevin Bronson provides this unique nitrogen and water management in wheat agricultural research dataset for computational use. Ten irrigation treatments from a linear sprinkler were combined with nitrogen treatments. This dataset includes notation of field events and operations, an intermediate analysis mega-table of correlated and calculated parameters (including laboratory analysis results generated during the experimentation), plus high-resolution plot-level intermediate data tables of SAS process output, as well as the complete raw sensor records and logger outputs.

    This data was collected during the beginning of our USDA Maricopa terrestrial proximal high-throughput plant phenotyping tri-metric method generation, where 5Hz crop canopy height, temperature, and spectral signature are recorded coincidently to indicate plant health status. In this early development period, our Proximal Sensing Cart Mark1 (PSCM1) platform replaces people carrying the CropCircle (CC) sensors, with improved viewing and mechanical performance as a result.

    Experimental design and operational details of research conducted are contained in related published articles, however further description of the measured data signals as well as germane commentary is herein offered.

    The primary component of this dataset is the Holland Scientific (HS) CropCircle ACS-470 reflectance values, which, as derived here, consist of raw active optical band-pass values digitized onboard the sensor. Data is delivered as sequential serialized text output including the associated GPS information. Typically this is a production agriculture support technology, enabling efficient precision application of nitrogen fertilizer. We used this optical reflectance sensor technology to investigate plant agronomic biology, as the ACS-470 is a unique product: not only rugged and reliable but also illumination-active and filter-customizable.

    Individualized ACS-470 sensor detector behavior and subsequent index calculation influence can be understood through analysis of white-panel and other known target measurements. When a sensor is held 120cm from a titanium dioxide white painted panel, a normalized unity value of 1.0 is set for each detector. To generate this dataset we used a Holland Scientific SC-1 device and set the 1.0 unity value (field normalize) on each sensor individually, before each data collection, and without using any channel gain boost. The SC-1 field normalization device allows a communications connection to a Windows machine, where company provided sensor control software enables the necessary sensor normalization routine, and a real-time view of streaming sensor data.

    This type of active proximal multi-spectral reflectance data may be perceived as inherently “noisy”; however, basic analytical description consistently resolves a biological patterning, and more advanced statistical analysis is suggested to achieve discovery. Sources of polychromatic reflectance are inherent in the environment and can be influenced by surface features like wax or water, or the presence of crystal mineralization; varying bi-directional reflectance in the proximal space is a model reality, and directed energy emission reflection sampling is expected to support physical understanding of the underlying passive environmental system.

    Soil in view of the sensor does decrease the raw detection amplitude of the target color returned and can add a soil reflection signal component. Yet that return accurately represents a largely two-dimensional cover and intensity signal of the target material present within each view. It does not, however, represent a reflection of the plant material alone, because additional features can be in view. Expect NDVI values greater than 0.1 when sensing plants, saturating around 0.8 rather than the typical 0.9 of passive NDVI.

    The active signal does not transmit enough energy to penetrate past perhaps LAI 2.1 or less, compared to what a solar-induced passive reflectance sensor would encounter. However, the focus of our active sensor scan is on the uppermost expanded canopy leaves, which are positioned to intercept the major solar energy. Active energy sensors are easier to direct; in our capture method we target a consistent sensor height of 1m above the average canopy height, maintaining a rig travel speed target around 1.5 mph, with sensors parallel to earth ground in a nadir view.

    We consider these CropCircle raw detector returns to be more “instant” in generation, and “less-filtered” electronically, while onboard the “black-box” device, than are other reflectance products which produce vegetation indices as averages of multiple detector samples in time.

    It is known through internal sensor performance tracking across our entire location inventory, that sensor body temperature change affects sensor raw detector returns in minor and undescribed yet apparently consistent ways.

    Holland Scientific 5Hz CropCircle active optical reflectance ACS-470 sensors, measured on the GeoScout digital proprietary serial data logger, have a stable output format as defined by firmware version.

    Different numbers of csv data files were generated based on field operations, and there were a few short-duration instances where GPS signal was lost. Multiple raw data files, when present (including white panel measurements before or after field collections), were combined into one file, with the null value placeholder -9999 inserted. Two CropCircle sensors, numbered 2 and 3, were used, supplying data in a line-interleaved format where variables are repeated for each sensor, creating a discrete data row for each individual sensor measurement instance.

    We offer six high-throughput single-pixel spectral colors, recorded at 530, 590, 670, 730, 780, and 800nm. The filtered band-pass was 10nm, except for the NIR, which was set to 20nm and supplied an increased signal (including increased noise).

    Dual, or tandem, CropCircle sensor paired usage empowers additional vegetation index calculations such as:
    DATT = (r800-r730)/(r800-r670)
    DATTA = (r800-r730)/(r800-r590)
    MTCI = (r800-r730)/(r730-r670)
    CIRE = (r800/r730)-1
    CI = (r800/r590)-1
    CCCI = NDRE/NDVIR800
    PRI = (r590-r530)/(r590+r530)
    CI800 = ((r800/r590)-1)
    CI780 = ((r780/r590)-1)

    The Campbell Scientific (CS) environmental data recordings of small-range (0 to 5 V) voltage sensor signals are accurate and largely shielded by design from electronic thermally induced influence and other such factors. The sensors were used as descriptively recommended by the company. High-precision clock timing and a recorded confluence of custom metrics give the Campbell Scientific raw data signal acquisitions a high research value generally, and they have delivered baseline metrics in our plant phenotyping program. Raw electrical sensor signal captures were recorded at the maximum digital resolution and could be re-processed in whole, while the subsequent onboard calculated metrics were often data-typed at a lower memory precision and served our research analysis.

    Improved Campbell Scientific data at 5Hz is presented for nine collection events, where thermal, ultrasonic displacement, and additional GPS metrics were recorded. Ultrasonic height metrics generated by the Honeywell sensor and present in this dataset represent successful phenotypic recordings. The Honeywell ultrasonic displacement sensor has worked well in this application because of its 180kHz signal frequency, which ranges over a 2m space. Air temperature is still a developing metric; a thermocouple wire junction (TC) placed in free air with a solar shade produced a low-confidence passive ambient air temperature.

    Campbell Scientific logger derived data output is structured in a column format, with multiple sensor data values present in each data row. One data row represents one program output cycle recording across the sensing array, as there was no onboard logger data averaging or down sampling. Campbell Scientific data is first recorded in binary format onboard the data logger, and then upon data retrieval, converted to ASCII text via the PC based LoggerNet CardConvert application. Here, our full CS raw data output, which includes a four-line header structure, was truncated to a typical single-row header of variable names. The -9999 placeholder value was inserted for null instances.

    There is canopy thermal data from three view vantages: a nadir sensor view, and views looking forward and backward down the plant row at a 30 degree angle off nadir. The high-confidence Apogee Instruments SI-111 type infrared radiometer (a non-contact thermometer) with serial number 1052 was in a front position looking forward away from the platform, number 1023 with a nadir view was in the middle position, and sensor number 1022 was in a rear position looking back toward the platform frame, until after 4/10/2013 when the order was reversed. We have a long and successful history of testing, benchmarking, and deploying Apogee Instruments infrared radiometers in field experimentation. They are biologically relevant spectral-window sensors and return a fast-update, 0.2C-accurate average surface temperature derived from what is (geometrically weighted) in their field of view.

    Data gaps exist beyond the -9999 null value designations: there are some instances when GPS signal was lost, or, rarely, an HS GeoScout logger error occurred. GPS information may be missing at the start of data recording.

  7. Data from: Development of a standard operating procedure for the DCFH2-DA acellular assessment of reactive oxygen species produced by nanomaterials

    • tandf.figshare.com
    docx
    Updated Jun 5, 2023
    Cite
    Matthew Boyles; Fiona Murphy; William Mueller; Wendel Wohlleben; Nicklas Raun Jacobsen; Hedwig Braakhuis; Anna Giusti; Vicki Stone (2023). Development of a standard operating procedure for the DCFH2-DA acellular assessment of reactive oxygen species produced by nanomaterials [Dataset]. http://doi.org/10.6084/m9.figshare.19085883.v1
    Available download formats: docx
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Matthew Boyles; Fiona Murphy; William Mueller; Wendel Wohlleben; Nicklas Raun Jacobsen; Hedwig Braakhuis; Anna Giusti; Vicki Stone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Improved strategies are required for testing nanomaterials (NMs) to make hazard and risk assessment more efficient and sustainable, including reduced reliance on animal models without decreasing the level of human health protection. Acellular detection of reactive oxygen species (ROS) may be useful as a screening assay to prioritize NMs of high concern. To improve reliability and reproducibility, and minimize uncertainty, a standard operating procedure (SOP) has been developed for the detection of ROS using the 2′,7′-dichlorodihydrofluorescein diacetate (DCFH2-DA) assay. The SOP has undergone an inter- and intra-laboratory comparison to evaluate robustness, reliability, and reproducibility, using representative materials (ZnO, CuO, Mn2O3, and BaSO4 NMs) and a number of calibration tools to normalize data. The SOP includes an NM positive control (nanoparticle carbon black (NPCB)), a chemical positive control (SIN-1), and a standard curve of fluorescein fluorescence. The interlaboratory comparison demonstrated that arbitrary fluorescence units show high levels of partner variability; however, data normalization improved variability. Statistical analysis showed that the SIN-1 positive control provided an extremely high level of reliability and reproducibility, both as a positive control and as a normalization tool. The NPCB positive control can be used with a relatively high level of reproducibility, and in terms of the representative materials, the reproducibility of CuO-induced effects was better than that for Mn2O3. Using this DCFH2-DA acellular assay SOP resulted in robust intra-laboratory reproduction of ROS measurements from all NMs tested, while effective reproduction across different laboratories was also demonstrated; the effectiveness of attaining reproducibility within the interlaboratory assessment was particle-type-specific.
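    One common way to use a chemical positive control such as SIN-1 as a normalization tool is to express blank-subtracted sample fluorescence as a percentage of the blank-subtracted positive control; the values below are invented, and the SOP's exact normalization scheme may differ:

```python
# Toy fluorescence readings in arbitrary units; the blank, SIN-1 positive
# control, and nanomaterial sample values are invented for illustration.
blank = 120.0
sin1_positive = 5200.0
sample = 1800.0

# Normalize away instrument-specific arbitrary units: express the
# blank-subtracted sample relative to the blank-subtracted positive control.
percent_of_control = (sample - blank) / (sin1_positive - blank) * 100.0
```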

  8. Standard Profiles UK Power Networks Uses for Electricity Demand

    • ukpowernetworks.opendatasoft.com
    csv, excel, json
    Updated Dec 3, 2024
    (2024). Standard Profiles UK Power Networks Uses for Electricity Demand [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ukpn-standard-profiles-electricity-demand/
    Explore at:
    json, excel, csvAvailable download formats
    Dataset updated
    Dec 3, 2024
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The dataset captures yearly load profiles for different demand types, used by UK Power Networks to run import curtailment assessment studies.

    The import curtailment assessment tool went live across all three licence areas in September 2024, and uses the standard demand profiles in this data publication to model accepted not-yet-connected demand customers for import curtailment studies.

    Demand-specific profiles include the following demand types: commercial, industrial, domestic, EV charging stations, bus charging depots, network rail and data centres.

    The profiles have been developed using actual demand data from connected sites within UK Power Networks licence areas falling into each of the demand categories. The output is a yearly profile with half hourly granularity.

    The values are expressed as load factors (percentages) i.e., at each half hour the value can range from 0% to 100% of the maximum import capacity.

    Methodological Approach

    This section outlines the methodology for generating annual half-hourly demand profiles.

    A minimum of ten connected demand sites for each of the demand types have been used to create the representative profiles.

    Historical data from each of these connected demand sites are either retrieved from UK Power Networks’ Remote Terminal Unit (RTU) or through smart meter data. The historical data collected consist of annual half-hourly meter readings in the last calendar year.

    A Python script was used to process the half-hourly MW data from each of the sites, which were normalized by the peak MW value from the same site, for each timestamp, as follows:

    Pt(p.u.) = P1,t/Pmax1 + P2,t/Pmax2 + … + Pn,t/Pmaxn

    where

    Pt(p.u.) is the normalised power
    Pi,t is the import in MW from site i = 1, 2, ..., n at time t
    Pmaxi is the maximum import in the last calendar year from site i = 1, 2, ..., n
    t is time, at 30-minute resolution over one year

    The final profile has been created by selecting a percentile ranging from 95 to 98%.
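    The normalization and combination steps above can be sketched as follows (array shapes, variable names, and the final rescaling to a load factor are our assumptions; the actual UK Power Networks script is not published here):

```python
import numpy as np

# Sketch: rows are sites, columns are half-hourly timestamps over a year.
def normalized_profile(site_mw):
    site_mw = np.asarray(site_mw, dtype=float)
    peaks = site_mw.max(axis=1, keepdims=True)   # Pmax per site (last calendar year)
    per_unit = site_mw / peaks                   # Pi,t / Pmaxi, each in [0, 1]
    combined = per_unit.sum(axis=0)              # Pt(p.u.), summed per timestamp
    # Express as a load factor: percent of the combined profile's own maximum,
    # so every half-hour falls between 0% and 100%.
    return 100.0 * combined / combined.max()

# Tiny example: two sites, three half-hours.
profile = normalized_profile([[1.0, 2.0, 4.0], [3.0, 6.0, 6.0]])
```

    The 95-98% percentile selection described above would then be applied on top of profiles built this way.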

    Quality Control Statement

    The dataset is primarily built upon RTU data sourced from connected customer sites within the UK Power Networks' licence areas, as well as data collected from customers' smart meters.

    For the RTU data, UK Power Networks' Ops Telecoms team continuously monitors the performance of RTUs to ensure that the data they provide is both accurate and reliable. RTUs are equipped to store data during communication outages and transmit it once the connection is restored, minimizing the risk of data gaps. An alarm system alerts the team to any issues with RTUs, ensuring rapid response and repair to maintain data integrity.

    The smart meter data that is used to support certain demand profiles, such as domestic and smaller commercial profiles, is sourced from external providers. While UK Power Networks does not control the quality of this data directly, these data have been incorporated into our models with careful validation and alignment.

    Where MW was not available, data conversions were performed to standardize all units to MW. Any missing or bad data has been addressed through robust data cleaning methods, such as forward filling.

    The final profiles have been validated by ensuring that the profile aligned with expected operational patterns.

    Assurance Statement

    The dataset is generated using a script developed by the Network Access team, allowing an automated conversion from historical half-hourly data to a yearly profile. The profiles will be reviewed annually to assess any changes in demand patterns and determine if updates of demand-specific profiles are necessary. This process ensures that the profiles remain relevant and reflective of real-world demand dynamics over time.

    Other

    Download dataset information: Metadata (JSON)

    Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/

  9. Data from: An automatic gain control circuit to improve ECG acquisition

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Marco Rovetta; João Fernando Refosco Baggio; Raimes Moraes (2023). An automatic gain control circuit to improve ECG acquisition [Dataset]. http://doi.org/10.6084/m9.figshare.5668756.v1
    Explore at:
    jpegAvailable download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO journals
    Authors
    Marco Rovetta; João Fernando Refosco Baggio; Raimes Moraes
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Introduction: Long-term electrocardiogram (ECG) recordings are widely employed to assist the diagnosis of cardiac and sleep disorders. However, variability of ECG amplitude during the recordings hampers the detection of QRS complexes by algorithms. This work presents a simple electronic circuit to automatically normalize the ECG amplitude, improving its sampling by analog-to-digital converters (ADCs).

    Methods: The proposed circuit consists of an analog divider that normalizes the ECG amplitude using its absolute peak value as reference. The reference value is obtained by means of a full-wave rectifier and a peak voltage detector. The circuit and the tasks of its different stages are described.

    Results: An example of the circuit performance for a bradycardia ECG signal (40 bpm) is presented; the signal has its amplitude suddenly halved and, later, restored. The signal is automatically normalized after 5 heart beats for the amplitude drop. For the amplitude increase, the signal is promptly normalized.

    Conclusion: The proposed circuit adjusts the ECG amplitude to the input voltage range of the ADC, avoiding signal-to-noise ratio degradation of the sampled waveform and allowing better performance of processing algorithms.
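    A software analogue may make the idea concrete (this digital sketch and its decay constant are our assumptions; the paper describes an analog implementation): the full-wave rectifier becomes abs(), the peak voltage detector becomes a slowly decaying running maximum, and the analog divider becomes a division.

```python
# Digital sketch of the automatic gain control idea (assumed parameters,
# not the paper's circuit): rectify, track the peak with slow decay, divide.
def agc_normalize(samples, decay=0.999):
    peak = 1e-9                               # peak-detector state; avoids /0
    out = []
    for x in samples:
        peak = max(abs(x), peak * decay)      # full-wave rectifier + peak hold
        out.append(x / peak)                  # divider stage normalizes amplitude
    return out

# A half-amplitude sample is scaled back up as the tracked peak decays.
normalized = agc_normalize([0.5, -1.0, 0.5])
```

    The decay rate plays the role of the peak detector's discharge time constant: faster decay recovers from amplitude drops sooner, at the cost of tracking noise.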

  10. The Challenge of Stability in High-Throughput Gene Expression Analysis:...

    • omicsdi.org
    xml
    Emma Carmelo, The Challenge of Stability in High-Throughput Gene Expression Analysis: Comprehensive Selection and Evaluation of Reference Genes for BALB/c Mice Spleen Samples in the Leishmania infantum Infection Model. [Dataset]. https://www.omicsdi.org/dataset/geo/GSE80709
    Explore at:
    xmlAvailable download formats
    Authors
    Emma Carmelo
    Variables measured
    Other
    Description

    The interaction of Leishmania with BALB/c mice induces dramatic changes in transcriptome patterns in the parasite, but also in the target organs (spleen, liver…) due to its response against infection. Real-time quantitative PCR (qPCR) is an interesting approach to analyze these changes and understand the immunological pathways that lead to protection or progression of disease. However, qPCR results need to be normalized against one or more reference genes (RG) to correct for non-specific experimental variation. The development of technical platforms for high-throughput qPCR analysis, and powerful software for analysis of qPCR data, have acknowledged the problem that some reference genes widely used due to their known or suspected "housekeeping" roles, should be avoided due to high expression variability across different tissues or experimental conditions. In this paper we evaluated the stability of 112 genes using three different algorithms: geNorm, NormFinder and RefFinder in spleen samples from BALB/c mice under different experimental conditions (control and Leishmania infantum-infected mice). Despite minor discrepancies in the stability ranking shown by the three methods, most genes show very similar performance as RG (either good or poor) across this massive data set. Our results show that some of the genes traditionally used as RG in this model (i.e. B2m, Polr2a and Tbp) are clearly outperformed by others. In particular, the combination of Il2rg + Itgb2 was identified among the best scoring candidate RG for every group of mice and every algorithm used in this experimental model. Finally, we have demonstrated that using "traditional" vs rationally-selected RG for normalization of gene expression data may lead to loss of statistical significance of gene expression changes when using large-scale platforms, and therefore misinterpretation of results. 
    Taken together, our results highlight the need for a comprehensive, high-throughput search for the most stable reference genes in each particular experimental model.

    Overall design: 47 BALB/c mice (14-15 weeks old) were used in this study. Mice were randomly separated in two groups: (i) 23 control mice and (ii) 24 mice that were infected with 10^6 stationary-phase L. infantum promastigotes via tail vein. Mice were euthanized by cervical dislocation and spleens were removed and immediately stored in RNAlater at -70°C. After RNA extraction and reverse transcription, real-time PCR was performed using the QuantStudio 12K Flex Real-Time PCR System following the manufacturer’s instructions, using Custom TaqMan OpenArray Real-Time PCR Plates. Ct values obtained from RT-qPCR were analyzed by three different algorithms (geNorm, NormFinder and RefFinder) in order to evaluate the stability of the 112 genes and identify the most suitable for normalization of gene expression. Please note that the three algorithms were used for the identification of the best reference genes in all samples. Once identified, those RG were used to normalize gene expression using geNorm only. The sample data table includes the normalized data using Il2rg+Itgb2 as reference genes, as identified and validated in the associated publication. The 'geNorm_Polr2a_Tbp_normalized.txt' file includes the data normalized using Polr2a+Tbp as reference genes, two reference genes traditionally used in the literature for this model.
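    For readers unfamiliar with the stability measures involved, here is a minimal sketch of a geNorm-style M-value (our simplified reading of the measure, not the pipeline used in this study):

```python
import numpy as np

# Simplified geNorm-style stability measure M: for each candidate gene, the
# average standard deviation of its log2 expression ratio against every other
# candidate, across samples. Lower M = more stable reference gene.
def genorm_m(expr):
    log_expr = np.log2(np.asarray(expr, dtype=float))  # (n_samples, n_genes)
    n_genes = log_expr.shape[1]
    m = np.empty(n_genes)
    for j in range(n_genes):
        ratios = log_expr[:, j:j + 1] - log_expr       # log2(gene_j / gene_k)
        sds = ratios.std(axis=0, ddof=1)               # variability of each pair
        m[j] = np.delete(sds, j).mean()                # average over k != j
    return m

# Toy data (rows = samples): genes 0 and 1 covary perfectly (a stable pair),
# gene 2 does not track them.
m = genorm_m([[1, 2, 1], [2, 4, 1], [4, 8, 4]])
```

    The intuition matches the text above: a gene pair whose expression ratio is constant across samples and conditions makes a good reference combination, which is how combinations such as Il2rg + Itgb2 rise to the top of the ranking.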

  11. DDSP EMG dataset.xlsx

    • commons.datacite.org
    • figshare.com
    Updated Jul 14, 2019
    Marta Cercone (2019). DDSP EMG dataset.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.8864411
    Explore at:
    Dataset updated
    Jul 14, 2019
    Dataset provided by
    DataCitehttps://www.datacite.org/
    Figsharehttp://figshare.com/
    Authors
    Marta Cercone
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study was performed in accordance with the PHS Policy on Humane Care and Use of Laboratory Animals, federal and state regulations, and was approved by the Institutional Animal Care and Use Committees (IACUC) of Cornell University and the Ethics and Welfare Committee at the Royal Veterinary College.

    Study design: Adult horses were recruited if in good health and following evaluation of the upper airways through endoscopic exam, at rest and during exercise, either overground or on a high-speed treadmill using a wireless videoendoscope. Horses were categorized as “DDSP” affected horses if they presented with exercise-induced intermittent dorsal displacement of the soft palate consistently during multiple (n=3) exercise tests, or “control” horses if they did not experience dorsal displacement of the soft palate during exercise and had no signs compatible with DDSP, such as palatal instability during exercise, or soft palate or sub-epiglottic ulcerations. Horses were instrumented with intramuscular electrodes in one or both thyro-hyoid muscles for EMG recording, hard-wired to a wireless transmitter for remote recording implanted in the cervical area. EMG recordings were then made during an incremental exercise test based on the percentage of maximum heart rate (HRmax).

    Incremental Exercise Test: After surgical instrumentation, each horse performed a 4-step incremental test while recording TH electromyographic activity, heart rate, upper airway videoendoscopy, pharyngeal airway pressures, and gait frequency measurements. Horses were evaluated at exercise intensities corresponding to 50, 80, 90 and 100% of their maximum heart rate, with each speed maintained for 1 minute. Pharyngeal function during the incremental test was recorded using a wireless videoendoscope (Optomed, Les Ulis, France), which was placed into the nasopharynx via the right ventral nasal meatus.
    Nasopharyngeal pressure was measured using a Teflon catheter (1.3 mm ID, Neoflon) inserted through the left ventral nasal meatus to the level of the left guttural pouch ostium. The catheter was attached to differential pressure transducers (Celesco LCVR, Celesco Transducers Products, Canoga Park, CA, USA) referenced to atmospheric pressure and calibrated from -70 to 70 mmHg. Occurrence of episodes of dorsal displacement of the soft palate was recorded, and the number of swallows during each exercise trial was counted for each speed interval.
    EMG recording: EMG data were recorded through a wireless transmitter device implanted subcutaneously. Two different transmitters were used: 1) TR70BB (Telemetry Research Ltd, Auckland, New Zealand) with 12-bit A/D conversion resolution, AC-coupled amplifier, -3 dB point at 1.5 Hz, 2 kHz sampling frequency (n=5 horses); or 2) ELI (Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria) [23], with 12-bit A/D conversion resolution, AC-coupled amplifier, amplifier gain 1450, 1 kHz sampling frequency (n=4 horses). The EMG signal was transmitted through a receiver (TR70BB) or Bluetooth (ELI) to a data acquisition system (PowerLab 16/30 - ML880/P, ADInstruments, Bella Vista, Australia). The EMG signal was amplified with an octal bio-amplifier (Octal Bioamp, ML138, ADInstruments, Bella Vista, Australia) with a bandwidth frequency ranging from 20-1000 Hz (input impedance = 200 MΩ, common mode rejection ratio = 85 dB, gain = 1000), and transmitted to a personal computer. All EMG and pharyngeal pressure signals were collected at a 2000 Hz rate with LabChart 6 software (ADInstruments, Bella Vista, Australia), which allows for real-time monitoring and storage for post-processing and analysis.
    EMG signal processing: Electromyographic signals from the TH muscles were processed using two methods: 1) a classical approach to myoelectrical activity and median frequency, and 2) wavelet decomposition. For both methods, the beginning and end of recording segments including twenty consecutive breaths, at the end of each speed interval, were marked with comments in the acquisition software (LabChart). The relationship of EMG activity with phase of the respiratory cycle was determined by comparing pharyngeal pressure waveforms with the raw EMG and time-averaged EMG traces. For the classical approach, in a graphical user interface-based software (LabChart), a sixth-order Butterworth filter was applied (common mode rejection ratio, 90 dB; band pass, 20 to 1,000 Hz); the EMG signal was then amplified, full-wave rectified, and smoothed using a triangular Bartlett window (time constant: 150 ms). The digitized area under the time-averaged full-wave rectified EMG signal was calculated to define the raw mean electrical activity (MEA) in mV.s. Median Power Frequency (MF) of the EMG power spectrum was calculated after a Fast Fourier Transformation (1024 points, Hann cosine window processing). For the wavelet decomposition, the whole dataset including comments and comment locations was exported as .mat files for processing in MATLAB R2018a with the Signal Processing Toolbox (The MathWorks Inc, Natick, MA, USA). A custom-written automated script based on Hodson-Tole & Wakeling [24] was used to first cut the .mat file into the selected 20-breath segments and subsequently process each segment. A bank of 16 wavelets with time and frequency resolution optimized for EMG was used. The center frequencies of the bank ranged from 6.9 Hz to 804.2 Hz [25].
    The intensity was summed (mV2) to a total, and the intensity contribution of each wavelet was calculated across all 20 breaths for each horse, with separate results for each trial date and exercise level (80, 90, 100% of HRmax, as well as the period preceding episodes of DDSP). To determine the relevant bandwidths for the analysis, a Fast Fourier transform frequency analysis was performed on the horses unaffected by DDSP from 0 to 1000 Hz in increments of 50 Hz, and the contribution of each interval was calculated in percent of total spectrum as median and interquartile range. According to the Shannon-Nyquist sampling theorem, the relevant signal is below ½ the sample rate, and because we had instrumentation sampling at either 1000 Hz or 2000 Hz, we chose to perform the frequency analysis up to 1000 Hz. The 0-50 Hz interval, mostly stride frequency and background noise, was excluded from further analysis. Of the remaining frequency spectrum, we included all intervals from 50-100 Hz to 450-500 Hz and excluded the remainder because they contributed less than 5% to the total amplitude.

    Data analysis: At the end of each exercise speed interval, twenty consecutive breaths were selected and analyzed as described above. To standardize MEA, MF and mV2 within and between horses and trials, and to control for different electrode sizes (i.e., different impedance and area of sampling), data were afterward normalized to the 80% of HRmax value (HRmax80), referred to as normalized MEA (nMEA), normalized MF (nMF) and normalized mV2 (nmV2). During the initial processing, it became clear that the TH muscle is inconsistently activated at 50% of HRmax, and that speed level was therefore excluded from further analysis. The endoscopy video was reviewed and episodes of palatal displacement were marked with comments. For both the classical approach and wavelet analysis, an EMG segment preceding and concurrent to the DDSP episode was analyzed.
    If multiple episodes were recorded during the same trial, only the period preceding the first palatal displacement was analyzed. In horses that had both TH muscles implanted, the average between the two sides was used for the analysis. Averaged data from multiple trials were considered for each horse. Descriptive data are expressed as means with standard deviation (SD). Normal distribution of data was assessed using the Kolmogorov-Smirnov test and quantile-quantile (Q-Q) plot. To determine the frequency clusters in the EMG signal, a hierarchical agglomerative dendrogram was applied using the packages Matplotlib, pandas, numpy and scipy in Python (version 3.6.6) executed through Spyder (version 3.2.2) and Anaconda Navigator. Based on the frequency analysis, wavelets included in the cluster analysis were 92.4 Hz, 128.5 Hz, 170.4 Hz, 218.1 Hz, 271.5 Hz, 330.6 Hz, 395.4 Hz and 465.9 Hz. The number of frequency clusters was set to two based on maximum acceleration in a scree plot and maximum vertical distance in the dendrogram. For continuous outcome measures (number of swallows, MEA, MF, and mV2), a mixed-effects model was fitted to the data to determine the relationship between the outcome variable and relevant fixed effects (breed, sex, age, weight, speed, group), using horse as a random effect. Tukey’s post hoc tests and linear contrasts were used as appropriate. Statistical analysis was performed using JMP Pro13 (SAS Institute, Cary, NC, USA). Significance was set at P < 0.05 throughout.
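    The "classical approach" steps can be sketched in a few lines (this simplified Python version omits the Butterworth filtering and amplification stages; the sampling rate and 150 ms window are taken from the text, everything else is an assumption):

```python
import numpy as np

# Sketch of the classical EMG measures: full-wave rectification, smoothing
# with a triangular (Bartlett) window of ~150 ms, and MEA as the area under
# the time-averaged rectified trace.
def classical_emg(emg, fs=2000, tc=0.150):
    rectified = np.abs(np.asarray(emg, dtype=float))   # full-wave rectification
    win = np.bartlett(max(int(fs * tc), 3))            # triangular window
    win /= win.sum()                                   # unit-gain smoothing
    smoothed = np.convolve(rectified, win, mode="same")
    mea = smoothed.sum() / fs                          # area under the curve, mV.s
    return smoothed, mea

# One second of a 50 Hz test tone sampled at 2 kHz stands in for an EMG burst.
t = np.arange(2000) / 2000.0
smoothed, mea = classical_emg(np.sin(2 * np.pi * 50.0 * t))
```

    Normalizing such MEA values to the HRmax80 level, as described above, is what makes them comparable across horses and electrode placements.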

  12. Customer Data Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Growth Market Reports (2025). Customer Data Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/customer-data-platform-market-global-industry-analysis
    Explore at:
    csv, pptx, pdfAvailable download formats
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Customer Data Platform (CDP) Market Outlook



    According to our latest research, the global Customer Data Platform (CDP) market size reached USD 3.6 billion in 2024, driven by the accelerating digital transformation initiatives across industries and the growing emphasis on personalized customer experiences. The market is projected to expand at a robust CAGR of 23.2% between 2025 and 2033, reaching an estimated USD 27.8 billion by 2033. This impressive growth is fueled by increased data generation, evolving regulatory landscapes, and the rising adoption of advanced analytics and artificial intelligence for customer engagement.



    One of the primary growth drivers for the Customer Data Platform (CDP) market is the rapidly increasing volume of customer data generated through various digital touchpoints, including social media, e-commerce platforms, and mobile applications. Organizations are recognizing the critical importance of consolidating this fragmented data into unified, actionable customer profiles to drive more targeted and effective marketing campaigns. As businesses strive to deliver seamless omnichannel experiences, the demand for robust CDP solutions that can ingest, normalize, and activate data in real-time continues to surge. Moreover, the proliferation of IoT devices and connected technologies is further amplifying the need for scalable and flexible CDP architectures that can handle complex data ecosystems.



    Another significant factor propelling the growth of the CDP market is the increasing focus on data privacy and regulatory compliance. With stringent regulations such as GDPR in Europe, CCPA in California, and similar frameworks emerging globally, organizations are under pressure to manage customer data responsibly and transparently. CDPs offer advanced consent management, data governance, and auditability features, enabling enterprises to maintain compliance while still leveraging customer data for personalization and analytics. This regulatory environment is compelling businesses, especially in highly regulated sectors like BFSI and healthcare, to adopt CDP solutions as a foundational component of their data strategy.



    The integration of artificial intelligence and machine learning capabilities into CDP platforms is revolutionizing how organizations extract value from customer data. Advanced CDPs are now equipped with predictive analytics, automated segmentation, and real-time recommendation engines, enabling marketers to anticipate customer needs and deliver hyper-personalized experiences. As enterprises increasingly shift towards data-driven decision-making, the ability to unify, analyze, and activate customer data in real-time is becoming a key competitive differentiator. This technological evolution is not only enhancing the utility of CDPs but also expanding their application across new verticals and use cases, thereby broadening the addressable market.



    Regionally, North America continues to dominate the CDP market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The high adoption rate in North America is attributed to the presence of leading technology providers, early digital adoption, and advanced marketing practices. However, Asia Pacific is witnessing the fastest growth, driven by rapid digitalization, burgeoning e-commerce, and increasing investments in customer engagement technologies across emerging economies like China and India. Europe remains a significant market, propelled by strict data privacy regulations and a mature digital ecosystem. Latin America and the Middle East & Africa are also showing promising growth, albeit from a smaller base, as organizations in these regions ramp up their digital transformation efforts.





    Component Analysis



    The Customer Data Platform (CDP) market by component is broadly segmented into software and services, with software accounting for the lion’s share of the market revenue in 2024. The software component encompasses the core platforms that enable organizations to aggregate, unify, and activate customer data from disparate sources.

  13. Data from: Urdu Summary Corpus

    • live.european-language-grid.eu
    txt
    Updated Oct 19, 2017
    (2017). Urdu Summary Corpus [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1391
    Explore at:
    txtAvailable download formats
    Dataset updated
    Oct 19, 2017
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Urdu Summary Corpus

    The Urdu summary corpus consists of 50 articles collected from various blogs. From the original HTML documents, only the unformatted content text was kept; everything else was removed. We provide abstractive summaries of these 50 articles. After normalization, we further applied different NLP tools to the articles to generate part-of-speech tagged, morphologically analyzed, lemmatized and stemmed articles.

    Urdu Summary Corpus Tools

    + Normalization is taken from [2]; diacritic marks are also removed in this step.

    + Table-lookup based Morphological Analyzer and lemmatizer is built from [3].

    + Stemmer is built from [1].

    + Table-lookup based POS tagger is built from [4]. We used unigram and bigram counts.

    Commands:

    Unzip USCTools.zip

    Open Console

    Go to the USCTools directory by typing: cd USCTools

    For Normalization

    $ java -cp bin USCTools normalize input.txt output.txt

    For Lemmatization

    $ java -cp bin USCTools lemmatize input.txt output.txt

    For Morphological analysis

    $ java -cp bin USCTools morph_analysis input.txt output.txt

    For stemming by Assas-Band

    $ java -cp bin USCTools stemming input.txt output.txt

    For POS tagging

    $ java -cp bin USCTools tagging input.txt output.txt

    [1] Q.-u.-A. Akram, A. Naseer, and S. Hussain. Proceedings of the 7th Workshop on Asian Language Resources (ALR7), chapter Assas-band, an Affix- Exception-List Based Urdu Stemmer, pages 40-47. Association for Computational Linguistics, 2009.

    [2] A. Gulzar. Urdu normalization utility v1.0. Technical report, Center for Language Engineering, Al-Khawarizmi Institute of Computer Science (KICS), University of Engineering and Technology, Lahore, Pakistan. http://www.cle.org.pk/software/langproc/urdunormalization.htm, 2007.

    [3] M. Humayoun, H. Hammarström, and A. Ranta. Urdu morphology, orthography and lexicon extraction. CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages, LSA Linguistic Institute, Stanford University, California, USA, pages 21-22, 2007. http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/

    [4] B. Jawaid, A. Kamran, and O. Bojar. A tagged corpus and a tagger for Urdu. In N. Calzolari (Conference Chair), K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, and S. Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA). https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-65A9-5

  14. Crohn's Disease Treatment Prediction Model

    • data.mendeley.com
    Updated Jul 10, 2024
    Henry Adams (2024). Crohn's Disease Treatment Prediction Model [Dataset]. http://doi.org/10.17632/y2hhsygy49.1
    Explore at:
    Dataset updated
    Jul 10, 2024
    Authors
    Henry Adams
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database for machine learning using clinical data at baseline. Used to predict the medium-term efficacy of biologic therapies in patients with Crohn's disease.

    Data Collection Sources
    1. Electronic Health Records (EHR)
    2. Clinical trials and studies
    3. Genetic data
    4. Patient-reported outcomes
    5. Medical imaging

    Types of Data
    - Demographic information
    - Clinical data (symptoms, disease severity, treatment history)
    - Genetic data (SNPs, mutations)
    - Lab results (CRP levels, fecal calprotectin)
    - Imaging data (MRI, endoscopy)
    - Lifestyle data (diet, smoking status)

    Data Preprocessing Steps
    1. Data Cleaning: Handle missing values, remove duplicates, correct errors.
    2. Data Normalization/Standardization: Normalize lab results, standardize imaging data.
    3. Feature Engineering: Create new features from existing data, e.g., calculate disease activity scores.
    4. Encoding Categorical Data: Convert categorical variables to numerical ones using one-hot encoding or label encoding.
    5. Data Splitting: Split data into training, validation, and test sets.
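    The preprocessing steps above can be sketched with toy records (field names, values, and split ratios here are hypothetical, not from the dataset):

```python
import random

# Toy sketch of the preprocessing steps: drop records with a missing lab
# value, normalize CRP, one-hot encode disease severity, and split into
# train/validation/test sets.
def preprocess(records, seed=42):
    cleaned = [r for r in records if r.get("crp") is not None]   # data cleaning
    crp_max = max(r["crp"] for r in cleaned)
    categories = sorted({r["severity"] for r in cleaned})
    rows = []
    for r in cleaned:
        row = {"crp_norm": r["crp"] / crp_max}                   # normalization
        for c in categories:                                     # one-hot encoding
            row[f"severity_{c}"] = 1 if r["severity"] == c else 0
        rows.append(row)
    random.Random(seed).shuffle(rows)                            # reproducible split
    n = len(rows)
    return (rows[:int(0.7 * n)],
            rows[int(0.7 * n):int(0.85 * n)],
            rows[int(0.85 * n):])

records = [{"crp": float(i), "severity": ["mild", "moderate", "severe"][i % 3]}
           for i in range(20)] + [{"crp": None, "severity": "mild"}]
train, val, test = preprocess(records)
```

    Shuffling with a fixed seed before splitting keeps the partition reproducible, which matters when model results are to be compared across runs.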
  15. Data from: Learning of probabilistic punishment as a model of anxiety...

    • data.niaid.nih.gov
    • zenodo.org
    zip
    Updated Sep 28, 2022
    David Jacobs; Madeleine Allen; Junchol Park; Bita Moghaddam (2022). Learning of probabilistic punishment as a model of anxiety produces changes in action but not punisher encoding in the dmPFC and VTA [Dataset]. http://doi.org/10.5061/dryad.9s4mw6mkn
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Oregon Health & Science University
    Janelia Research Campus
    Authors
    David Jacobs; Madeleine Allen; Junchol Park; Bita Moghaddam
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Previously, we developed a novel model for anxiety during motivated behavior by training rats to perform a task where actions executed to obtain a reward were probabilistically punished, and observed that after learning, neuronal activity in the ventral tegmental area (VTA) and dorsomedial prefrontal cortex (dmPFC) represents the relationship between action and punishment risk (Park & Moghaddam, 2017). Here we used male and female rats to expand on the previous work by focusing on neural changes in the dmPFC and VTA that were associated with the learning of probabilistic punishment, and anxiolytic treatment with diazepam after learning. We find that adaptive neural responses of dmPFC and VTA during the learning of anxiogenic contingencies are independent of the punisher experience and occur primarily during the peri-action and reward period. Our results also identify peri-action ramping of VTA neural calcium activity, and VTA-dmPFC correlated activity, as potential markers for the anxiolytic properties of diazepam.

    Methods

    Subjects: Male and female Long-Evans (bred in house, n=8) and Sprague-Dawley (Charles River, n=5) rats were used. Animals were pair-housed on a reverse 12 h:12 h light/dark cycle. All experimental procedures and behavioral testing were performed during the dark (active) cycle. All studies included both strains of male (n=7) and female (n=6) rats. All experimental procedures were approved by the OHSU Institutional Animal Use and Care Committee and were conducted in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals.

    Initial Training & Punishment Risk Task (PRT): The PRT follows previously published methods (Park & Moghaddam, 2017; Chowdhury et al., 2019). Rats were trained to make an instrumental response to receive a 45-mg sugar pellet (BioServe) under a fixed ratio one (FR1) schedule of reinforcement. The availability of the nosepoke for reinforcement was signaled by a 5-s tone.
    After at least three FR1 training sessions, PRT sessions began. Each PRT session consisted of three blocks of 30 trials. The action-reward contingency remained constant, with one nosepoke resulting in one sugar pellet. However, there was a probability of receiving a footshock (300 ms electrical footshock of 0.3 mA) after the FR1 action, which increased across blocks (0%, 6%, or 10% in blocks 1, 2, and 3, respectively). To minimize generalization of the action-punishment contingency, blocks were organized in ascending footshock probability with 2-min timeouts between blocks. Punishment trials were pseudo-randomly assigned, with the first footshock occurring within the first five trials. All sessions were terminated if not completed within 180 min.

    Fiber Photometry Analysis

    Peri-event analysis: Signals from the 465 nm (GCaMP6s) and 560 nm (tdTomato) streams were processed in Python (version 3.7.4) using custom-written scripts similar to previously published methods (Jacobs & Moghaddam, 2020). Briefly, the 465 and 560 streams were lowpass filtered at 3 Hz using a Butterworth filter and subsequently segmented by the start and end of each trial. The 560 signal was fitted to the 465 signal using a first-order least-squares polynomial and subtracted from the 465 signal to yield the change in fluorescence: ΔF/F = (465 signal − fitted 560 signal) / fitted 560 signal. Peri-event z-scores were computed by comparing the ΔF/F after the behavioral event to the baseline ΔF/F from 4 to 2 s before the given epoch. To investigate potentially different neural calcium responses to receiving versus anticipating the footshock, punished (i.e., shock) trials and unpunished trials were separated. Trials with a z-score value > 40 were excluded; of the approximately 3,000 trials analyzed, this occurred in < 1% of trials.

    Area under the curve (AUC) analyses: To represent individual data, we calculated AUCs for each subject.
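A minimal sketch of the ΔF/F correction and peri-event z-scoring described above, assuming NumPy arrays for the two filtered streams (array and function names are hypothetical; this illustrates the stated formulas, not the authors' released code):

```python
import numpy as np

def compute_dff(sig_465, sig_560):
    """Fit the 560 nm (tdTomato) control channel to the 465 nm (GCaMP6s)
    channel with a first-order least-squares polynomial, then compute
    dF/F = (465 - fitted 560) / fitted 560."""
    slope, intercept = np.polyfit(sig_560, sig_465, 1)
    fitted_560 = slope * sig_560 + intercept
    return (sig_465 - fitted_560) / fitted_560

def peri_event_zscore(trace, baseline):
    """Z-score a peri-event dF/F trace against its pre-event baseline
    (here, the 4-2 s window before the epoch)."""
    return (trace - baseline.mean()) / baseline.std()
```

Because the control channel is fit to the signal channel before subtraction, shared motion and bleaching artifacts largely cancel in the ΔF/F trace.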
    To quantify peri-cue and peri-action changes, we calculated a change (summation) score between the 1 s before (pre-event) and the 1 s after (post-event) cue onset or action execution. For the reward period, we calculated a change score by comparing the 2 s after reward delivery to the 1 s before reward delivery. For punished trials, the response to footshock was calculated as the change in the 1 s following footshock delivery compared to the 1 s before footshock. Outliers were removed using GraphPad Prism's ROUT method (Q = 1%; Motulsky & Brown, 2006), which removed only three data points from the analysis.

    Time-lagged cross-correlation analysis: Cross-correlation analysis has been used to identify networks from simultaneously measured fiber photometry signals (Sych et al., 2019). For rats with properly placed fibers in the dmPFC and VTA, correlations between photometry signals arising in the VTA and dmPFC were calculated for the peri-action, peri-footshock, and peri-reward periods using the z-score-normalized data. The following equation was used to normalize the covariance score at each time lag to a correlation coefficient between -1 and 1: Coef = Cov/(s1*s2*n), where Cov is the covariance from the dot product of the two signals at each timepoint, s1 and s2 are the standard deviations of the dmPFC and VTA streams, respectively, and n is the number of samples. A full cross-correlation function was derived for each trial and epoch.

    Comparison to electrophysiology results: Fiber photometry data from the third PRT session were compared to the average of the 50-ms-binned single-unit data (see Figure 4 of Park & Moghaddam, 2017); this is the session during which the electrophysiology data were collected. To overlay data from the two techniques, data were lowpass filtered at 3 Hz and photometry data were downsampled to 20 Hz (to match the 50-ms binning).
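The lag normalization Coef = Cov/(s1*s2*n) described above could be sketched as follows (a hedged illustration with hypothetical variable names, not the authors' code; positive lags here mean the first stream is shifted forward relative to the second):

```python
import numpy as np

def lagged_corr(x, y, max_lag):
    """Time-lagged cross-correlation normalized to [-1, 1] via
    Coef = Cov / (s1 * s2 * n), where Cov is the dot product of the
    mean-centered streams at a given lag, s1 and s2 their standard
    deviations, and n the number of samples."""
    x = x - x.mean()
    y = y - y.mean()
    n = len(x)
    denom = x.std() * y.std() * n
    coefs = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # x shifted forward: pair x[t + lag] with y[t]
            cov = np.dot(x[lag:], y[:n - lag])
        else:
            cov = np.dot(x[:n + lag], y[-lag:])
        coefs[lag] = cov / denom
    return coefs
```

At zero lag this reduces to the ordinary Pearson correlation, so identical streams peak at exactly 1.0.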
    Data from both streams were then min-max normalized between 0 and 1 over the corresponding cue and action+reward epochs. To assess the similarity of the two signals, we performed a Pearson correlation analysis between the normalized single-unit and fiber photometry data for the cue or action+reward epochs in each risk block, as well as between randomly shuffled photometry signals and the single-unit response as a control. For significant Pearson correlations, we performed cross-correlation analysis (see above) to investigate whether the photometry signal lagged behind the electrophysiology, given the slower kinetics of GCaMP6 compared to single-unit recordings (Chen et al., 2013).

    Statistical Analysis

    For FR1 training, trial completion was measured as the number of food pellets earned. Data were assessed for the first 3-4 training sessions. Action and reward latencies were defined as the time from cue onset to action execution and from food delivery until retrieval, respectively. Values were assessed using a mixed-effects model with training session as a factor, and post-hoc tests were performed using the Bonferroni correction where appropriate. For the PRT, trial completion was measured as the percentage of completed trials (of the 30 possible) in each block. Action latencies were defined as the time from cue onset to action execution. Data were analyzed using a two-way repeated-measures (RM) ANOVA or a mixed-effects model. Because some data were missing for non-random reasons (e.g., failure to complete trials in response to punishment risk), we took the average of the risk blocks (blocks 2 and 3) and the no-risk block (block 1) to permit repeated-measures analysis. We used a mixed-effects model if data were missing for random reasons. Risk and session were used as factors, and post-hoc tests were performed using the Bonferroni correction where appropriate. When only two groups were compared, a paired t-test or Wilcoxon test was performed after checking the normality assumption with the Shapiro-Wilk test.
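The overlay procedure for comparing photometry with binned single-unit data (downsampling to match 50-ms bins, min-max normalization to [0, 1], then Pearson correlation) might look like the following sketch; the trace names are hypothetical:

```python
import numpy as np

def downsample_mean(x, factor):
    """Average blocks of consecutive samples, e.g. to bring a photometry
    trace down to 20 Hz so it matches 50-ms single-unit bins."""
    n = len(x) // factor * factor  # drop any trailing partial block
    return x[:n].reshape(-1, factor).mean(axis=1)

def min_max(x):
    """Scale a trace into the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

# With both streams on a common 20 Hz time base, the similarity measure
# is an ordinary Pearson correlation, e.g.:
# r = np.corrcoef(min_max(photometry_20hz), min_max(ephys_20hz))[0, 1]
```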
    To assess changes in neural calcium activity, we utilized a permutation-based approach as outlined in Jean-Richard-dit-Bressel et al. (2020) using Python (version 3). The average response of each subject at a given timepoint in the cue, action, or reward-delivery period was compared to either the first PRT or the saline session. For each timepoint, a null distribution was generated by shuffling the data, randomly splitting them into two groups, and calculating the mean difference between groups. This was done 1,000 times per timepoint, and a p-value was obtained as the proportion of values in the null distribution of mean differences greater than or equal to the observed difference in the unshuffled data (one-tailed for comparisons to 0% risk and FR1 data, two-tailed for all other comparisons). To control for multiple comparisons, we utilized a consecutive-threshold approach based on the 3 Hz lowpass filter window (Jean-Richard-dit-Bressel et al., 2020; Pascoli et al., 2018), whereby a p-value < 0.05 was required for 14 consecutive samples to be considered significant.

    To assess AUC changes in the photometry data, we compared all risk blocks and all sessions using an ANOVA with risk block and session as factors. Because not all subjects completed both the learning and diazepam phases, we used an ordinary two-way ANOVA. Significant main effects and interactions were assessed with post-hoc Bonferroni multiple-comparison tests. To assess changes in correlated activity as a function of risk or session, we took the peak and 95% confidence interval of the overall cross-correlation function. These values were compared by a two-way ANOVA with risk and session as factors, with post-hoc Bonferroni correction. Other than the permutation tests, all statistical analyses were performed in GraphPad Prism (version 8) with an α of 0.05. Results for all statistical tests and corresponding figures can be found in Table 1 or the supplemental figures.
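The permutation procedure with the 14-consecutive-sample threshold described above can be sketched roughly as follows (two-tailed variant only; group arrays are hypothetical subjects × timepoints matrices, and this is an illustration rather than the authors' published code):

```python
import numpy as np

def permutation_pvals(group_a, group_b, n_perm=1000, seed=0):
    """Timepoint-wise permutation test: build a null distribution of
    mean differences by randomly reassigning subjects to two groups,
    then report the fraction of null values at least as extreme as the
    observed difference (two-tailed)."""
    rng = np.random.default_rng(seed)
    observed = group_a.mean(axis=0) - group_b.mean(axis=0)
    pooled = np.vstack([group_a, group_b])
    n_a = len(group_a)
    count = np.zeros_like(observed)
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null = pooled[idx[:n_a]].mean(axis=0) - pooled[idx[n_a:]].mean(axis=0)
        count += np.abs(null) >= np.abs(observed)
    return count / n_perm

def consecutive_threshold(pvals, alpha=0.05, run=14):
    """Multiple-comparison control: a timepoint counts as significant
    only inside a run of `run` consecutive samples with p < alpha."""
    sig = pvals < alpha
    keep = np.zeros(len(sig), dtype=bool)
    streak = 0
    for i, s in enumerate(sig):
        streak = streak + 1 if s else 0
        if streak >= run:
            keep[i - run + 1:i + 1] = True
    return keep
```

Tying the run length to the 3 Hz filter window means isolated sub-threshold p-values, which the filter could not have produced as real transients, are discarded.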
    Excluded Data

    Outliers in the latency analysis were removed when a data point was > 5 SDs above the mean across all blocks; this removed one data point from the analysis. In the FR1 studies, data from one rat's third and fourth sessions were excluded because

  16.

    Full results with pvalues and foldchanges.

    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Philipp Strauss; Håvard Mikkelsen; Jessica Furriol (2023). Full results with pvalues and foldchanges. [Dataset]. http://doi.org/10.1371/journal.pone.0259373.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Philipp Strauss; Håvard Mikkelsen; Jessica Furriol
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MCD: minimal change disease; MN: membranous nephropathy; HT: hypertension; DIA2: diabetes type 2; FSGS: focal segmental glomerulosclerosis; IGAN: IgA nephropathy; TCMR: T-cell mediated rejection; RPGN: rapidly progressive glomerulonephritis; STA: stable allograft; ATI: acute tubular injury; IFTA: interstitial fibrosis and tubular atrophy. In the columns under the gene IDs, "Yes" refers to genes differentially expressed between control and test samples in the dataset, and "No" refers to reference genes (RG) equally expressed in control and test samples in the defined dataset. Not available (NA) refers to RG not tested in specific datasets; not detected (ND) refers to genes undetected in the specific dataset. Summaries and percentages are noted below each column. (XLSX)

  17.

    GPR data collected near the Chepeta Weather Station and the DUST-1 sampler,...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 2, 2023
    Munroe, Jeffrey S. (2023). GPR data collected near the Chepeta Weather Station and the DUST-1 sampler, Uinta Mountains, Utah [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8302138
    Explore at:
    Dataset updated
    Sep 2, 2023
    Dataset authored and provided by
    Munroe, Jeffrey S.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Uinta Mountains, Utah
    Description

    Ground penetrating radar (GPR) data collected on September 9, 2021 in the Uinta Mountains at the Chepeta Remote Automated Weather Station (RAWS) and the DUST-1 passive dust sampler. Data were collected with a GSSI SIR-4000 control unit and a 350HS antenna connected to an Emlid Reach RS2 GPS receiver. Data files were distance normalized and field-applied range gains were removed before export in .sgy format. Files were also exported in .kml format for viewing the transect locations in Google Earth. Two long transects (780 feet each) were collected: the "West" transect passed to the west of the Chepeta RAWS, and the "East" transect passed to the east. The transects started at different points along the northern lip of the summit upland and came together at a common point at their southern ends. Marks were made in the data file every 60 feet while surveying; these marks were used to distance normalize the results. The system collected 334 scans/second with 512 samples/scan while surveying.
    Two 30-foot perpendicular transects were also surveyed (north to south, and west to east) with their intersection adjacent to a soil pit excavated to a depth of 92 cm. The location of the soil pit was noted in each transect with a mark near 16 feet. Data were used to evaluate spatial variations in the thickness of regolith overlying the bedrock beneath this gently sloping summit flat.

  18.

    Comparison of the expression of each reference gene in HT and non-diseased...

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Philipp Strauss; Håvard Mikkelsen; Jessica Furriol (2023). Comparison of the expression of each reference gene in HT and non-diseased control biopsies. [Dataset]. http://doi.org/10.1371/journal.pone.0259373.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Philipp Strauss; Håvard Mikkelsen; Jessica Furriol
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    3a displays the p-values from the qPCR experiments. Data were analyzed by the Mann-Whitney test (asymptotic significance, two-tailed). None of the comparisons yielded statistically significant results. 3b shows the fold-change (FC) differences and Pearson's correlations in the expression of selected reference genes. Fold changes are represented as mean ± SD log FC.

  19.

    Validating Internal Control Genes for the Accurate Normalization of qPCR...

    • plos.figshare.com
    tiff
    Updated Jun 2, 2023
    Julia Lambret-Frotté; Leandro C. S. de Almeida; Stéfanie M. de Moura; Flavio L. F. Souza; Francisco S. Linhares; Marcio Alves-Ferreira (2023). Validating Internal Control Genes for the Accurate Normalization of qPCR Expression Analysis of the Novel Model Plant Setaria viridis [Dataset]. http://doi.org/10.1371/journal.pone.0135006
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Julia Lambret-Frotté; Leandro C. S. de Almeida; Stéfanie M. de Moura; Flavio L. F. Souza; Francisco S. Linhares; Marcio Alves-Ferreira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Employing reference genes to normalize the data generated with quantitative PCR (qPCR) can increase the accuracy and reliability of this method. Previous results have shown that no single housekeeping gene can be universally applied to all experiments. Thus, the identification of a suitable reference gene represents a critical step of any qPCR analysis. Setaria viridis has recently been proposed as a model system for the study of Panicoid grasses, a crop family of major agronomic importance. Therefore, this paper aims to identify suitable S. viridis reference genes that can enhance the analysis of gene expression in this novel model plant. The first aim of this study was the identification of a suitable RNA extraction method that could retrieve a high quality and yield of RNA. After this, two distinct algorithms were used to assess the gene expression of fifteen different candidate genes in eighteen different samples, which were divided into two major datasets, the developmental and the leaf gradient. The best-ranked pair of reference genes from the developmental dataset included genes that encoded a phosphoglucomutase and a folylpolyglutamate synthase; genes that encoded a cullin and the same phosphoglucomutase as above were the most stable genes in the leaf gradient dataset. Additionally, the expression pattern of two target genes, a SvAP3/PI MADS-box transcription factor and the carbon-fixation enzyme PEPC, were assessed to illustrate the reliability of the chosen reference genes. This study has shown that novel reference genes may perform better than traditional housekeeping genes, a phenomenon which has been previously reported. These results illustrate the importance of carefully validating reference gene candidates for each experimental set before employing them as universal standards. Additionally, the robustness of the expression of the target genes may increase the utility of S. viridis as a model for Panicoid grasses.

  20.

    Data from: HVN KINGS GUT Metadata Record - Kiwifruit Ingestion to Normalise...

    • auckland.figshare.com
    • ourarchive.otago.ac.nz
    json
    Updated Jun 20, 2024
    Richard Gearry; Nicole Roy (2024). HVN KINGS GUT Metadata Record - Kiwifruit Ingestion to Normalise Gut Symptoms in the Christchurch IBS cohort [Dataset]. http://doi.org/10.17608/k6.auckland.21606624.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Jun 20, 2024
    Dataset provided by
    The University of Auckland
    Authors
    Richard Gearry; Nicole Roy
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Christchurch
    Description

    This metadata record and its attached files make statements about the kinds of data collected as part of this research and set out policies for governance of those data, now and in the future.

    Description: The Kiwifruit Ingestion to Normalise Gut Symptoms (KINGS) study was launched to understand more about the clinical, psychological, biological, and dietary changes in individuals with functional constipation (FC) or constipation-predominant irritable bowel syndrome (IBS-C). We believe that the approach used in this study can help generate new knowledge beyond that already reported for the consumption of green kiwifruit. In this single-blinded, negative-controlled, randomised, parallel study, we aim to find differences in abdominal comfort in individuals with FC or IBS-C who ingest either two green kiwifruit or maltodextrin daily over the course of 4 weeks. The trial will last a maximum of 9 weeks in total. We hypothesise that the habitual consumption of two green kiwifruit will improve abdominal pain as measured by GSRS ratings, other gut symptoms, bowel habits, fatigue, anxiety, depression, and total intestinal transit in individuals with FC and IBS-C. Additionally, we hope to find differences in the microbiome and metabolome of individuals consuming kiwifruit compared to control.
