Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data outputs 1-18
AML: acute myeloid leukemia; CSC: cancer stem cell; GTC: general tumor cell; TCGA: The Cancer Genome Atlas.
Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs, as well as in TCGA AML cancer samples compared with normal ones. These data were generated from the AML microarray and TCGA data analyses.
Raw data output 2. Commonly and uniquely differentially expressed genes in the AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. These data were generated from the AML microarray and TCGA data analyses.
Raw data output 3. Common differentially expressed genes between the training and test set samples of the microarray dataset. These data were generated from the AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs, as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in the breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. These data were generated from the breast cancer microarray and TCGA BRCA data analyses.
Raw data output 7. Differential and common co-expression and protein-protein interactions of genes between CSC and GTC samples. These data were generated from the AML microarray data analysis and the STRING database-based protein-protein interaction analysis.
Raw data output 8. Differentially expressed genes between dormant and active AML CSCs. These data were generated from the AML scRNA-seq data analysis.
Raw data output 9. Genes uniquely expressed in dormant or active AML CSCs. These data were generated from the AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and (i) the genes differentially expressed between AML CSCs and GTCs, (ii) the genes differentially expressed between dormant and active AML CSCs, or (iii) the genes uniquely expressed in either class of CSCs.
Raw data output 11. Targeting desirableness scores of AML key CSC genes and their targeting transcription factors. These scores were generated with an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness scores of AML key CSC genes and their targeting transcription factors. These scores were generated with an in-house scoring function described in the Methods section.
Raw data output 13. Protein-protein interactions among the AML key CSC genes themselves and between these genes and their targeting transcription factors. These data were generated from the AML microarray data analysis and the STRING database-based protein-protein interaction analysis.
Raw data output 14. Previously confirmed associations of the genes with the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers' (stem) cells, as well as with hematopoietic stem cells. These data were generated by PubMed-based literature mining.
Raw data output 15. Drug scores of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated with an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug scores of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated with an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation, selected based on their respective (CSC-specific) drug scores.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here you can find raw data and information about each of the 34 datasets generated by the mulset algorithm and used for further analysis in SIMON.
Each dataset is stored in a separate folder that contains four files:
json_info: the number of features and their names, and the number of subjects available for the dataset
data_testing: data frame with the data used to test the trained models
data_training: data frame with the data used to train the models
results: unfiltered data taken directly from the database
The files are written in Feather format. An example of the data structure of each file is provided in the repository.
The files were compressed using 7-Zip, available at https://www.7-zip.org/.
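A minimal loading sketch for one extracted dataset folder, assuming pandas with pyarrow installed (`pd.read_feather` needs it). The file extensions are assumptions, since the description names the four files but not their extensions: `.feather` for the data frames, and plain JSON for `json_info`. The helper names `dataset_paths` and `load_dataset` are hypothetical.

```python
import json
from pathlib import Path

import pandas as pd

# The three data-frame files described for each SIMON dataset folder.
FEATHER_PARTS = ("data_training", "data_testing", "results")


def dataset_paths(folder, ext=".feather"):
    """Return the expected path of each of the four files in one dataset folder.

    The extensions are assumptions: ".feather" for the data frames and
    ".json" for json_info.
    """
    folder = Path(folder)
    paths = {name: folder / f"{name}{ext}" for name in FEATHER_PARTS}
    paths["json_info"] = folder / "json_info.json"
    return paths


def load_dataset(folder, ext=".feather"):
    """Load the data frames (requires pyarrow) and the JSON metadata of one folder."""
    paths = dataset_paths(folder, ext)
    data = {name: pd.read_feather(paths[name]) for name in FEATHER_PARTS}
    with open(paths["json_info"], encoding="utf-8") as fh:
        data["json_info"] = json.load(fh)
    return data
```

The archive has to be extracted with 7-Zip first; if the files turn out to have no extension, calling `load_dataset(folder, ext="")` adapts the data-frame paths.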
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains data collected as part of the Ancient Adhesives project under the European Union’s Horizon 2020 research and innovation programme Grant Agreement No. 678 804151 (Grant holder G.H.J.L.).
It is being made public to act as supplementary data for a publication and for other researchers to use this data in their own work.
The data in this dataset were collected at TU Delft, the University of Cantabria, and the Museum of Prehistory and Archaeology of Cantabria in 2023.
This dataset contains:
The acronym MOR stands for Morín Cave, a cave in Cantabria (Spain) where the objects were found.
The data included in this dataset have been organized by method. For each specimen, more than one point was measured, as indicated in the file name. Only the measurements with interpretable results are made available.
The file name includes the unique ID of the object + the analytical technique + the number of the scan. For example: MOR11_ATR_loc1
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This imaging mass cytometry (IMC) dataset serves as an example to demonstrate raw data processing and downstream analysis tools. The data were generated as part of the Integrated iMMUnoprofiling of large adaptive CANcer patient cohorts (IMMUcan) project (immucan.eu) using the Hyperion imaging system (www.fluidigm.com/products-services/instruments/hyperion). For an overview of the technology and the available analysis strategies, please visit bodenmillergroup.github.io/IMCWorkflow. The individual data files are described below:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data from HDMSe and SWATH MS analyses of 309 prostate cancer serum samples. Prostate cancer cohort:
The 309 patients were divided into control (n=112), prostate cancer (PCa) (n=175), and benign prostate hyperplasia (BPH) (n=22) groups. PCa patients were then subdivided into an active surveillance (AS) group (n=51) or a treatment group (n=124). Treatments were radiotherapy (pre: n=26, post: n=14), hormone therapy (pre: n=7, post: n=8), prostatectomy (pre: n=21, post: n=8), and radiotherapy (pre: n=23, post: n=17).
Raw Data in .csv format for use with the R data wrangling scripts.
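The cohort sizes above can be cross-checked with a few lines of arithmetic. This sketch assumes each PCa patient belongs to exactly one subgroup (active surveillance or one pre-/post-treatment category); because radiotherapy is listed twice in the description, the two entries are kept under hypothetical separate keys rather than merged.

```python
# Group sizes as reported for the prostate cancer serum cohort.
groups = {"control": 112, "PCa": 175, "BPH": 22}

# PCa subgroups: active surveillance plus (pre, post) treatment counts.
# "radiotherapy_a"/"radiotherapy_b" are hypothetical labels for the two
# radiotherapy entries that appear in the description.
active_surveillance = 51
treatments = {
    "radiotherapy_a": (26, 14),
    "hormone_therapy": (7, 8),
    "prostatectomy": (21, 8),
    "radiotherapy_b": (23, 17),
}

total_patients = sum(groups.values())
treated = sum(pre + post for pre, post in treatments.values())

print(total_patients)                 # 309
print(treated)                        # 124
print(active_surveillance + treated)  # 175, the size of the PCa group
```

Both totals agree with the stated counts, so the subgroups account for every PCa patient.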
These data (Illumina paired-end FASTQ) exemplify the different WGS types that have been isolated from UK
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This document mainly includes coding sequences of the PME domain and pro-region of Type-1 PMEs in representative plants, raw data from fusion gene analysis by LIR inference, raw data from repeated-sequence studies in four representative Cruciferae species, and a graphical abstract.
Overview
The University of Notre Dame (ND) scanning lidar dataset used for the WFIP2 Campaign is provided. The raw dataset contains the radial velocity and backscatter measurements, along with the beam location and other lidar parameters in the header.
Data Details
1) A Halo Photonics scanning lidar owned by ND was deployed and operated from 12/17/2015 to 02/09/2016. On 02/09/2016, this lidar was replaced by a Halo Photonics scanning lidar owned by the Army Research Lab (ARL).
2) For information on the scanning patterns, refer to the attached "ReadMe" file.
3) Data period from 12/15/2015 to 02/09/2016: one data file per day (24 hours). The file name of each daily data file has {boardman} as {optionalfields}. For example: lidar.z07.00.20150414.143000.boardman.csm.
4) Data period after 02/09/2016: one scan file every 15 minutes, and one stare file and one background file every hour. File names have the following {optionalfields}: {background_boardman} for background files, {scan_boardman} for scan files, and {stare_boardman} for stare files. For example:
- lidar.z07.00.20150414.143000.background_boardman
- lidar.z07.00.20150414.143000.scan_boardman
- lidar.z07.00.20150414.143000.stare_boardman
5) Site information:
- Site: Boardman, OR
- Latitude: 45.816185° N
- Longitude: 119.811766° W
- Elevation (meters): 112.0
Data Quality
Raw data: no quality control (QC) is applied.
Uncertainty
The uncertainty of the lidar measurements varies with measurement range. Please refer to Pearson et al. (2009) for more details.
Constraints
1) Because the lidars were changed, the data were downloaded in different formats. Hence, the raw (unfiltered) data are primarily in two formats: .csm and .hpl.
2) The data were downloaded every hour or every 15 minutes. Hence, the datasets are not concatenated for continuous scans.
3) A lidar offset of +195 deg (relative to True North) was added to the azimuthal angles from the ND scanning lidar, spanning 12/17/2015 until 02/09/2016. This was corrected for the data from 02/09/2016 onward, when the lidar was aligned to True North.
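The naming convention above can be parsed mechanically, which helps when sorting daily, scan, stare, and background files. A minimal sketch, assuming every name follows the pattern lidar.z07.00.YYYYMMDD.HHMMSS.{optionalfields}[.ext] inferred from the example names; the extension is treated as optional because the post-02/09/2016 examples show none. `parse_lidar_name` and the `KINDS` labels are hypothetical.

```python
from datetime import datetime

# {optionalfields} values listed in the description, mapped to file types.
KINDS = {
    "boardman": "daily",  # one file per day, before 02/09/2016
    "background_boardman": "background",
    "scan_boardman": "scan",
    "stare_boardman": "stare",
}


def parse_lidar_name(name):
    """Split a lidar file name into start time, file type, and extension.

    Assumes lidar.z07.00.YYYYMMDD.HHMMSS.{optionalfields}[.ext].
    """
    parts = name.split(".")
    if len(parts) < 6 or parts[0] != "lidar":
        raise ValueError(f"unexpected file name: {name}")
    start = datetime.strptime(parts[3] + parts[4], "%Y%m%d%H%M%S")
    optional = parts[5]
    ext = parts[6] if len(parts) > 6 else None
    return {
        "start": start,
        "kind": KINDS.get(optional, "unknown"),
        "optionalfields": optional,
        "extension": ext,
    }


print(parse_lidar_name("lidar.z07.00.20150414.143000.scan_boardman")["kind"])  # scan
```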
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is about: (Table 8) Grain characteristics and XRD raw data of ODP Site 162-984 discrete samples. Please consult the parent dataset at https://doi.org/10.1594/PANGAEA.805230 for more information. Sediment depth is given in meters composite depth (mcd).
https://ega-archive.org/dacs/EGAC00001002814
Dataset comprising raw paired RNA-seq data in fastq.gz format for 7 samples of rosette-forming brain tumors.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The ont-open-data registry provides reference sequencing data from Oxford Nanopore Technologies to support 1) exploration of the characteristics of nanopore sequence data, 2) assessment and reproduction of performance benchmarks, and 3) development of tools and methods. The deposited data showcase DNA sequences from a representative subset of sequencing chemistries. The datasets correspond to publicly available reference samples (e.g. Genome in a Bottle reference cell lines). Raw data are provided with metadata and scripts that describe sample and data provenance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data required to generate the example data associated with the NIfTI-MRS data standard.
The data standard can be found on Zenodo (https://doi.org/10.5281/zenodo.5084788).
The generated example data is also on Zenodo (https://doi.org/10.5281/zenodo.5085448).
Example data generation code is available on GitHub (https://github.com/wexeee/mrs_nifti_standard/tree/master/example_data).
Information on data sources for the field analyzer manuscript calculations. This dataset is not publicly accessible because it was not generated by EPA; rather, EPA researchers used it to calculate basic statistics (R square and slope) as part of this literature review. It can be accessed through the following means: these two older conference proceedings are available in book volumes that can be found in libraries, with page numbers as specified below:
- Argent, V.A., Southall, J.M. and D'Costa, E. (1994) Analysis of water for lead and copper using disposable sensor technology. American Water Works Association - Annual Conference, pp. 43-54, New York, New York.
- Wiese, P.M. (1989) Monitoring method for lead in first-draw drinking water samples. American Water Works Association - Annual Conference and Exposition, pp. 1309-1313, Los Angeles, California.
Format: data from three tables in the two conference proceedings were used to calculate basic statistics (R square and slope):
- Tables 2 and 4 in Argent, Southall and D'Costa (1994).
- Table 2 in Wiese (1989).
This dataset is associated with the following publication: Dore, E., D. Lytle, L. Wasserstrom, J. Swertfeger, and S. Triantafyllidou. Field Analyzers for Lead Quantification in Drinking Water Samples. CRITICAL REVIEWS IN ENVIRONMENTAL SCIENCE AND TECHNOLOGY. CRC Press LLC, Boca Raton, FL, USA, 50(20): 999-999, (2020).
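The two statistics named above (slope and R square) come from a simple linear regression of paired measurements. The sketch below assumes ordinary least squares with an intercept, with `x` as the reference laboratory result and `y` as the field analyzer reading; the description does not state which variable is which or whether an intercept was fitted, and the function name is hypothetical. No values from the cited tables are reproduced here.

```python
def slope_and_r_squared(x, y):
    """Ordinary least-squares slope and coefficient of determination for y ~ a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # With an intercept, R^2 equals the squared Pearson correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Synthetic pairs lying exactly on y = 2x + 1, so slope = 2 and R^2 = 1.
print(slope_and_r_squared([1, 2, 3, 4], [3, 5, 7, 9]))
```

A slope near 1 and an R square near 1 would indicate that the analyzer readings track the reference values closely.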
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
{
# General information
# The script runs with R (Version 3.1.1; 2014-07-10) and the packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1), and lattice (Version 0.20-29).
# ------------------------------------------------------------
# Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)
# ------------------------------------------------------------
# Data collection and the derivation of the individual variables are described in:
# Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016-20131016.
# Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015.
# Data are available as an Rdata file.
# Missing values are NA.
# ------------------------------------------------------------
# For better readability, the subsections of the script can be collapsed.
# ------------------------------------------------------------
}
{
# Description of the method
# 1 - Data are visualized in an interactive actogram with time of day on the x-axis and one panel for each day of data.
# 2 - A red rectangle indicates the active field. Clicking with the mouse on the depicted light signal within that field generates a data point that is automatically saved (via a custom-made function) in the csv file. For this data extraction, I recommend always clicking on the bottom line of the red rectangle, as data are always available there due to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if a greenish vertical bar appears and a new line of data appears in the R console.
# 3 - To extract incubation bouts, the first click in a new plot must mark the start of incubation; the next click marks the end of incubation, and a click on the same spot marks the start of incubation by the other sex. If the end and start of incubation are at different times, the data will still be extracted, but the sex, logger and bird_ID will be wrong; these need to be changed manually in the csv file. Similarly, the first bout of a given plot is always assigned to the male (if no data are present in the csv file) or assigned based on the previous data. Hence, whenever data from a new plot are extracted, it is worth checking at the first mouse click whether the sex, logger and bird_ID information is correct, and adjusting it manually if not.
# 4 - Once all information from one day (panel) is extracted, right-click on the plot and choose "stop". This activates the following day (panel) for extraction.
# 5 - To end extraction before going through all the rectangles, press "escape".
}
{
# Annotations of data files from turnstone_2009_Barrow_nest-t401_transmitter.RData
# dfr -- contains raw data on the signal strength of the radio tags attached to the rumps of the female and male, and information about when the birds were captured and the incubation stage of the nest
# 1. who: identifies whether the recording refers to the female, male, capture or start of hatching
# 2. datetime_: date and time of each recording
# 3. logger: unique identity of the radio tag
# 4. signal_: signal strength of the radio tag
# 5. sex: sex of the bird (f = female, m = male)
# 6. nest: unique identity of the nest
# 7. day: datetime_ variable truncated to year-month-day format
# 8. time: time of day in hours
# 9. datetime_utc: date and time of each recording, but in UTC time
# 10. cols: colors assigned to "who"
# ------------------------------------------------------------
# m -- contains metadata for a given nest
# 1. sp: identifies species (RUTU = Ruddy turnstone)
# 2. nest: unique identity of the nest
# 3. year_: year of observation
# 4. IDfemale: unique identity of the female
# 5. IDmale: unique identity of the male
# 6. lat: latitude coordinate of the nest
# 7. lon: longitude coordinate of the nest
# 8. hatch_start: date and time when the hatching of the eggs started
# 9. scinam: scientific name of the species
# 10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)
# 11. logger: type of device used to record incubation (IT - radio tag)
# 12. sampling: mean incubation sampling interval in seconds
# ------------------------------------------------------------
# s -- contains metadata for the incubating parents
# 1. year_: year of capture
# 2. species: identifies species (RUTU = Ruddy turnstone)
# 3. author: identifies the author who measured the bird
# 4. nest: unique identity of the nest
# 5. caught_date_time: date and time when the bird was captured
# 6. recapture: was the bird captured before? (0 - no, 1 - yes)
# 7. sex: sex of the bird (f = female, m = male)
# 8. bird_ID: unique identity of the bird
# 9. logger: unique identity of the radio tag
# ------------------------------------------------------------
}
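Step 3 of the method pairs mouse clicks into incubation bouts that alternate between the sexes, with the first bout assigned to the male when no earlier data exist. The following is a hypothetical Python re-expression of that pairing logic, not the author's R function: it assumes clicks arrive as start, end, start, end, ... in hours, and instead of silently mislabelling bouts it flags any bout whose start does not coincide with the previous bout's end, which is the case the description says must be corrected by hand.

```python
def clicks_to_bouts(clicks, first_sex="m", tolerance=0.0):
    """Pair successive clicks into (start, end, sex, flagged) incubation bouts.

    clicks: click times in hours, ordered start, end, start, end, ...
    flagged is True when a bout's start differs from the previous bout's end
    by more than `tolerance`, so sex and bird identity need a manual check.
    """
    if len(clicks) % 2:
        raise ValueError("clicks must come in start/end pairs")
    bouts = []
    sex = first_sex
    prev_end = None
    for start, end in zip(clicks[0::2], clicks[1::2]):
        flagged = prev_end is not None and abs(start - prev_end) > tolerance
        bouts.append((start, end, sex, flagged))
        sex = "f" if sex == "m" else "m"  # the other parent takes over
        prev_end = end
    return bouts


# The third bout starts 0.5 h after the second ends, so it is flagged.
for bout in clicks_to_bouts([1.0, 3.5, 3.5, 6.0, 6.5, 9.0]):
    print(bout)
```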
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains data collected as part of the Ancient Adhesives project under the European Union’s Horizon 2020 research and innovation programme Grant Agreement No. 678 804151 (Grant holder G.H.J.L.).
It is being made public to act as supplementary data for a publication and for other researchers to use this data in their own work.
The data in this dataset were collected at TU Delft in 2022 and 2023.
This dataset contains:
-Raw data of XRD of 11 archaeological objects named: SBF4; SBF5; SBF9; SBF10; SBF14; SBF15; SBF17; SBF20; SBF21; SBF23; SBF24. The file format is .raw
-Raw data of FTIR of 6 archaeological objects named SBF4; SBF9; SBF10; SBF20; SBF21; SBF24. The file format is .csv
-Raw data of ATR-FTIR of 1 archaeological object named SBF14. The file format is .csv
The acronym SBF stands for Steenbokfontein, a cave in the Western Cape province (South Africa) where the objects were found.
The data included in this dataset have been organized by specimen. For each specimen, more than one point was measured, as indicated in the file name.
The file name includes the unique ID of the object + the analytical technique + the number of the scan (at least 2 per object). For example: SBF14_ATR_scan01
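The file-naming rule (object ID + analytical technique + scan number) also covers the Morín Cave example MOR11_ATR_loc1, so one parser can serve both datasets. A minimal sketch, assuming the parts are joined by underscores and the scan label is lowercase letters followed by digits, as in the two examples; the regular expression and the helper names are hypothetical. Counting files per object is one way to check the "at least 2 per object" statement.

```python
import re
from collections import Counter

# Hypothetical pattern: <object ID>_<technique>_<scan label><number>,
# e.g. "SBF14_ATR_scan01" or "MOR11_ATR_loc1".
NAME_RE = re.compile(
    r"^(?P<object>[A-Z]+\d+)_(?P<technique>[A-Za-z-]+)_(?P<label>[a-z]+)(?P<number>\d+)$"
)


def parse_specimen_name(stem):
    """Split a file name stem into (object ID, technique, scan number)."""
    m = NAME_RE.match(stem)
    if m is None:
        raise ValueError(f"name does not follow <ID>_<technique>_<scan>: {stem}")
    return m["object"], m["technique"], int(m["number"])


def scans_per_object(stems):
    """Count measurements per object ID."""
    return Counter(parse_specimen_name(s)[0] for s in stems)


print(parse_specimen_name("SBF14_ATR_scan01"))  # ('SBF14', 'ATR', 1)
```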
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bulk RNA-seq data (Smart-seq2; raw feature counts) of naive murine CD4+ T cells co-cultured with murine HSPCs (THSPC) or murine DCs (TDC), or with murine LSKs as a control condition, in the presence or absence of antigen (ova, ctrl).
CNCF Raw Data for LLM Training
Description
This dataset, named cncf-raw-data-for-llm-training, consists of markdown (MD) and PDF content extracted from various project repositories within the CNCF (Cloud Native Computing Foundation) landscape. The data was collected by fetching MD and PDF files from different CNCF project repositories and converting them into JSON format. This dataset is intended as raw data for training large language models (LLMs). The dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/Kubermatic/cncf-raw-data-for-llm-training.
This entry contains raw data files from experiments performed on the Vulcan beamline at the Spallation Neutron Source at Oak Ridge National Laboratory using a pressure cell. Cylindrical granite and marble samples were subjected to confining pressures of either 0 psi or approximately 2500 psi and internal pressures of either 0 psi, 1500 psi or 2500 psi through a blind axial hole at the center of one end of the sample. The sample diameters were 1.5" and the sample lengths were 6". The blind hole was 0.25" in diameter and 3" deep. One set of experiments measured strains at points located circumferentially around the center of the sample with identical radii to determine if there was strain variability (this would not be expected for a homogeneous material based on the symmetry of loading). Another set of experiments measured load variation across the radius of the sample at a fixed axial and circumferential location. Raw neutron diffraction intensity files and experimental parameter descriptions are included.
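For readers working in SI units, the pressures and sample dimensions above convert as follows. This is a small sketch using the standard factors 1 in = 25.4 mm (exact by definition) and 1 psi ≈ 6.894757 kPa; the helper names are hypothetical and only the values stated in the description are converted.

```python
PSI_TO_MPA = 0.006894757  # 1 psi is approximately 6.894757 kPa
INCH_TO_MM = 25.4         # exact by definition


def psi_to_mpa(psi):
    """Convert a pressure from psi to MPa."""
    return psi * PSI_TO_MPA


def inches_to_mm(inches):
    """Convert a length from inches to millimetres."""
    return inches * INCH_TO_MM


# Confining and internal pressures stated in the description.
for psi in (0, 1500, 2500):
    print(f"{psi} psi = {psi_to_mpa(psi):.1f} MPa")

# Sample and blind-hole dimensions stated in the description.
for label, inches in [("sample diameter", 1.5), ("sample length", 6),
                      ("hole diameter", 0.25), ("hole depth", 3)]:
    print(f"{label}: {inches} in = {inches_to_mm(inches):.2f} mm")
```

So the 1500 psi and 2500 psi loads are roughly 10.3 MPa and 17.2 MPa, on samples 38.10 mm in diameter and 152.40 mm long.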
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the raw and processed data for Raman spectroscopic analysis at excitation wavelengths of 785 nm and 532 nm, as well as the processed spectra as images for both wavelengths.