Within the overall project, we performed a set of microarray and chromatin-immunoprecipitation (ChIP)-chip experiments using siRNA against the (pro)renin receptor ((P)RR), stable overexpression of PLZF, the PLZF translocation inhibitor genistein and the specific V-ATPase inhibitor bafilomycin to dissect transcriptional pathways downstream of the (P)RR. In this dataset, we include the expression data obtained from stable PLZF overexpression in KELLY cells and from the respective insertless controls. Two intervention samples and two control samples were analyzed. We generated the following pairwise comparisons using ChipInspector (Genomatix Software GmbH): FK_2, FK_10 versus FK_3, FK_11. ChipInspector carries out significance analysis at the single-probe level. Normalized probe-set-level data are not provided for individual Sample records. Processed data are available on the Series record.
The Suomi NPP Climate Raw Data Record (C-RDR) developed at the NOAA NCDC is an intermediate product processing level (NOAA Level 1b) between a Raw Data Record (RDR) and a Sensor Data Record (SDR). The C-RDR is intended to simplify access to the raw data for the purpose of reprocessing using calibration and geolocation methods. The Visible Infrared Imaging Radiometer Suite (VIIRS) C-RDR has raw VIIRS measurements collected into time series variables, accompanied by the coefficients and tables needed to convert them to science units and calibrate them. Where applicable, metadata in this file follows the Climate and Forecast (CF) Conventions and Attribute Convention for Dataset Discovery (ACDD). Metadata attributes from the native Suomi NPP RDR and SDR file types are also included. These files have been compared with those generated using JPSS Application Development Library (ADL) applications. Product documentation and software are available for the dataset.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Pierre Auger Collaboration is releasing 10% of the data recorded since 2004 using the world's largest cosmic ray detector, the Pierre Auger Observatory, located in Argentina, in the Province of Mendoza. The release also includes 100% of weather and space-weather data collected until 31 December 2020. These data are being made available publicly with the expectation that they will be used by a wide and diverse community including professional and citizen-scientists and for educational and outreach initiatives.
Operation of the Pierre Auger Observatory, by a Collaboration of about 400 scientists from over 90 institutions in 18 countries across the world, has enabled the properties of the highest-energy cosmic rays to be determined with unprecedented precision. These cosmic rays are predominantly the nuclei of the common elements and reach the Earth from astrophysical sources. The data from the Observatory have been used to show that the highest-energy particles have an extra-galactic origin.
Cosmic rays are observed indirectly, through extensive air-showers of secondary particles produced by the interaction of the incoming cosmic ray with the atmosphere. The Surface Detector of the Observatory covers 3000 km² and comprises an array of ~1600 particle detectors, separated by 1500 m. The low energy extension features an array of 71 stations spread apart by 750 m and covering about 27 km². The area is overlooked by a set of telescopes that compose the Fluorescence Detector which is sensitive to the auroral-like light emitted as the air-shower develops, while the Surface Detector is sensitive to muons, electrons and photons that reach the ground.
The Open Data released here include those from these instruments. They have been subjected to the same selection and reconstruction procedures used by the Collaboration in recent publications. They amount to more than 80000 showers measured with the surface-detector arrays and more than 3000 showers recorded simultaneously by the surface and fluorescence detectors. Data are available as pseudo-raw (JSON) format and as a summary CSV file containing the reconstructed shower parameters. Simplified codes derived from the ones used for published analyses are also provided, by means of Python notebooks that have been prepared to guide the reader to an understanding of the physics results. An outreach section dedicated to the general public, and in particular to school students, is also available and includes simple tools to enjoy our data. To get more details about the Observatory and the Open Data, you can visit the dedicated website.
About the Auger Open Data
Downloadable datasets
Pseudo-raw and reconstructed data are provided in JSON format. Reconstructed data are also available in CSV format, representing a “summary” of the JSON files and containing the information that is needed for analysis. Similarly, auxiliary data are in CSV format. Format description is available on the dedicated website.
Tools
Other Auger Open Data
Disclaimer
Policy
The policy of the Auger Collaboration on Data Release and Open Access can be found here.
Contact
For any questions about these data, see the contact page of our website or write directly to auger-open-data@auger.unam.mx.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Anonymized processed data from the Non-Emergency Notification Timing in Autonomous Vehicles study. This dataset depository is part of the Non-Emergency Notification Timing in Autonomous Vehicles artifact collection: 10.1184/R1/c.7894613.

The goal of this study is to investigate the best moment to send non-emergency notifications to AV occupants when they are engaged in non-driving-related tasks (NDRTs).

This archive contains data collected through a series of user study sessions in an autonomous vehicle simulator. On four different days, the participants experienced a simulated AV commute and completed four different NDRTs: Task of Choice (denoted as Scenario 1, or S1), Gaming (S2), Video Watching (S3), and Reading (S4). For the latter three scenarios, the tasks were completed on an Android tablet. The order of the latter three scenarios was also randomized.

While completing NDRTs, participants were instructed to respond to an audio signal with gradually increasing volume. Participants would say "yes" or "good" if they thought they were available for non-emergency notifications upon hearing the signal; they would say "no" or "bad" otherwise. In this way, the participants provided labels for the data stream around the moment when the signal went off.

For more details about the content of this dataset, please refer to README.txt. A video demonstration of the procedure can be found at 10.1184/R1/29396957. The video also presents multiple samples of our complete video stream data. Supplementary figures showcasing the data collection setup can be found at 10.1184/R1/29372027.

Due to certain terms in our consent form, we are currently unable to publicly share video data containing identifiable information of the participants: video streams containing participants' faces and their own devices are excluded from this archive.
The video demonstration mentioned above contains sample video data from all available camera angles.

This public dataset contains the following for each participant in each session. (Note: files with names in brackets [*] are not accessible to the public; many files have "before_" or "after_" suffixes, which means the file spans [t-20s, t] or [t, t+20s], respectively; t is the time of each signal onset.)

- all_event_log.xlsx - combined event logs recording labels and detection, response, reaction, and decision time measurements (formatted, reorganized, and plotted)
- all_event_log.csv - combined event logs recording labels and time measurements
- sessions_order.csv - order of scenarios each participant experienced
- event_log.csv - event logs recording labels and time measurements of the session
- [*_composite_*_*.m4v] - a video composite of all video feeds

In each signal_* folder:

- _good (or _bad) - an empty file whose name denotes the label provided by the participant
- *_car.csv - simulated vehicle data stream
- *_gaze.csv - participant gaze data stream
- *_mems.csv - participant head movement data stream (from eye tracker)
- [*_cams.m4v] - side and rear camera video streams
- [*_rgb.m4v] - front-facing camera video stream
- *_disp.m4v - simulation screen recording video stream
- *_gaze.m4v - eye tracker first-person-view video stream (This file may not be accessible for S1 for privacy reasons.)
- *_gaze_p.m4v - eye tracker first-person-view video stream with gaze position overlay (This file may not be accessible for S1 for privacy reasons.)
- *_tab_touch.csv - tablet touch data stream (Not in S1)
- *_tab_accel.csv - tablet accelerometer data stream (Not in S1)
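The "before_" and "after_" windows described in the note can be illustrated with a short Python sketch. It assumes a data stream is available as (timestamp in seconds, value) pairs; the function name and the sample layout are hypothetical and are not taken from the dataset's README.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value); hypothetical layout

WINDOW_S = 20.0  # window length stated in the dataset description


def split_around_onset(
    samples: Sequence[Sample], t_onset: float, window_s: float = WINDOW_S
) -> Tuple[List[Sample], List[Sample]]:
    """Split a stream into the [t-20s, t] and [t, t+20s] windows around a signal onset."""
    before = [s for s in samples if t_onset - window_s <= s[0] <= t_onset]
    after = [s for s in samples if t_onset <= s[0] <= t_onset + window_s]
    return before, after


if __name__ == "__main__":
    # Synthetic stream sampled every 5 s, with a signal onset at t = 50 s.
    stream = [(float(t), float(t) * 0.1) for t in range(0, 100, 5)]
    before, after = split_around_onset(stream, 50.0)
    print(len(before), len(after))
```

Both windows are closed intervals here, so a sample exactly at the onset appears in both; whether the published files treat the boundary this way is not stated in the description.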
https://www.usa.gov/government-works
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors.
Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 33 data element restricted access dataset.
The following apply to the public use datasets and the restricted access dataset:
Overview
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.
For more information:
NNDSS Supports the COVID-19 Response | CDC.
COVID-19 Case Reports COVID-19 case reports are routinely submitted to CDC by public health jurisdictions using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19. Current versions of these case definitions are available at: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/. All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for lab-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. States and territories continue to use this form.
Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.
To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.
CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<11 COVID-19 case records with a given value). Suppression includes low frequency combinations of case month, geographic characteristics (county and state of residence), and demographic characteristics (sex, age group, race, and ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
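The suppression rule can be sketched in Python as follows. This is a simplified illustration, not CDC's cleaning algorithm: it assumes records are dictionaries, uses hypothetical field names, and re-codes every field of a low-frequency combination to "NA", whereas the actual procedure may choose which fields to suppress more selectively.

```python
from collections import Counter
from typing import Dict, List, Sequence

THRESHOLD = 11  # combinations with fewer than 11 records are suppressed


def suppress_low_frequency(
    records: Sequence[Dict[str, str]],
    keys: Sequence[str],
    threshold: int = THRESHOLD,
    na: str = "NA",
) -> List[Dict[str, str]]:
    """Re-code rare key combinations to NA; records are never dropped."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    cleaned = []
    for r in records:
        combo = tuple(r[k] for k in keys)
        if counts[combo] < threshold:
            # Keep the record, but blank out the identifying combination.
            cleaned.append({**r, **{k: na for k in keys}})
        else:
            cleaned.append(dict(r))
    return cleaned


if __name__ == "__main__":
    keys = ("case_month", "county", "age_group")  # hypothetical field names
    common = [{"case_month": "2021-01", "county": "A", "age_group": "18-49"}] * 12
    rare = [{"case_month": "2021-01", "county": "B", "age_group": "65+"}] * 3
    out = suppress_low_frequency(common + rare, keys)
    print(len(out), out[-1])
```

The output has the same number of rows as the input, which mirrors the statement that records with data suppression are never removed.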
COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These and other COVID-19 data are available from multiple public locations: COVID Data Tracker; United States COVID-19 Cases and Deaths by State; COVID-19 Vaccination Reporting Data Systems; and COVID-19 Death Data and Resources.
Notes:
March 1, 2022: The "COVID-19 Case Surveillance Public Use Data with Geography" will be updated on a monthly basis.
April 7, 2022: An adjustment was made to CDC’s cleaning algorithm for COVID-19 line level case notification data. An assumption in CDC's algorithm led to misclassifying deaths that were not COVID-19 related. The algorithm has since been revised, and this dataset update reflects corrected individual level information about death status for all cases collected to date.
June 25, 2024: An adjustment
This record describes components of the 'Jervis Bay Baseline Studies' project conducted by the Department of Defence, CSIRO and Australian Defence Force Academy (ADFA). The initial aims of the project were to obtain current and wind observations from Jervis Bay over a six-week period in order to detail the wind-driven circulation and provide a data set for comparison with the numerical modelling work being simultaneously undertaken. However, after this initial experiment it became clear that there were significant currents in the bay that are not simply related to direct wind forcing. Therefore, alternative mechanisms for driving the flow had to be investigated, through the measurement programs and data analysis, as well as through numerical modelling. The result was a series of approximately six separate experiments aiming to define the water circulation around the bay and through the bay entrance, to gain an understanding of the processes that drive the currents, and to investigate the influence of stratification on the nature of the currents.
This record doesn't describe one of these six experiments per se, but the data collected from a meteorological station at Governor Head. This Steedman EMS-16 meteorological station housed a R.M. Young anemometer and YSI barometer to measure wind and atmospheric pressure respectively. The site at Governor Head was several hundred metres inland of 50m high cliffs and was 63m above sea-level.
Wind data was also collected from an anemometer located at Huskisson, on the west shore of Jervis Bay. This anemometer was operated at the time by the NSW Public Works Department.
As this is a parent record, no data is available to download. A PDF outlining the structure and hierarchy of metadata records relating to this project is available through this record. Also provided is a PDF of a working paper that describes the operating and data processing procedures that relate to the meteorological station. There are 36 subsidiary records that directly relate to this parent, through which the data is provided (see hierarchical tree).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed raw data from fiber photometry recordings of different subtypes (Vglut2+, Calb1+, Anxa1+ and Aldh1a1+ as well as DAT+) SNc dopamine neurons labelled with GCaMP6f, as used in Azcorra et al. Nat Neuro 2023. This dataset has been pre-processed to calculate DF/F from the raw data (see below for code and raw data), which are then normalized from 0 to 1 (un-normalized DF/F data can be recovered using the 'norm' value included in the dataset). This dataset also includes metadata for each recording (recording location, mouse sex...).
The code used to generate this pre-processed data from raw data is available on GitHub (https://github.com/DombeckLab/Azcorra2023/releases/tag/Azcorra2023) and Zenodo (DOI: 10.5281/zenodo.7872052, https://zenodo.org/record/7872052). The original raw data has been deposited on Zenodo (DOI: 10.5281/zenodo.7871634, https://zenodo.org/record/7871634). The code necessary to analyze this data and generate the figures shown in the manuscript is found in the same GitHub repository as the pre-processing code above.
Please note, this dataset has been superseded by a newer version (see below). Users should not use this version except in rare cases (e.g., when reproducing previous studies that used this version). USCRN "Processed" Data (labeled as "uscrn-processed") are interpreted values and derived geophysical parameters with other quality indicators processed from raw data (Datalogger files and/or Raw Data from GOES and NOAAPort) by the USCRN Team. Climate variable types include air temperature, precipitation, soil moisture, soil temperature, surface temperature, wetness, global solar radiation, relative humidity, and wind at 1.5 m above the ground. Many additional engineering variables are also available. These data have been decoded, quality-flagged, and processed into level 1 hourly data (the only applied quality control is rounding some values as they enter the database), and include additional calculated values such as precipitation (5-minute and hourly), hourly maximum temperature, hourly minimum temperature, average temperature (5-minute and hourly), soil moisture (volumetric water content, 5-minute values at the 5 cm depth and hourly values at all depths) for all dielectric values in range, layer average soil moisture (5-minute and hourly), and layer average soil temperature (5-minute and hourly). It is the general practice of USCRN to not calculate derived variables if the input data to these calculations are flagged. These data records are versioned based on the processing methods and algorithms used for the derivations (versions are noted within the data netCDF file), and data are updated when the higher quality raw data become available from stations' datalogger storage (Datalogger Files).
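The practice of not calculating derived variables from flagged inputs can be illustrated with a minimal Python sketch. It assumes twelve 5-minute values per hour, each with a boolean quality flag, and it treats any flagged input as blocking the derived value; the function name, the flag representation, and that strict rule are assumptions for illustration, not the USCRN processing code.

```python
from typing import Optional, Sequence


def hourly_mean(values: Sequence[float], flags: Sequence[bool]) -> Optional[float]:
    """Return the mean of 5-minute values, or None if any input is flagged."""
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    if not values or any(flags):
        # Mirror the stated practice: no derived value from flagged inputs.
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    temps = [10.0] * 12                  # twelve 5-minute air temperatures
    clean = [False] * 12
    one_flagged = [False] * 11 + [True]
    print(hourly_mean(temps, clean))        # derived value is available
    print(hourly_mean(temps, one_flagged))  # derived value is withheld
```

Returning None rather than a partial average keeps flagged hours visibly missing in downstream analysis.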
This data set is captured from a robot workcell that is performing activities representative of several manufacturing operations. The workcell contains two 6-degree-of-freedom robot manipulators, where one robot is performing material handling operations (e.g., transporting parts into and out of a specific work space) while the other robot is performing a simulated precision operation (e.g., the robot touching the center of a part with a tool tip that leaves a mark on the part). This precision operation is intended to represent a precise manufacturing operation (e.g., welding, machining). The goal of this data set is to provide robot-level and process-level measurements of the workcell operating within nominal parameters. There are no known equipment or process degradations in the workcell. The material handling robot will perform pick and place operations, including moving simulated parts from an input area to in-process work fixtures. Once parts are placed in/on the work fixtures, the second robot will interact with the part in a specified precise manner. In this specific instance, the second robot has a pen mounted to its tool flange and is drawing the NIST logo on a surface of the part. When the precision operation is completed, the material handling robot will then move the completed part to an output. This suite of data includes process data and performance data, including timestamps. Timestamps are recorded at predefined state changes and events on the PLC and robot controllers, respectively. Each robot controller and the PLC have their own internal clocks and, due to hardware limitations, the timestamps recorded on each device are relative to their own internal clocks. All timestamp data collected on the PLC is available for real-time calculations and is recorded. The timestamps collected on the robots are only available as recorded data for post-processing and analysis.
The timestamps collected on the PLC correspond to 14 part state changes throughout the processing of a part. Timestamps are recorded when PLC-monitored triggers are activated by internal processing (PLC trigger origin) or after the PLC receives an input from a robot controller (robot trigger origin). Records generated from PLC-originated triggers include parts entering the work cell, assignment of robot tasks, and parts leaving the work cell. PLC-originating triggers are activated by either internal algorithms or sensors which are monitored directly in the PLC Inputs/Outputs (I/O). Records generated from a robot-originated trigger include when a robot begins operating on a part, when the task operation is complete, and when the robot has physically cleared the fixture area and is ready for a new task assignment. Robot-originating triggers are activated by PLC I/O. Process data collected in the workcell are the variable pieces of process information. This includes the input location (single option in the initial configuration presented in this paper), the output location (single option in the initial configuration presented in this paper), the work fixture location, the part number counted from startup, and the part type (task number for drawing robot). Additional information on the context of the workcell operations and the captured data can be found in the attached files, which include a README.txt, along with several noted publications. Disclaimer: Certain commercial entities, equipment, or materials may be identified or referenced in this data, or its supporting materials, in order to illustrate a point or concept. Such identification or reference is not intended to imply recommendation or endorsement by NIST; nor does it imply that the entities, materials, equipment or data are necessarily the best available for the purpose. The user assumes any and all risk arising from use of this dataset.
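Because each controller timestamps events against its own internal clock, elapsed times are only directly meaningful between events recorded on the same device. The Python sketch below illustrates that constraint by grouping events per device before differencing. The device names, state names, and tuple layout are hypothetical; the actual file structure is described in the dataset's README.txt.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, float]  # (device, state, timestamp on that device's clock)


def intervals_by_device(events: Iterable[Event]) -> Dict[str, List[Tuple[str, str, float]]]:
    """Compute elapsed time between consecutive events, separately per device clock."""
    per_device: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for device, state, ts in events:
        per_device[device].append((ts, state))

    result: Dict[str, List[Tuple[str, str, float]]] = {}
    for device, items in per_device.items():
        items.sort()  # order by timestamp within one clock domain
        result[device] = [
            (a[1], b[1], b[0] - a[0]) for a, b in zip(items, items[1:])
        ]
    return result


if __name__ == "__main__":
    log = [
        ("plc", "part_entered", 100.0),
        ("robot_1", "task_started", 5.0),
        ("plc", "task_assigned", 102.5),
        ("robot_1", "task_done", 9.0),
    ]
    for device, spans in intervals_by_device(log).items():
        print(device, spans)
```

Comparing a PLC timestamp with a robot timestamp would first require estimating the offset between the two clocks, which this sketch deliberately does not attempt.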
https://www.usa.gov/government-works
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (total case counts) as the present dataset; however, NCHS Death Counts are based on death certificates that use information reported by physicians, medical examiners, or coroners in the cause-of-death section of each certificate. Data from each of these pages are considered provisional (not complete and pending verification) and are therefore subject to change. Counts from previous weeks are continually revised as more records are received and processed.
Number of Jurisdictions Reporting There are currently 60 public health jurisdictions reporting cases of COVID-19. This includes the 50 states, the District of Columbia, New York City, the U.S. territories of American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands, as well as three independent countries in compacts of free association with the United States: the Federated States of Micronesia, the Republic of the Marshall Islands, and the Republic of Palau. New York State’s reported case and death counts do not include New York City’s counts, as the two jurisdictions separately report nationally notifiable conditions to CDC.
CDC COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths, available by state and by county. These and other data on COVID-19 are available from multiple public locations, such as:
https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
https://www.cdc.gov/covid-data-tracker/index.html
https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
https://www.cdc.gov/coronavirus/2019-ncov/php/open-america/surveillance-data-analytics.html
Additional COVID-19 public use datasets, including line-level (patient-level) data, are available at: https://data.cdc.gov/browse?tags=covid-19.
Archived Data Notes:
November 3, 2022: Due to a reporting cadence issue, case rates for Missouri counties are calculated based on 11 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 3, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Due to a reporting cadence change, case rates for Alabama counties are calculated based on 13 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 10, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Per the request of the jurisdiction, cases and deaths among non-residents have been removed from all Hawaii county totals throughout the entire time series. Cumulative case and death counts reported by CDC will no longer match Hawaii’s COVID-19 Dashboard, which still includes non-resident cases and deaths.
November 17, 2022: Two new columns, weekly historic cases and weekly historic deaths, were added to this dataset on November 17, 2022. These columns reflect case and death counts that were reported that week but were historical in nature and not reflective of the current burden within the jurisdiction. These historical cases and deaths are not included in the new weekly case and new weekly death columns; however, they are reflected in the cumulative totals provided for each jurisdiction. These data are used to account for artificial increases in case and death totals due to batched reporting of historical data.
December 1, 2022: Due to cadence changes over the Thanksgiving holiday, case rates for all Ohio counties are reported as 0 in the data released on December 1, 2022.
January 5, 2023: Due to North Carolina’s holiday reporting cadence, aggregate case and death data will contain 14 days’ worth of data instead of the customary 7 days. As a result, case and death metrics will appear higher than expected in the January 5, 2023, weekly release.
January 12, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0. As a result, case and death metrics will appear lower than expected in the January 12, 2023, weekly release.
January 19, 2023: Due to a reporting cadence issue, Mississippi’s aggregate case and death data will be calculated based on 14 days’ worth of data instead of the customary 7 days in the January 19, 2023, weekly release.
January 26, 2023: Due to a reporting backlog of historic COVID-19 cases, case rates for two Michigan counties (Livingston and Washtenaw) were higher than expected in the January 19, 2023 weekly release.
January 26, 2023: Due to a backlog of historic COVID-19 cases being reported this week, aggregate case and death counts in Charlotte County and Sarasota County, Florida, will appear higher than expected in the January 26, 2023 weekly release.
January 26, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0 in the weekly release posted on January 26, 2023.
February 2, 2023: As of the data collection deadline, CDC observed an abnormally large increase in aggregate COVID-19 cases and deaths reported for Washington State. In response, totals for new cases and new deaths released on February 2, 2023, have been displayed as zero at the state level until the issue is addressed with state officials. CDC is working with state officials to address the issue.
February 2, 2023: Due to a decrease reported in cumulative case counts by Wyoming, case rates will be reported as 0 in the February 2, 2023, weekly release. CDC is working with state officials to verify the data submitted.
February 16, 2023: Due to data processing delays, Utah’s aggregate case and death data will be reported as 0 in the weekly release posted on February 16, 2023. As a result, case and death metrics will appear lower than expected and should be interpreted with caution.
February 16, 2023: Due to a reporting cadence change, Maine’s
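The relationship described in the November 17, 2022 note, where historic counts are excluded from the new weekly columns but still contribute to cumulative totals, can be sketched in Python. The argument names are hypothetical stand-ins for the dataset's columns, and real cumulative totals may also be revised by jurisdictions, so this is an illustration of the bookkeeping rather than a reconciliation tool.

```python
from typing import List, Sequence


def cumulative_totals(
    new_weekly: Sequence[int], historic_weekly: Sequence[int], start: int = 0
) -> List[int]:
    """Running total that adds both current and batched historic counts each week."""
    if len(new_weekly) != len(historic_weekly):
        raise ValueError("both series must cover the same weeks")
    total = start
    totals = []
    for new, historic in zip(new_weekly, historic_weekly):
        total += new + historic  # historic counts are absent from `new` but count here
        totals.append(total)
    return totals


if __name__ == "__main__":
    new_cases = [10, 20, 5]
    historic_cases = [0, 100, 0]  # a batch of old cases reported in week 2
    print(cumulative_totals(new_cases, historic_cases))
```

In this example the week-2 jump in the cumulative series comes mostly from the historic batch, while the new weekly series still reflects the current burden.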
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Summary
This metadata record provides details of the data supporting the claims of the related manuscript “A tumor microenvironment specific gene expression signature predicts chemotherapy resistance in colorectal cancer patients”.
The related study aimed to determine whether a tumor microenvironment (TME)-specific gene signature could be used to identify colorectal cancer (CRC) subtypes with distinctive clinical relevance.
Data access
The data analysed during the related study were downloaded from public databases including Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) and The Cancer Genome Atlas (TCGA; TCGA CRC datasets available from the Synapse repository at: https://www.synapse.org/#!Synapse:syn2623706/files/). For a list of accession IDs for the analysed data, see Supplementary Table S1 of the manuscript, also included as part of this metadata record. The Renji RNA-seq data is available from GEO: https://identifiers.org/geo:GSE158559.
The output data of the related study are included with this data record, and are as follows:
- Table S1 to S10 - supplementary tables 1 to 10 for the related manuscript
- Cetuximab_GSE5851.PRJEB34338.combined.Rdata - two combined CRC Cetuximab treated gene expression matrix
- combined_five_GEObatch_GSE14333_GSE17536_GSE17537_GSE33113_GSE37892.Rdata - five combined CRC gene expression matrix
- FOLFOX_GSE19860_GSE28702_GSE69675.Rdata - three combined CRC FOLFOX treated gene expression matrix
- FOLFOX_GSE104645_GSE72970.Rdata - two combined CRC FOLFOX or FOLFIRI treated gene expression matrix
- GSE39395.expMatrix.Rdata - GSE39395 gene expression matrix
- GSE39396.expMatrix.Rdata - GSE39396 gene expression matrix
- GSE39582_after_ComBat.Rdata - GSE39582 gene expression matrix
- GSE62080_exp_pdata.Rdata - GSE62080 gene expression matrix
- GSE72056.melanoma.sfm.signature.rds - scRNA melanoma processed data
- GSE75688.BRCA.sfm.signature.rds - scRNA breast cancer processed data
- GSE81861.sfm.signature.rds - scRNA CRC processed data
- GSE103322.head-neck.sfm.signature.rds - scRNA head and neck processed data
- TCGA.CRC.expMatrix.Rdata - TCGA CRC gene expression matrix
- TCGA.CRC.microbiome.abundance.Rdata - TCGA CRC gut microbiome abundance
This dataset version has been superseded by version 2. This data set provides a Climate Data Record (CDR) of passive microwave sea ice concentration based on the recommendations from the National Research Council (NRC) (2004). It is produced from gridded brightness temperatures from the Defense Meteorological Satellite Program (DMSP) series of Special Sensor Microwave Imager (SSM/I) passive microwave radiometers: F-8, F-11, and F-13. The NOAA/NSIDC CDR sea ice concentrations provide a consistent, daily time series of sea ice concentrations from 09 July 1987 through 31 December 2007. The NOAA/NSIDC CDR sea ice concentrations are an estimate of the fraction of ocean area covered by sea ice that is produced by combining concentration estimates created using two algorithms developed at the NASA Goddard Space Flight Center (GSFC): the NASA Team algorithm (Cavalieri et al., 1984) and the Bootstrap algorithm (Comiso, 1986). The individual algorithms are processed and combined at NSIDC using brightness temperature data from Remote Sensing Systems, Inc. (RSS). The data are gridded on the NSIDC polar stereographic grid with 25 x 25 km grid cells and are available in netCDF file format. Each daily file includes four different sea ice concentration variables: a variable with the primary CDR sea ice concentrations created by NSIDC and three variables with sea ice concentrations created by Goddard. The three Goddard-processed sea ice concentrations are Goddard NASA Team algorithm sea ice concentrations, Goddard Bootstrap sea ice concentrations, and a merged version of the Goddard NASA Team/Bootstrap algorithm sea ice concentrations. Variables containing standard deviation, quality flags, and projection information are also included in the netCDF file. The three Goddard-produced sea ice concentrations are included in the data files for a number of reasons. 
The merged Goddard NASA Team/Bootstrap sea ice concentrations are an ancillary data set that is analogous to the NSIDC CDR data but that adds late 1978 through mid 1986 data to the record. A different instrument, the Scanning Multichannel Microwave Radiometer (SMMR), was the source for the brightness temperatures from this period. Sea ice concentrations from the extended period are not part of the primary NSIDC-produced CDR record because complete documentation of the SMMR brightness temperature processing method is not available. The separate Goddard NASA Team and Bootstrap sea ice concentrations are provided for reference. The data are available via FTP.
This dataset contains information about Bluetooth devices detected by our Bluetooth travel sensors. Each record contains a detected device’s anonymized Media Access Control (MAC) address along with the time and location the device was detected. These records alone are not traffic data but can be post-processed to measure the movement of detected devices through the roadway network.
How does the City of Austin use the Bluetooth travel sensor data?
The data enables transportation engineers to better understand short and long-term trends in Austin’s traffic patterns, supporting decisions about systems planning and traffic signal timing.
What information does the data contain?
The sensor data is available in three datasets:
Individual Address Records ( https://data.austintexas.gov/dataset/Bluetooth-Travel-Sensors-Individual-Addresses/qnpj-zrb9/data ): Each row in this dataset represents a Bluetooth device that was detected by one of our sensors. Each record contains a detected device’s anonymized Media Access Control (MAC) address along with the time and location the device was detected. These records alone are not traffic data but can be post-processed to measure the movement of detected devices through the roadway network.
Individual Traffic Matches ( https://data.austintexas.gov/dataset/Bluetooth-Travel-Sensors-Individual-Traffic-Matche/x44q-icha/data ): Each row in this dataset represents one Bluetooth-enabled device that was detected at two locations in the roadway network. Each record contains a detected device’s anonymized Media Access Control (MAC) address along with information about the origin and destination points at which the device was detected, as well as the time, date, and distance traveled.
Traffic Summary Records ( https://data.austintexas.gov/dataset/Bluetooth-Travel-Sensors-Match-Summary-Records/v7zg-5jg9 ): The traffic summary records contain aggregate travel time and speed summaries based on the individual traffic match records. Each row in the dataset summarizes average travel time and speed along a sensor-equipped roadway segment in 15 minute intervals.
Does this data contain personally identifiable information?
No. The Media Access Control (MAC) addresses in these datasets are randomly generated.
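The post-processing mentioned above, turning individual detections into travel times between two sensors, might look roughly like the following. This is a minimal sketch only: the record layout, sensor names, and the rule of using the earliest detection per device and location are assumptions made for illustration, not the City of Austin's published schema or matching method.

```python
from datetime import datetime

# Hypothetical records: (anonymized MAC, sensor location, detection time).
# Field layout and values are illustrative, not taken from the published schema.
detections = [
    ("a1f3", "sensor_A", "2020-01-01 08:00:00"),
    ("a1f3", "sensor_B", "2020-01-01 08:04:30"),
    ("9c2e", "sensor_A", "2020-01-01 08:01:00"),
    ("9c2e", "sensor_B", "2020-01-01 08:07:00"),
    ("77b0", "sensor_A", "2020-01-01 08:02:00"),  # never reaches sensor_B
]


def match_travel_times(records, origin, destination):
    """Return travel times in seconds for devices seen at origin, then destination."""
    fmt = "%Y-%m-%d %H:%M:%S"
    first_seen = {}  # (mac, location) -> earliest detection time
    for mac, location, stamp in records:
        t = datetime.strptime(stamp, fmt)
        key = (mac, location)
        if key not in first_seen or t < first_seen[key]:
            first_seen[key] = t

    times = {}
    for (mac, location), t_origin in first_seen.items():
        if location != origin:
            continue
        t_dest = first_seen.get((mac, destination))
        # Keep only devices that reached the destination after the origin.
        if t_dest is not None and t_dest > t_origin:
            times[mac] = (t_dest - t_origin).total_seconds()
    return times


travel_times = match_travel_times(detections, "sensor_A", "sensor_B")
average_seconds = sum(travel_times.values()) / len(travel_times)
```

Averaging such matches over fixed time windows would give something comparable in spirit to the 15-minute summary records, although the exact aggregation used for the published summaries is not described here.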
This clean dataset is a refined version of our company datasets, consisting of 35M+ data records.
It’s an excellent data solution for companies with limited data engineering capabilities and those who want to reduce their time to value. You get filtered, cleaned, unified, and standardized B2B data. After cleaning, this data is also enriched by leveraging a carefully instructed large language model (LLM).
AI-powered data enrichment offers more accurate information in key data fields, such as company descriptions. It also produces over 20 additional data points that are very valuable to B2B businesses. Enhancing and highlighting the most important information in web data contributes to quicker time to value, making data processing much faster and easier.
For your convenience, you can choose from multiple data formats (Parquet, JSON, JSONL, or CSV) and select a suitable delivery frequency (quarterly, monthly, or weekly).
Coresignal is a leading public business data provider in the web data sphere with an extensive focus on firmographic data and public employee profiles. More than 3B data records in different categories enable companies to build data-driven products and generate actionable insights. Coresignal is exceptional in terms of data freshness, with 890M+ records updated monthly for unprecedented accuracy and relevance.
https://earth.esa.int/eogateway/documents/20142/1564626/Terms-and-Conditions-for-the-use-of-ESA-Data.pdf
The Fundamental Data Record (FDR) for Atmospheric Composition UVN v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 and resulting from the ESA FDR4ATMOS project. The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period ranging from 1995 to 2012. The data record offers harmonised cross-calibrated spectra with focus on spectral windows in the Ultraviolet-Visible-Near Infrared regions for the retrieval of critical atmospheric constituents like ozone (O3), sulphur dioxide (SO2), nitrogen dioxide (NO2) column densities, alongside cloud parameters. The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonization on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities. The FDR4ATMOS V1 is currently being extended to include the MetOp GOME-2 series. Product format For many aspects, the FDR product has improved compared to the existing individual mission datasets: GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data; Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites; SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. 
This simplifies the use of the SCIAMACHY spectra which were split in a complex cluster structure (each with its own integration time) in the original Level 1b data; The harmonization process applied mitigates the viewing angle dependency observed in the UV spectral region for GOME data; Uncertainties are provided. Each FDR product provides, within the same file, irradiance/reflectance data for UV-VIS-NIR spectral regions across all orbits on a single day, including therein information from the individual ERS-2 GOME and Envisat SCIAMACHY measurements. The FDR has been generated in two formats: Level 1A and Level 1B targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users. In case of specific requirements, please contact EOHelp. Please refer to the README file for essential guidance before using the data. All the new products are conveniently formatted in NetCDF. Free standard tools, such as Panoply, can be used to read NetCDF data. Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page. Uncertainty characterisation One of the main aspects of the project was the characterization of Level 1 uncertainties for both instruments, based on metrological best practices. 
The following documents are provided: General guidance on a metrological approach to Fundamental Data Records (FDR) Uncertainty Characterisation document Effect tables NetCDF files containing example uncertainty propagation analysis and spectral error correlation matrices for SCIAMACHY (Atlantic and Mauretania scene for 2003 and 2010) and GOME (Atlantic scene for 2003) reflectance_uncertainty_example_FDR4ATMOS_GOME.nc reflectance_uncertainty_example_FDR4ATMOS_SCIA.nc Known Issues Non-monotonous wavelength axis for SCIAMACHY in FDR data version 1.0 In the SCIAMACHY OBSERVATION group of the atmospheric FDR v1.0 dataset (DOI: 10.5270/ESA-852456e), the wavelength axis (lambda variable) is not monotonically increasing. This issue affects all spectral channels (UV, VIS, NIR) in the SCIAMACHY group, while GOME OBSERVATION data remain unaffected. The root cause of the issue lies in the incorrect indexing of the lambda variable during the NetCDF writing process. Notably, the wavelength values themselves are calculated correctly within the processing chain. Temporary Workaround The wavelength axis is correct in the first record of each product. As a workaround, users can extract the wavelength axis from the first record and apply it to all subsequent measurements within the same product. The first record can be retrieved by setting the first two indices (time and scanline) to 0 (assuming counting of array indices starts at 0). Note that this process must be repeated separately for each spectral range (UV, VIS, NIR) and every daily product. Since the wavelength axis of SCIAMACHY is highly stable over time, using the first record introduces no expected impact on retrieval results. Python pseudo-code example: lambda_...
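The pseudo-code in this record is cut off, so the following is a separate, hedged sketch of the workaround described above rather than a reconstruction of it. It operates on a plain nested list instead of the NetCDF file, and it assumes the wavelength data are indexed as [time][scanline][spectral pixel], which follows from the statement that the first two indices are time and scanline. The function names and toy values are hypothetical.

```python
def apply_first_record_wavelengths(lambda_axis):
    """Replace every record's wavelength axis with the one from record [0][0]."""
    reference = list(lambda_axis[0][0])  # time index 0, scanline index 0
    # Copy the reference axis into every (time, scanline) position.
    return [[list(reference) for _ in scanlines] for scanlines in lambda_axis]


def is_monotonic_increasing(values):
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Toy array indexed as [time][scanline][spectral pixel]; values are made up.
# Only the first record has a correctly ordered wavelength axis.
lambda_axis = [
    [[240.0, 240.1, 240.2], [240.2, 240.0, 240.1]],
    [[240.1, 240.2, 240.0], [240.0, 240.2, 240.1]],
]

fixed = apply_first_record_wavelengths(lambda_axis)
```

As the record notes, the same step would have to be repeated for each spectral range (UV, VIS, NIR) and for every daily product.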
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Summary
This metadata record provides details of the data supporting the claims of the related manuscript: “FOXA1 and adaptive response determinants to HER2 targeted therapy in TBCRC 036”.
The related study aimed to determine the global alterations in gene enhancers and transcriptional changes to identify factors involved in the adaptive response to HER2 inhibition. In parallel, it analysed the in vivo human adaptive molecular responses to HER2 targeting in a window-of-opportunity clinical trial using both RNAseq and a chemical proteomics method (MIB/MS) to assess the functional kinome.
Type of data: mass spectrometry proteomics data; normalised patient RNA sequencing data; cell line RNA sequencing data; cell line ChIPseq data
Subject of data: Homo sapiens; Eukaryotic cell lines
Recruitment: Eligible women included those with newly diagnosed Stage I-IV HER2+ breast cancer scheduled to undergo definitive surgery (either lumpectomy or mastectomy). Stage I-IIIc patients could not be candidates for a therapeutic neoadjuvant treatment. Study subjects provided informed written consent that included details of the nontherapeutic nature of the trial.
Trial registration number: https://clinicaltrials.gov/ct2/show/NCT01875666
Data access
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier https://identifiers.org/pride.project:PXD021865.
Normalized patient RNAseq data (https://identifiers.org/geo:GSE161743), cell line RNAseq (https://identifiers.org/geo:GSE160001 and https://identifiers.org/geo:GSE160001), and cell line ChIPseq (https://identifiers.org/geo:GSE160667) are all part of the SuperSeries https://identifiers.org/geo:GSE160670 available through the Gene Expression Omnibus.
Processed and normalized data are provided as supplemental materials associated with the article on the journal website, and also attached to this data record in the Excel spreadsheets called Supplementary Data 1-10 and the PDF called Supplementary material file.PDF. Accompanying Supplementary Information and Supplementary Data files contain relevant data used to produce the included figures and are available with this article. A detailed list of which data files underlie which figures and tables in the related article is included in the file ‘Angus_et_al_2021_underlying_data_files_list.xlsx’, which is shared with this data record.
The data supporting Figure 3c are in the GraphPad Prism file called ‘siGrowth’, which is not shared publicly as it is in a non-open format, but it can be made available upon reasonable request to the corresponding author.
Corresponding author(s) for this study
Gary L. Johnson, PhD, Department of Pharmacology, 4079 Genetic Medicine Building, University of North Carolina School of Medicine, Chapel Hill, NC 27599. Email: glj@med.unc.edu. Phone: 919-843-3106.
Study approval
Approved by the UNC Office of Human Research Ethics and conducted in accordance with the Declaration of Helsinki. IRB# 13-1826
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a collection of real-world industrial screw driving datasets, designed to support research in manufacturing process monitoring, anomaly detection, and quality control. Each dataset represents different aspects and challenges of automated screw driving operations, with a focus on natural process variations and degradation patterns.
Scenario name | Number of workpieces used in the experiments | Repetitions (screw cycles) per workpiece | Individual screws per workpiece | Total number of observations | Number of unique classes | Purpose |
S01_thread-degradation | 100 | 25 | 2 | 5,000 | 1 | Investigation of thread degradation through repeated fastening |
S02_surface-friction | 250 | 25 | 2 | 12,500 | 8 | Surface friction effects on screw driving operations |
S03_error-collection-1 | 1 | 2 | >20 | |||
S04_error-collection-2 | 2,500 | 1 | 2 | 5,000 | 25 |
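The totals in the table appear to be the product of workpieces, repetitions per workpiece, and screws per workpiece. The short check below illustrates that reading for the three rows with complete figures, taking the totals as whole counts (5000 and 12500); the S03 row is left out because its entries are incomplete. The dictionary layout is purely illustrative.

```python
# (workpieces, repetitions per workpiece, screws per workpiece, stated total)
scenarios = {
    "S01_thread-degradation": (100, 25, 2, 5000),
    "S02_surface-friction": (250, 25, 2, 12500),
    "S04_error-collection-2": (2500, 1, 2, 5000),
}

# Observations = workpieces x repetitions x screws per workpiece.
computed = {
    name: workpieces * repetitions * screws
    for name, (workpieces, repetitions, screws, _) in scenarios.items()
}

# Compare each computed product with the total stated in the table.
consistent = {name: computed[name] == values[3] for name, values in scenarios.items()}
```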
The datasets were collected from operational industrial environments, specifically from automated screw driving stations used in manufacturing. Each scenario investigates specific mechanical phenomena that can occur during industrial screw driving operations:
1. S01_thread-degradation
2. S02_surface-friction
3. S03_screw-error-collection-1 (recorded but unpublished)
4. S04_screw-error-collection-2 (recorded but unpublished)
5. S05_upper-workpiece-manipulations (recorded but unpublished)
6. S06_lower-workpiece-manipulations (recorded but unpublished)
Additional scenarios may be added to this collection as they become available.
Each dataset follows a standardized structure:
These datasets are suitable for various research purposes:
These datasets are provided under an open-access license to support research and development in manufacturing analytics. When using any of these datasets, please cite the corresponding publication as detailed in each dataset's README file.
We recommend using our library PyScrew to load and prepare the data. However, the datasets can also be processed using standard JSON and CSV processing libraries. Common data analysis and machine learning frameworks may be used for the analysis. The .tar file provides all information required for each scenario.
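For readers who prefer not to use PyScrew, reading a scenario archive with the Python standard library could look roughly like the sketch below. The member names (`labels.csv`, `json/run_001.json`) and their fields are invented for illustration and are not the documented layout of the published .tar files; the demo archive is built in memory so the example runs on its own.

```python
import csv
import io
import json
import tarfile


def build_demo_archive():
    """Build a tiny in-memory .tar with made-up members standing in for a scenario."""
    buffer = io.BytesIO()
    members = {
        "labels.csv": "run_id,class_label\nrun_001,0\nrun_002,1\n",
        "json/run_001.json": json.dumps({"run_id": "run_001", "torque": [0.1, 0.4, 0.9]}),
    }
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, text in members.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


def read_archive(fileobj):
    """Collect CSV rows and JSON documents from every regular file in the archive."""
    labels, runs = [], {}
    with tarfile.open(fileobj=fileobj, mode="r") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            text = archive.extractfile(member).read().decode("utf-8")
            if member.name.endswith(".csv"):
                labels.extend(csv.DictReader(io.StringIO(text)))
            elif member.name.endswith(".json"):
                runs[member.name] = json.loads(text)
    return labels, runs


labels, runs = read_archive(build_demo_archive())
```

With a real download, `tarfile.open("path/to/scenario.tar")` would replace the in-memory buffer, and the file-name checks would need adjusting to the actual archive contents described in each dataset's README.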
Each dataset includes:
For questions, issues, or collaboration interests regarding these datasets, please:
These datasets were collected and prepared from:
The research was supported by:
Within the overall project, we performed a set of microarrays to validate RNAseq data (submitted to EBI: PRJEB4463). In this data set, we compare the expression data of song nuclei to the optic tectum dissected from adult canaries housed at long day cycles to identify nuclei-specific genes. A total of 18 S. canaria samples were analyzed: 6 HVC samples, 5 RA samples and 7 entopallium samples. The differential expression was analyzed using the group-wise exhaustive analysis with False Discovery Rate set to zero and 10-significant probe minimum coverage, HVC/RA compared to entopallium samples. ChipInspector carries out significance analysis on the single probe level (directly generated from the CEL files). Thus, normalized probe set level data for individual Sample records are not available. Processed data files containing transcripts and the fold changes are available on the Series record.
Processed data to be used in analyses related to the sRNA landscape.
1) small RNA processed data from stem trichomes: 2020-12-17_11-23_results_stem_trichomes.tar.gz
2) small RNA processed data from bald stem, leaf primordium and leaf:
xxxx === to be added === xxx
3) mRNA-seq processed data (raw and scaled counts) from different tissues (stem trichomes, bald stem, leaf, leaf primordium): 20201117_snakemake_messenger_rnaseq_trichomes_and_other_tissues.tar.gz
This file was obtained from the following original mRNA-seq fastq files:
The pipeline used was Snakemake RNA-seq release 0.3.4
The file contains:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Please contact Iris Groen (i.i.a.groen@uva.nl, https://orcid.org/0000-0002-5536-6128) for more information.
Please see the following papers for more details on the data collection and preprocessing:
Groen IIA, Piantoni G, Montenegro S, Flinker A, Devore S, Devinsky O, Doyle W, Dugan P, Friedman D, Ramsey N, Petridou N, Winawer JA (2022) Temporal dynamics of neural responses in human visual cortex. The Journal of Neuroscience 42(40):7562-7580 (https://doi.org/10.1523/JNEUROSCI.1812-21.2022)
Yuasa K, Groen IIA, Piantoni G, Montenegro S, Flinker A, Devore S, Devinsky O, Doyle W, Dugan P, Friedman D, Ramsey N, Petridou N, Winawer JA. Precise Spatial Tuning of Visually Driven Alpha Oscillations in Human Visual Cortex. eLife 12:RP90387 (https://doi.org/10.7554/eLife.90387.1)
Brands AM, Devore S, Devinsky O, Doyle W, Flinker A, Friedman D, Dugan P, Winawer JA, Groen IIA (2024). Temporal dynamics of short-term neural adaptation in human visual cortex. https://doi.org/10.1101/2023.09.13.557378
Processed data and model fits reported in Groen et al., (2022) are available in derivatives/Groenetal2022TemporalDynamicsECoG as matlab .mat files. Matlab code to load, process and plot these data (including 3D renderings of the participant's surface reconstructions and electrode positions) is available in https://github.com/WinawerLab/ECoG_utils and https://github.com/irisgroen/temporalECoG. These repositories have dependencies on other Matlab toolboxes (e.g., FieldTrip). See instructions on Github for relevant links and guidelines.
Processed data and model fits reported in Yuasa et al., (2023) are available in the Github repositories described in the paper.
Processed data and model fits reported in Brands et al., (2024) are available in derivatives/Brandsetal2024TemporalAdaptationECoGCategories as python .py files. Python code to process and analyze these data is available in the Github repositories described in the paper.
Visual ECoG dataset
Data were collected between 2017-2020. Exact recording dates have been scrubbed for anonymization purposes.
Participants sub-p01 to sub-p11 viewed grayscale visual pattern stimuli that were varied in temporal or spatial properties. Participants sub-p11 to sub-p14 additionally saw color images of different image classes (faces, bodies, buildings, objects, scenes, and scrambled) that were varied in temporal properties. See 'Independent Variables' below for more details.
In all tasks, participants were instructed to fixate a cross or point in the center of the screen and monitor it for a color change, i.e. to perform a stimulus-orthogonal task (see the task-specific _events.json files, e.g., task-prf_events.json, for further details).
The data consist of cortical iEEG recordings in 14 epilepsy patients in response to visual stimulation. Patients were implanted with standard clinical surface (grid) and depth electrodes. Two patients were additionally implanted with a high-density research grid. In addition to the iEEG recordings, pre-implantation MRI T1 scans are provided for the purpose of localizing electrodes. Participants performed a varying number of tasks and runs.
The data are divided in 6 different sets of stimulus types or events:
Participant-, task- and run-specific stimuli are provided in the /stimuli folder as matlab .mat files.
The main BIDS folder contains the raw voltage data, split up into individual task runs. The /derivatives/ECoGCAR folder contains a common-average-referenced version of the data. The /derivatives/ECoGBroadband folder contains time-varying broadband responses estimated by band-pass filtering the common-average-referenced voltage data and taking the average power envelope. The /derivatives/ECoGPreprocessed folder contains epoched trials used in Brands et al., (2024). The /derivatives/freesurfer folder contains surface reconstructions of each participant's T1, along with retinotopic atlas files. The /derivatives/Groen2022TemporalDynamicsECoG folder contains preprocessed data and model fits that can be used to reproduce the results reported in Groen et al., (2022). The /derivatives/Brands2024TemporalAdaptationECoG folder contains preprocessed data and model fits that can be used to reproduce the results reported in Brands et al., (2024).
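As a rough illustration of what common average referencing does, the sketch below subtracts the mean across channels from every channel at each time sample. It is a generic textbook version on toy numbers, not the authors' pipeline: whether their reference was computed over all channels, per electrode group, or with bad channels excluded is not stated here, and the function name is hypothetical.

```python
def common_average_reference(voltages):
    """Subtract the across-channel mean at each sample.

    voltages: list of channels, each a list of samples of equal length.
    """
    n_channels = len(voltages)
    n_samples = len(voltages[0])
    # Mean over channels, computed separately for every time sample.
    means = [sum(channel[t] for channel in voltages) / n_channels for t in range(n_samples)]
    return [[channel[t] - means[t] for t in range(n_samples)] for channel in voltages]


# Three toy channels with three samples each (arbitrary units).
raw = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 5.0, 2.0],
]
car = common_average_reference(raw)
```

After re-referencing, the channels sum to zero at every sample, which is a quick sanity check on any implementation of this step.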
Data quality and the number of trials per subject vary considerably across patients, for various reasons.
First, for each recording session, attempts were made to optimize the environment for running visual experiments; e.g. room illumination was stabilized as much as possible by closing blinds when available, the visual display was calibrated (for most patients), and interference from medical staff or visitors was minimized. However, it was not possible to equate this with great precision across patients and sessions/runs.
Second, implantations were determined based on clinical needs and electrode locations therefore vary across participants. The strength and robustness of the neural responses varies greatly with the electrode location (e.g. early vs higher-level visual cortex), as well as with uncontrolled factors such as how well the electrode made contact with the cortex and whether it was primarily situated on grey matter (surface/grid electrodes) or could be located in white matter (some depth electrodes). Electrodes that were marked as containing epileptic activity by clinicians, or that did not have good signal based on visual inspection of the raw data, are marked as 'bad' in the channels.tsv files.
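Excluding the electrodes marked as 'bad' before analysis could be done along the following lines. The sketch assumes the BIDS convention of a tab-separated channels.tsv with `name` and `status` columns; the channel names and rows below are made up, and real files contain additional columns.

```python
import csv
import io

# Made-up excerpt of a channels.tsv file (tab-separated, BIDS-style columns).
channels_tsv = (
    "name\ttype\tunits\tstatus\n"
    "GB01\tECOG\tuV\tgood\n"
    "GB02\tECOG\tuV\tbad\n"
    "DLH1\tSEEG\tuV\tgood\n"
)


def good_channels(tsv_text):
    """Return names of channels whose status is not 'bad'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Rows without a status value are kept, treating them as not marked bad.
    return [row["name"] for row in reader if row.get("status", "good") != "bad"]


selected = good_channels(channels_tsv)
```

For a file on disk, the same `csv.DictReader` call can be applied to an opened file handle instead of the in-memory string.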
Third, patients varied greatly in their cognitive abilities and mental/medical state, which affected their ability to follow task instructions, e.g. to remain alert and maintain fixation. Some patients were able to perform repeated runs of multiple tasks across multiple sessions, while others only managed to do a few runs.
All patients included in this dataset have sufficiently good responses in some electrodes/tasks as judged by Groen et al., (2022) and Brands et al., (2024). However, when using this dataset to address further research questions, it is advisable to set stringent requirements on electrode and trial selection. See Groen et al., (2022) and associated code repository for an example preprocessing pipeline that selected for robust visual responses to temporally- and contrast-varying stimuli.
All participants were intractable epilepsy patients who were undergoing ECoG for the purpose of monitoring seizures. Participants were included if their implantation covered parts of visual cortex and if they consented to participate in research.
Data were collected in a clinical setting, i.e. at bedside in the patient's hospital room. Information about the iEEG recording apparatus is provided in the metadata for each patient. Information about the visual stimulation equipment and behavioral response recordings is provided in Groen et al., (2022), Yuasa et al., (2023) and Brands et al., (2024).
Data were collected at NYU Langone Hospital (New York, USA) or at University Medical Center Utrecht (The Netherlands).
Stimulus files are missing for a few runs of sub-02. These are marked as N/A in the associated event files.
Further participant-specific notes:
For sub-03 and sub-04, the spatial pattern and temporal pattern stimuli are combined in the soc task runs; for the remaining participants, these are split across the spatialpattern and temporalpattern task runs.
The pRF task from sub-04 has different prf parameters (bar duration and gap).
The first two runs of the pRF task from sub-05 are not of good quality (participant repeatedly broke fixation). In addition, the triggers in all pRF runs from sub-05 are not correct due to a stimulus coding problem and will need to be re-interpolated if one wishes to use these data.
Participants sub-10 and sub-11 have high density grids in addition to clinical grids.
Note that all stimuli and stimulus parameters can be found in the participant-specific stimulus *.mat files.