Mass spectrometry Interactive Virtual Environment (MassIVE) is a community resource developed by the NIH-funded Center for Computational Mass Spectrometry to promote the global, free exchange of mass spectrometry data. It serves as a data repository for proteomics data.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Abstract: This collection contains mass spectrometry data that were collected in the research data repository Chemotion from 2019-2023. The scientists who contributed to this collection are given as additional information (-> Other) to this dataset. The dataset was collected to enable the systematic re-use of the mass spectrometry data in Chemotion repository.
Method: The dataset was gained as a subset of information from Chemotion repository after (1) filtering for mass spectrometry data with a precision of 0.001 Da, and (2) removal of duplicates.
Other: The collection was gained from contributions of the following scientists: Alexander B. Braun, André Jung, Angela Wandler, Arnaud Westeel, Chloé Liagre, Christoph Zippel, Cornelia Mattern, Daniel Knoll, Danny Wagner, Eduard Spuling, Florian Mohr, Georg Manolikakes, Hannes Kühner, Harald Kelm, Helena Šimek, Ilga Kristine Krimmelbein, Irina Protasova, Isabelle Wessely, Jana Barylko, Janina Beck, Jasmin Busch, Jérôme Klein, Jérôme Wagner, Julian Brückel, Ksenia Kutonova, Laura Holzhauer, Lisa Schmidt, Lukas Langer, Lutz-F. Tietze, Martin Nieger, Mirja Dinkel, Miro Hałaczkiewicz, Nicolai Rosenbaum, Nicolai Wippert, Nicole Jung, Niklas Krappel, Olaf Fuhr, Patrick Hodapp, Simon Oßwald, Simone Gräßle, Stefan Bräse, Susanne Moser, Sylvain Grosjean, Sylvia Vanderheiden-Schroen, Thomas Hurrle, Victor Larignon, Vikas Aggarwal, Yannick Matt, Yichuan Wang, Zhen Zhang.
TechnicalInfo: The provided zip folder includes 599 datasets including at least one *.jdx file each and one metadata file "msei_final" with metadata describing the 599 datasets.
TechnicalInfo: The metadata file "msei_final" contains the following metadata per dataset: sample_id, molfile, ontology term, analysis_id, instruments, authors, content, molfile_id.
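The *.jdx files mentioned above are JCAMP-DX text files, in which metadata is carried by labeled records of the form ##LABEL=value. The following is a minimal sketch of reading those labels with the Python standard library only. The function name parse_jdx_header and the sample text are hypothetical illustrations and are not taken from the Chemotion collection; real files may contain several blocks, comments, or repeated labels that this sketch does not handle.

```python
from typing import Dict


def parse_jdx_header(text: str) -> Dict[str, str]:
    """Collect JCAMP-DX labeled records (##LABEL=value) into a dict.

    Assumption: one record per line; a repeated label overwrites the
    earlier value. Data rows (no leading '##') are skipped.
    """
    header: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # peak rows and anything that is not a labeled record
        label, _, value = line[2:].partition("=")
        header[label.strip().upper()] = value.strip()
    return header


# Invented sample, for illustration only.
sample = """##TITLE=example spectrum
##JCAMP-DX=5.00
##DATA TYPE=MASS SPECTRUM
##NPOINTS=2
##PEAK TABLE=(XY..XY)
100.0,50.0
101.0,10.0
##END=
"""

header = parse_jdx_header(sample)
print(header["TITLE"], "|", header["DATA TYPE"])
```

A header dictionary like this could then be joined with the rows of "msei_final" via a shared identifier, provided the file names or labels actually carry one; that mapping is not described in the dataset text.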
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The repository contains three mzML and four imzML mass spectrometry datasets.
The mzML data are compiled in a single directory 'mzML' and zipped:
The imzML mass spectrometry imaging data are zipped individually:
All these datasets are publicly available from different repositories; however, if you reuse them, please attribute the original authors.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Proteomic investigations of Alzheimer’s and Parkinson’s disease have provided valuable insights into neurodegenerative disorders. Thus far, these investigations have largely been restricted to bottom-up approaches, hindering the degree to which one can characterize a protein’s “intact” state. Top-down proteomics (TDP) overcomes this limitation; however, it is typically limited to observing only the most abundant proteoforms of relatively small size. Therefore, fractionation techniques are commonly used to reduce sample complexity. Here, we investigate gas-phase fractionation through high-field asymmetric waveform ion mobility spectrometry (FAIMS) within TDP. Utilizing a high complexity sample derived from Alzheimer’s disease (AD) brain tissue, we describe how the addition of FAIMS to TDP can robustly improve the depth of proteome coverage. For example, implementation of FAIMS with external compensation voltage (CV) stepping at −50, −40, and −30 CV could more than double the mean number of non-redundant proteoforms, genes, and proteome sequence coverage compared to without FAIMS. We also found that FAIMS can influence the transmission of proteoforms and their charge envelopes based on their size. Importantly, FAIMS enabled the identification of intact amyloid beta (Aβ) proteoforms, including the aggregation-prone Aβ1–42 variant which is strongly linked to AD. Raw data and associated files have been deposited to the ProteomeXchange Consortium via the MassIVE data repository with data set identifier PXD023607.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
High-throughput tandem mass spectrometry has enabled the detection and identification of over 75% of all proteins predicted to result in translated gene products in the human genome. In fact, the galloping rate of data acquisition and sharing of mass spectrometry data has led to the current availability of many tens of terabytes of public data in thousands of human data sets. The systematic reanalysis of these public data sets has been used to build a community-scale spectral library of 2.1 million precursors for over 1 million unique sequences from over 19,000 proteins (including spectra of synthetic peptides). However, it has remained challenging to find and inspect spectra of peptides covering functional protein regions or matching novel proteins. ProteinExplorer addresses these challenges with an intuitive interface mapping tens of millions of identifications to functional sites on nearly all human proteins while maintaining provenance for every identification back to the original data set and data file. Additionally, ProteinExplorer facilitates the selection and inspection of HPP-compliant peptides whose spectra can be matched to spectra of synthetic peptides and already includes HPP-compliant evidence for 107 missing (PE2, PE3, and PE4) and 23 dubious (PE5) proteins. Finally, ProteinExplorer allows users to rate spectra and to contribute to a community library of peptides entitled PrEdict (Protein Existance dictionary) mapping to novel proteins but whose preliminary identities have not yet been fully established with community-scale false discovery rates and synthetic peptide spectra. ProteinExplorer can now be accessed at https://massive.ucsd.edu/ProteoSAFe/protein_explorer_splash.jsp.
https://www.nist.gov/open/license
Data here contain and describe an open-source structured query language (SQLite) portable database containing high resolution mass spectrometry data (MS1 and MS2) for per- and polyfluorinated alkyl substances (PFAS) and associated metadata regarding their measurement techniques, quality assurance metrics, and the samples from which they were produced. These data are stored in a format adhering to the Database Infrastructure for Mass Spectrometry (DIMSpec) project. That project produces and uses databases like this one, providing a complete toolkit for non-targeted analysis. See more information about the full DIMSpec code base - as well as these data for demonstration purposes - at GitHub (https://github.com/usnistgov/dimspec) or view the full User Guide for DIMSpec (https://pages.nist.gov/dimspec/docs). Files of most interest contained here include the database file itself (dimspec_nist_pfas.sqlite) as well as an entity relationship diagram (ERD.png) and data dictionary (DIMSpec for PFAS_1.0.1.20230615_data_dictionary.json) to elucidate the database structure and assist in interpretation and use.
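Because the database is a plain SQLite file, a first look at its structure needs nothing beyond Python's built-in sqlite3 module. The sketch below lists table names from the sqlite_master catalog. The helper name list_tables and the two demo tables are hypothetical and do not reflect the actual DIMSpec schema, which is documented in the entity relationship diagram and data dictionary named above.

```python
import sqlite3
from typing import List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on an in-memory database; these table names are invented for
# illustration and are NOT the DIMSpec schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, name TEXT)")
demo.execute("CREATE TABLE peaks (id INTEGER PRIMARY KEY, compound_id INTEGER)")
print(list_tables(demo))
```

For the real file, the same helper would be called on sqlite3.connect("dimspec_nist_pfas.sqlite"), assuming the file has been downloaded to the working directory.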
The Big Data Interagency Working Group (BD IWG) held a workshop, Measuring the Impact of Digital Repositories, on February 28 - March 1, 2017 in Arlington, VA. The aim of the workshop was to identify current assessment metrics, tools, and methodologies that are effective in measuring the impact of digital data repositories, and to identify the assessment issues, obstacles, and tools that require additional research and development (R&D). This workshop brought together leaders from academic, journal, government, and international data repository funders, users, and developers to discuss these issues...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Glucocorticoids are the first-line treatment for sensorineural hearing loss, but little is known about the mechanism of their protective effect or the impact of route of administration. The recent development of hollow microneedles enables safe and reliable sampling of perilymph for proteomic analysis. Using these microneedles, we investigate the effect of intratympanic (IT) versus intraperitoneal (IP) dexamethasone administration on guinea pig perilymph proteome. Guinea pigs were treated with IT dexamethasone (n = 6), IP dexamethasone (n = 8), or untreated for control (n = 8) 6 h prior to aspiration. The round window membrane (RWM) was accessed via a postauricular approach, and hollow microneedles were used to perforate the RWM and aspirate 1 μL of perilymph. Perilymph samples were analyzed by liquid chromatography–mass spectrometry-based label-free quantitative proteomics. Mass spectrometry raw data files have been deposited in an international public repository (MassIVE proteomics repository at https://massive.ucsd.edu/) under data set # MSV000086887. In the 22 samples of perilymph analyzed, 632 proteins were detected, including the inner ear protein cochlin, a perilymph marker. Of these, 14 proteins were modulated by IP, and three proteins were modulated by IT dexamethasone. In both IP and IT dexamethasone groups, VGF nerve growth factor inducible was significantly upregulated compared to control. The remaining adjusted proteins modulate neurons, inflammation, or protein synthesis. Proteome analysis facilitated by the use of hollow microneedles shows that route of dexamethasone administration impacts changes seen in perilymph proteome. Compared to IT administration, the IP route was associated with greater changes in protein expression, including proteins involved in neuroprotection, inflammatory pathway, and protein synthesis. Our findings show that microneedles can mediate safe and effective intracochlear sampling and hold promise for inner ear diagnostics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and scripts accompanying the paper Standardised workflow for mass spectrometry-based single-cell proteomics data analysis using scp.
These file descriptions are also available in the README.txt file.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This study uses mass spectrometry to measure thousands of proteins in patients' plasma in order to screen biomarkers for intravenous thrombolysis eligibility in ischemic stroke patients with unknown time of onset. The data uploaded here consist of the clinical characteristics of the patients and the proteins detected in their blood. IS means acute ischaemic stroke. ES means early IS with onset time within 4.5 h. LS means late IS with onset time >4.5 h.
A total of 411 co-fractionation mass spectrometry (CF-MS) experiments were downloaded from public proteomics repositories such as PRIDE and MassIVE, then re-analyzed with MaxQuant. This repository provides complete MaxQuant outputs from all 411 experiments.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Mitochondria are undeniably the cell powerhouse, directly affecting cell survival and fate. Growing evidence suggests that mitochondrial protein repertoire affects metabolic activity and plays an important role in determining cell proliferation/differentiation or quiescence shift. Consequently, the bioenergetic status of a cell is associated with the quality and abundance of the mitochondrial populations and proteomes. Mitochondrial morphology changes in the development of different cellular functions associated with metabolic switches. It is therefore reasonable to speculate that different cell lines do contain different mitochondrial-associated proteins, and the investigation of these pools may well represent a source for mining missing proteins (MPs). A very effective approach to increase the number of IDs through mass spectrometry consists of reducing the complexity of the biological samples by fractionation. The present study aims at investigating the mitochondrial proteome of five phenotypically different cell lines, possibly expressing some of the MPs, through an enrichment–fractionation approach at the organelle and protein level. We demonstrate a substantial increase in the proteome coverage, which, in turn, increases the likelihood of detecting low abundant proteins, often falling in the category of MPs, and resulting, for the present study, in the identification of METTL12, FAM163A, and RGS13. All MS data have been deposited to the MassIVE data repository (https://massive.ucsd.edu) with the data set identifier MSV000082409 and PXD010446.
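Descriptions like the one above cite deposits by accession, here a MassIVE identifier (MSV000082409) and a ProteomeXchange identifier (PXD010446). As a rough sketch, such accessions can be pulled out of free text with a regular expression. The pattern below assumes "PXD" followed by six digits and "MSV" followed by nine digits, which matches the identifiers quoted in this listing but is an assumption rather than an official specification; the names ACCESSION_RE and find_accessions are hypothetical.

```python
import re
from typing import List

# Assumed shapes, inferred from the identifiers quoted in the listing:
# PXD + 6 digits (ProteomeXchange), MSV + 9 digits (MassIVE).
ACCESSION_RE = re.compile(r"\b(?:PXD\d{6}|MSV\d{9})\b")


def find_accessions(text: str) -> List[str]:
    """Return accession-like tokens in order of appearance."""
    return ACCESSION_RE.findall(text)


text = (
    "All MS data have been deposited to the MassIVE data repository "
    "with the data set identifier MSV000082409 and PXD010446."
)
print(find_accessions(text))
```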
The mass spectrometry proteomics data files have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the data set identifier PXD013243 and title "Morphological plasticity in a sulfur-oxidizing bacterium from the SUP05 clade enhances dark carbon fixation"
(see https://www.ebi.ac.uk/pride/archive/projects/PXD013243).
These data were published in Shah et al. (2019).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data files in .raw format are available from the Chorus repository (https://chorusproject.org) under project ID number 650 (project name: "Diatom response to allelopathy"), experiment ID number 843.
Files can be downloaded from the following URL. Note the total file size is 13.4 GB: https://chorusproject.org/anonymous/download/experiment/6729965144923166962
The data files are in Thermo-Finnigan .raw format. Vendor software can be used to view these files, and there are free viewers that can also be used (one example is PVIEW). The files can also be converted to mzXML files using MSconvert.
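The conversion step mentioned above can be scripted. The sketch below only builds the argument list for ProteoWizard's msconvert command-line tool, using its --mzXML output switch and -o output-directory option, and prints the command rather than running it. The function name msconvert_command and the file names are hypothetical, and the sketch assumes msconvert is installed and on the PATH when the command is eventually executed.

```python
import shlex
from typing import List


def msconvert_command(raw_file: str, out_dir: str) -> List[str]:
    """Build an msconvert invocation that writes mzXML into out_dir.

    Nothing is executed here; pass the returned list to
    subprocess.run(...) once msconvert is known to be available.
    """
    return ["msconvert", raw_file, "--mzXML", "-o", out_dir]


# Hypothetical file and directory names, for illustration only.
cmd = msconvert_command("sample.raw", "converted")
print(shlex.join(cmd))
```

Building the command as a list, rather than a single string, avoids shell-quoting problems when file names contain spaces.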
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description: This dataset is the June 2025 Data Release of Cell Maps for Artificial Intelligence (CM4AI; CM4AI.org), the Functional Genomics Grand Challenge in the NIH Bridge2AI program. This Beta release includes perturb-seq data in undifferentiated KOLF2.1J iPSCs; SEC-MS data in undifferentiated KOLF2.1J iPSCs, iPSC-derived NPCs, neurons, cardiomyocytes, and treated and untreated MDA-MB-468 breast cancer cells; and IF images in MDA-MB-468 breast cancer cells in the presence and absence of chemotherapy (vorinostat and paclitaxel).
External Data Links: Access external data resources related to this dataset: Sequence Read Archive (SRA) Data: NCBI BioProject; Mass Spectrometry Data (Human iPSCs): MassIVE Repository; Mass Spectrometry Data (Human Cancer Cells): MassIVE Repository.
Data Governance & Ethics: Human Subjects: No; De-identified Samples: Yes; FDA Regulated: No; Data Governance Committee: Jillian Parker (jillianparker@health.ucsd.edu); Ethical Review: Vardit Ravitsky (ravitskyv@thehastingscenter.org) and Jean-Christophe Belisle-Pipon (jean-christophe_belisle-pipon@sfu.ca).
Completeness: These data are not yet in completed final form. Some datasets are under temporary pre-publication embargo. Protein-protein interaction (SEC-MS), protein localization (IF imaging), and CRISPRi perturbSeq data interrogate sets of proteins which incompletely overlap. Computed cell maps are not included in this release.
Maintenance Plan: The dataset will be regularly updated and augmented through the end of the project in November 2026, with updates on a quarterly basis. Long-term preservation is in the University of Virginia Dataverse, supported by committed institutional funds.
Intended Use: This dataset is intended for: AI-ready datasets to support research in functional genomics; AI model training; cellular process analysis; cell architectural changes and interactions in presence of specific disease processes, treatment conditions, or genetic perturbations.
Limitations: Researchers should be aware of inherent limitations. This is an interim release. It does not contain predicted cell maps, which will be added in future releases. The current release is most suitable for bioinformatics analysis of the individual datasets. It requires domain expertise for meaningful analysis.
Prohibited Uses: These laboratory data are not to be used in clinical decision-making or in any context involving patient care without appropriate regulatory oversight and approval.
Potential Sources of Bias: Users should be aware of potential biases. Data in this release was derived from commercially available de-identified human cell lines. It does not represent all biological variants which may be seen in the population at large.
Centralized, standards-compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. It was originally developed to provide a common data exchange format and repository to support proteomics literature publications. This remit has grown with PRIDE, with the hope that PRIDE will provide a reference set of tissue-based identifications for use by the community. The future development of PRIDE has become closely linked to HUPO PSI. PRIDE encourages and welcomes direct user submissions of protein and peptide identification data to be published in peer-reviewed publications. Users may browse public datasets, use PRIDE BioMart for custom queries, or download the data directly from the FTP site. PRIDE has been developed through a collaboration of the EMBL-EBI, Ghent University in Belgium, and the University of Manchester.
Data is an important issue in conducting research. Every research institute and R&D agency needs documented data from previous research, whether it comes from the institution itself or from other institutions. Currently, each work unit already has several databases, but there is not yet a single safe and reliable facility for storing them. Therefore, a scientific big data repository system is required. In addition to being a means of sharing data, the repository is also intended to provide access to data and to preserve it. The repository is expected to support research collaboration between institutions. The various data held by the work units of the Indonesian Institute of Sciences (LIPI), especially the Deputy for Life Sciences and the Deputy for Earth Sciences, can be categorized as scientific big data because of their very large volume, their wide variety, and the high velocity needed to process them. The data are still scattered: some are managed by individuals and some are already managed by work units. Individual data management limits access, so the data are available only to a small audience. Lack of access leads to duplication of research, wasted government funds, and fewer benefits for further research.
dataset for mass spectrometry (MS)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
.csv files containing identified fragment ions