30 datasets found
  1. PPI Dataset

    • paperswithcode.com
    Updated Oct 29, 2017
    Cite
    William L. Hamilton; Rex Ying; Jure Leskovec (2017). PPI Dataset [Dataset]. https://paperswithcode.com/dataset/ppi
    Dataset updated
    Oct 29, 2017
    Authors
    William L. Hamilton; Rex Ying; Jure Leskovec
    Description

    The task is to classify protein roles, in terms of their cellular functions from gene ontology, in various protein-protein interaction (PPI) graphs, with each graph corresponding to a different human tissue [41]. Positional gene sets, motif gene sets, and immunological signatures are used as features, and gene ontology sets (121 in total) as labels, all collected from the Molecular Signatures Database [34]. The average graph contains 2373 nodes, with an average degree of 28.8.
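The average degree quoted above follows the usual definition for an undirected graph, 2|E|/|V|. A minimal Python sketch, assuming a graph is given as an edge list of node-ID pairs; the function name is hypothetical and not part of any tooling shipped with the dataset:

```python
def average_degree(num_nodes, edges):
    """Average degree 2|E|/|V| of an undirected simple graph.

    `edges` is an iterable of (u, v) pairs; an edge listed in both
    directions is counted once, and self-loops are ignored.
    """
    unique = {frozenset((u, v)) for u, v in edges if u != v}
    return 2 * len(unique) / num_nodes


# A 4-node path graph 0-1-2-3 has 3 distinct edges (1-0 repeats 0-1).
print(average_degree(4, [(0, 1), (1, 2), (2, 3), (1, 0)]))  # 1.5
```

Read in reverse, 2373 nodes at an average degree of 28.8 correspond to roughly 2373 × 28.8 / 2 ≈ 34171 undirected edges in a typical graph.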

  2. Data for all PPI analysis

    • figshare.com
    application/x-gzip
    Updated Feb 10, 2023
    Cite
    Tim Downing (2023). Data for all PPI analysis [Dataset]. http://doi.org/10.6084/m9.figshare.19672527.v5
    Available download formats: application/x-gzip
    Dataset updated
    Feb 10, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Tim Downing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CSV files for 4363 samples, listing all proteins with their gene name, plasmid status, and the total number of PPIs per sample.

  3. Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Feb 29, 2024
    Cite
    Hynek, Karel (2024). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7409923
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    Lukačovič, Andrej
    Šiška, Pavel
    Luxemburk, Jan
    Čejka, Tomáš
    Hynek, Karel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please refer to the original data article for a further description of the data: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888. We recommend using the CESNET DataZoo Python library, which facilitates working with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The QUIC (Quick UDP Internet Connections) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Because QUIC's design makes inspecting its handshakes challenging, and because it is used in HTTP/3, there is increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network that connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information enable research in encrypted traffic classification. Moreover, the included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

    Data capture The data was captured in the flow monitoring infrastructure of the CESNET2 network over four weeks, between 31.10.2022 and 27.11.2022. The following list provides the capture period, flow count, and uncompressed size for each week and for the whole dataset:

    • W-2022-44: capture period 31.10.2022 - 6.11.2022; 32.6M flows; 19 GB uncompressed
    • W-2022-45: capture period 7.11.2022 - 13.11.2022; 42.6M flows; 25 GB uncompressed
    • W-2022-46: capture period 14.11.2022 - 20.11.2022; 33.7M flows; 20 GB uncompressed
    • W-2022-47: capture period 21.11.2022 - 27.11.2022; 44.1M flows; 25 GB uncompressed
    • CESNET-QUIC22 (total): capture period 31.10.2022 - 27.11.2022; 153M flows; 89 GB uncompressed

    Data description The dataset consists of network flows describing encrypted QUIC communications. Flows were created using the ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the QUIC protocol version in use, and the user agent string, which is available in a subset of QUIC communications.

    Packet Sequences Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For packet sizes, we consider the payload size after transport headers (UDP headers in the QUIC case). Packet directions are encoded as ±1, where +1 means a packet sent from client to server and -1 a packet sent from server to client. Inter-packet times depend on the location of the communicating hosts, their distance, and the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a maximum length of 30, which is the default setting of the flow exporter used. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from the packet directions data); in other words, each client request and server response pair counts as one roundtrip.

    Flow statistics Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason: it was idle, reached the active timeout, or ended for other reasons. This corresponds to the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced-end and lack-of-resources reasons. The end-of-flow-detected reason is not considered because it is not relevant for UDP connections.

    Dataset structure The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service, and the total counts of all received flows (observed on the CESNET2 network), service flows (belonging to one of the dataset's services), and saved flows (provided in the dataset). There is also a stats-week.json file aggregating the flow counts of a whole week and a stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes the SNI domains used for ground-truth labeling. The following list describes flow data fields in CSV files:

    ID: Unique identifier

    SRC_IP: Source IP address

    DST_IP: Destination IP address

    DST_ASN: Destination Autonomous System number

    SRC_PORT: Source port

    DST_PORT: Destination port

    PROTOCOL: Transport protocol

    QUIC_VERSION: QUIC protocol version

    QUIC_SNI: Server Name Indication domain

    QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet

    TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff

    TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff

    DURATION: Duration of the flow in seconds

    BYTES: Number of transmitted bytes from client to server

    BYTES_REV: Number of transmitted bytes from server to client

    PACKETS: Number of packets transmitted from client to server

    PACKETS_REV: Number of packets transmitted from server to client

    PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]

    PPI_LEN: Number of packets in the PPI sequence

    PPI_DURATION: Duration of the PPI sequence in seconds

    PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence

    PHIST_SRC_SIZES: Histogram of packet sizes from client to server

    PHIST_DST_SIZES: Histogram of packet sizes from server to client

    PHIST_SRC_IPT: Histogram of inter-packet times from client to server

    PHIST_DST_IPT: Histogram of inter-packet times from server to client

    APP: Web service label

    CATEGORY: Service category

    FLOW_ENDREASON_IDLE: Flow was terminated because it was idle

    FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout

    FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
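The per-service counts of seen (before sampling) and saved flows in the JSON statistics files make it possible to compute sampling ratios and resampling weights. A minimal sketch, assuming those counts have already been loaded into two dictionaries keyed by service name; the JSON key layout is not described here, so the dictionaries, service names, and function names below are hypothetical:

```python
def sampling_ratios(seen, saved):
    """Fraction of observed flows kept in the dataset, per service."""
    return {app: saved[app] / seen[app] for app in saved if seen.get(app)}


def resampling_weights(seen, saved):
    """Per-flow weight that restores the pre-sampling service distribution.

    Weighting each saved flow of a service by seen/saved makes the weighted
    flow count of that service equal its original (seen) count.
    """
    return {app: seen[app] / saved[app] for app in saved if saved[app]}


seen = {"service-a": 1_000_000, "service-b": 20_000}  # hypothetical counts
saved = {"service-a": 50_000, "service-b": 20_000}
print(sampling_ratios(seen, saved))     # {'service-a': 0.05, 'service-b': 1.0}
print(resampling_weights(seen, saved))  # {'service-a': 20.0, 'service-b': 1.0}
```

The weights could then feed a weighted sampler or loss; drawing flows with replacement in proportion to these weights is one way to approximate the original service mix.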

    Link to other CESNET datasets

    https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
    https://github.com/CESNET/cesnet-datazoo

    Please cite the original data article:

    @article{CESNETQUIC22,
      author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
      title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
      journal = {Data in Brief},
      pages = {108888},
      year = {2023},
      issn = {2352-3409},
      doi = {10.1016/j.dib.2023.108888},
      url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
    }

  4. STRING Dataset

    • paperswithcode.com
    Updated Oct 11, 2021
    Cite
    Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi Tsafou; Michael Kuhn; Peer Bork; Lars Juhl Jensen; Christian von Mering (2021). STRING Dataset [Dataset]. https://paperswithcode.com/dataset/string
    Dataset updated
    Oct 11, 2021
    Authors
    Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi Tsafou; Michael Kuhn; Peer Bork; Lars Juhl Jensen; Christian von Mering
    Description

    STRING is a collection of protein-protein interaction (PPI) networks.

  5. Data from: CESNET-TLS-Year22: A year-spanning TLS network traffic dataset...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 24, 2025
    Cite
    Luxemburk, Jan (2025). CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10608606
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Luxemburk, Jan
    Šiška, Pavel
    Čejka, Tomáš
    Pešek, Jaroslav
    Hynek, Karel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We recommend using the CESNET DataZoo Python library, which facilitates working with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.

    Data description The dataset consists of network flows describing encrypted TLS communications. Flows are extended with packet sequences, histograms, and fields extracted from the TLS ClientHello message, which is transmitted in the first packet of the TLS connection handshake. The most important extracted handshake field is the SNI domain, which is used for ground-truth labeling.

    Packet Sequences Sequences of packet sizes, directions, and inter-packet times are standard data input for traffic analysis. For packet sizes, we consider the payload size after transport headers (TCP headers for the TLS case). We omit packets with no TCP payload, for example ACKs, because zero-payload packets are related to the transport layer internals rather than services’ behavior. Packet directions are encoded as ±1, where +1 means a packet sent from client to server, and -1 is a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate a response. Packet sequences have a maximum length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction; in other words, each client request and server response pair counts as one roundtrip.
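The derived sequence fields (PPI_LEN and PPI_ROUNDTRIPS in the field list) can be recomputed from the PPI column. A minimal sketch, assuming the PPI value is stored in the CSV as a list literal in the documented row order (inter-packet times, directions, sizes, and, for this TLS dataset, push flags). The roundtrip rule counts each client-to-server packet that is immediately followed by a server-to-client packet, which is one reading of "each client request and server response pair counts as one roundtrip"; the dataset's exact counting may differ. The function names are hypothetical.

```python
import ast


def parse_ppi(value):
    """Split a PPI string into named per-packet rows.

    Returns a dict with keys 'ipt', 'dirs', 'sizes' and, when a fourth
    row is present (as in the TLS dataset), 'push_flags'.
    """
    rows = ast.literal_eval(value)
    names = ["ipt", "dirs", "sizes", "push_flags"]
    return dict(zip(names, rows))


def count_roundtrips(dirs):
    """Count client->server (+1) packets immediately followed by a
    server->client (-1) packet, i.e. request/response pairs."""
    return sum(1 for a, b in zip(dirs, dirs[1:]) if a == 1 and b == -1)


ppi = parse_ppi("[[0, 12, 3, 40], [1, -1, -1, 1], [517, 1400, 900, 80], [1, 0, 1, 1]]")
print(len(ppi["dirs"]))               # packets in the sequence: 4
print(count_roundtrips(ppi["dirs"]))  # 1
```

`ast.literal_eval` is used rather than `eval` so that only literals are parsed; the same functions work for the three-row QUIC format, where the `push_flags` key is simply absent.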

    Flow statistics Each data record also includes standard flow statistics, representing aggregated information about the entire bidirectional connection. The fields are the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. The packet histograms include binned counts (not limited to the first 30 packets) of packet sizes and inter-packet times in both directions. There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes (more information in the PHISTS plugin documentation). Moreover, each flow has its end reason: it ended with the TCP connection termination (FIN packets), was idle, reached the active timeout, or ended for other reasons. This corresponds to the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced-end and lack-of-resources reasons.
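The eight logarithmic histogram bins can be reproduced for inspection or for building comparable features from other captures. A minimal sketch that follows the listed intervals literally, so a value of exactly 1024 falls into the 512-1024 bin; the exporter's own boundary handling may differ, so treat this as an approximation. The names are hypothetical.

```python
# Inclusive upper bounds of the first seven bins, as listed in the dataset
# description; anything above the last bound goes to bin 7 (>1024).
UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def phist_bin(value):
    """Return the 0-based bin index for a packet size (B) or gap (ms)."""
    for index, upper in enumerate(UPPER_BOUNDS):
        if value <= upper:
            return index
    return len(UPPER_BOUNDS)  # bin 7: > 1024


def phist(values):
    """Build an 8-bin histogram of packet sizes or inter-packet times."""
    counts = [0] * (len(UPPER_BOUNDS) + 1)
    for v in values:
        counts[phist_bin(v)] += 1
    return counts


print(phist([0, 16, 100, 1024, 1500]))  # [1, 1, 0, 1, 0, 0, 1, 1]
```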

    Dataset structure The dataset is organized per weeks and individual days. The flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the total number of saved flows and the number of flows per service. There are also files aggregating flow counts for each week (stats-week.json) and for the entire dataset (stats-dataset.json). The following list describes flow data fields in CSV files:

    ID: Unique identifier

    SRC_IP: Source IP address

    DST_IP: Destination IP address

    DST_ASN: Destination Autonomous System number

    SRC_PORT: Source port

    DST_PORT: Destination port

    PROTOCOL: Transport protocol

    FLAG_CWR: Presence of the CWR flag

    FLAG_CWR_REV: Presence of the CWR flag in the reverse direction

    FLAG_ECE: Presence of the ECE flag

    FLAG_ECE_REV: Presence of the ECE flag in the reverse direction

    FLAG_URG: Presence of the URG flag

    FLAG_URG_REV: Presence of the URG flag in the reverse direction

    FLAG_ACK: Presence of the ACK flag

    FLAG_ACK_REV: Presence of the ACK flag in the reverse direction

    FLAG_PSH: Presence of the PSH flag

    FLAG_PSH_REV: Presence of the PSH flag in the reverse direction

    FLAG_RST: Presence of the RST flag

    FLAG_RST_REV: Presence of the RST flag in the reverse direction

    FLAG_SYN: Presence of the SYN flag

    FLAG_SYN_REV: Presence of the SYN flag in the reverse direction

    FLAG_FIN: Presence of the FIN flag

    FLAG_FIN_REV: Presence of the FIN flag in the reverse direction

    TLS_SNI: Server Name Indication domain

    TLS_JA3: JA3 fingerprint of TLS client

    TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff

    TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff

    DURATION: Duration of the flow in seconds

    BYTES: Number of transmitted bytes from client to server

    BYTES_REV: Number of transmitted bytes from server to client

    PACKETS: Number of packets transmitted from client to server

    PACKETS_REV: Number of packets transmitted from server to client

    PPI: Packet sequence in the format: [[inter-packet times], [packet directions], [packet sizes], [push flags]]

    PPI_LEN: Number of packets in the PPI sequence

    PPI_DURATION: Duration of the PPI sequence in seconds

    PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence

    PHIST_SRC_SIZES: Histogram of packet sizes from client to server

    PHIST_DST_SIZES: Histogram of packet sizes from server to client

    PHIST_SRC_IPT: Histogram of inter-packet times from client to server

    PHIST_DST_IPT: Histogram of inter-packet times from server to client

    APP: Web service label

    CATEGORY: Service category

    FLOW_ENDREASON_IDLE: Flow was terminated because it was idle

    FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout

    FLOW_ENDREASON_END: Flow ended with the TCP connection termination

    FLOW_ENDREASON_OTHER: Flow was terminated for other reasons

  6. Data from: D-SLIMMER: Domain–SLiM Interaction Motifs Miner for Sequence...

    • acs.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Willy Hugo; See-Kiong Ng; Wing-Kin Sung (2023). D-SLIMMER: Domain–SLiM Interaction Motifs Miner for Sequence Based Protein–Protein Interaction Data [Dataset]. http://doi.org/10.1021/pr200312e.s002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Willy Hugo; See-Kiong Ng; Wing-Kin Sung
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Many biologically important protein–protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER outperformed existing methods, notably in discovering domain-SLiM interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of the known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by the existing biological literature. These examples showed that D-SLIMMER is able to find biologically relevant SLiMs.

  7. Data from: Lidar - LMCT - WTX WindTracer, Gordon Ridge - Raw Data

    • catalog.data.gov
    • data.openei.org
    Updated Apr 26, 2022
    Cite
    Wind Energy Technologies Office (WETO) (2022). Lidar - LMCT - WTX WindTracer, Gordon Ridge - Raw Data [Dataset]. https://catalog.data.gov/dataset/lidar-esrl-windcube-200s-wasco-airport-processed-data
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview Long-range scanning Doppler lidar located on Gordon Ridge. The WindTracer provides high-resolution, long-range lidar data for use in the WFIP2 program.

    Data Details The system is configured to take data in three different modes. Together, the three modes take 15 minutes to complete, and each cycle starts at 00, 15, 30, and 45 minutes after the hour. The first nine minutes of the period are spent performing two high-resolution, long-range Plan Position Indicator (PPI) scans at 0.0 and -1.0 degree elevation angles (tilts). These data have file names annotated with HiResPPI in the "optional fields" of the file name; for example: lidar.z09.00.20150801.150000.HiResPPI.prd. The next five minutes are spent performing higher-altitude PPI scans and Range Height Indicator (RHI) scans. The PPI scans are completed at 6.0- and 30.0-degree elevations, and the RHI scans are completed from below the horizon (down into valleys, where possible) up to 40 degrees elevation at 010-, 100-, 190-, and 280-degree azimuths. These files are annotated with PPI-RHI in the optional fields of the file name; for example: lidar.z09.00.20150801.150900.PPI-RHI.prd. The last minute is spent measuring a high-altitude vertical wind profile. Generally, this dataset will include data from near ground level up to the top of the planetary boundary layer (PBL), and higher-altitude data when high-level cirrus or other clouds are present. The Velocity Azimuth Display (VAD) is measured using six lines of sight at an elevation angle of 75 degrees, at azimuth angles of 000, 060, 120, 180, 240, and 300 degrees from True North. These files are annotated with VAD in the optional fields of the file name; for example: lidar.z09.00.20150801.151400.VAD.prd.

    LMCT has a data format document that can be provided to users who need programmatic access to the data. This document is proprietary information but can be supplied to anyone after signing a non-disclosure agreement (NDA). To initiate the NDA process, please contact Keith Barr at keith.barr@lmco.com. The data are not proprietary, only the manual describing the data format.

    Data Quality Lockheed Martin Coherent Technologies (LMCT) has implemented and refined data quality analysis over the last 14 years, and this installation uses standard data-quality processing procedures. Generally, filtered data products can be accepted as fully data-qualified. Secondary processing, such as wind vector analysis, should be used with some caution, as the data-quality filters are still "young" and incorrect values can be encountered.

    Uncertainty Uncertainty in the radial wind measurements (the system's base measurement) varies slightly with range. For most measurements, the accuracy of the filtered radial wind measurements has been shown to be within 0.5 m/s, and accuracy better than 0.25 m/s is not uncommon at ranges under 10 km.

    Constraints Doppler lidar depends on aerosol loading in the atmosphere, and the signal can be significantly attenuated in precipitation and fog. These weather conditions can reduce range performance significantly; in heavy rain or thick fog, range performance can drop to zero. Long-range performance depends on adequate aerosol loading to provide enough backscattered laser radiation for a measurement to be made.
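Because the scan mode is encoded in the optional field of each file name, files can be grouped by mode without opening them (the binary format itself is documented only in the NDA-protected manual). A minimal sketch, assuming every file name follows the pattern of the three examples in the description: seven dot-separated parts, with the date and start time in the fourth and fifth parts. The function name is hypothetical, the other parts are not interpreted, and no time zone is assumed.

```python
from datetime import datetime


def parse_lidar_name(name):
    """Return (start time, scan mode) for names like
    'lidar.z09.00.20150801.150000.HiResPPI.prd'."""
    parts = name.split(".")
    if len(parts) != 7:
        raise ValueError(f"unexpected file name: {name}")
    # Fourth part is YYYYMMDD, fifth is HHMMSS (naive datetime).
    start = datetime.strptime(parts[3] + parts[4], "%Y%m%d%H%M%S")
    return start, parts[5]


for name in [
    "lidar.z09.00.20150801.150000.HiResPPI.prd",
    "lidar.z09.00.20150801.150900.PPI-RHI.prd",
    "lidar.z09.00.20150801.151400.VAD.prd",
]:
    start, mode = parse_lidar_name(name)
    print(start.strftime("%H:%M"), mode)
# 15:00 HiResPPI
# 15:09 PPI-RHI
# 15:14 VAD
```

The start minutes 00, 09, and 14 of the example files line up with the 15-minute cycle described above.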

  8. Dataset for estimate fine root biomass in ecotone forests on the eastern of...

    • search.datacite.org
    • data.mendeley.com
    Updated Apr 13, 2020
    Cite
    Reinaldo Imbrozio Barbosa (2020). Dataset for estimate fine root biomass in ecotone forests on the eastern of Maracá Island, northern Brazilian Amazonia [Dataset]. http://doi.org/10.17632/2j5x4h9t5b
    Dataset updated
    Apr 13, 2020
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Mendeley
    Authors
    Reinaldo Imbrozio Barbosa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset provides information on fine root biomass (2-20 mm diameter; down to 1 m depth) in relation to edaphic factors (soil texture and fertility) in ecotone forests in the eastern part of Maracá Island, a river island that forms part of the Maracá Ecological Station, state of Roraima, northern Brazilian Amazonia. This area is an ecotone zone in the southern region of the Guyana Shield, dominated by mosaics of ombrophilous and seasonal forests in contact with savanna areas. Sampling covered 30 permanent plots in the 25-km² PPBio (Biodiversity Research Program) research grid installed in the eastern part of Maracá Island. The fieldwork was carried out in July and October 2015, when we collected two soil cores (sub-samples), each 1 m deep, from each plot. Each soil core was divided into 10 sections of 10 cm (000-010 cm; … ; 090-100 cm). All fine roots (2-20 mm) found in each section were sorted into diameter classes (2-5 mm; 5-10 mm; 10-20 mm), dried in an electric oven (100 ± 3 °C), and weighed (to 0.0001 g). Soil analyses were performed for each depth section on a composite sample from both soil cores. We used the soil analysis methodology adopted by Embrapa (Embrapa. 2009. Manual de análises químicas de solos, plantas e fertilizantes. Embrapa Informação Tecnológica, 2. ed. rev. ampl. Brasília-DF. 627 p). The dataset is presented in two files: (i) soil_analysis, with sampling unit codes (plotID) and their geographical reference (UTM, SAD69, Zone 20), altitude (m a.s.l.), drainage (well/poor), section depth (cm), and soil analysis results (mean of the edaphic variables for each section: texture, fertility, soil bulk density); (ii) fine_root, with fine root biomass (g) for each soil section by root diameter class (2-5 mm; 5-10 mm; 10-20 mm), sampling date, and sub-sample number. This dataset was supported by the institutional project PPI/INPA 015/122 (Ecologia e manejo de savannas e florestas de Roraima, "Ecology and management of savannas and forests of Roraima").
    The Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq - Brazil) provided fellowships for R.I. Barbosa (CNPq 304204/2015-3) and M.T. Nascimento (CNPq 308352/2015-7). L.C.S. Carvalho was supported by a fellowship from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES/PNPD). Instituto Chico Mendes de Conservação da Biodiversidade (ICMBio) provided authorization for the study. This dataset is also available on the DataONE website (https://search.dataone.org/view/PPBioAmOc.135.9), as are the soil bulk density data (https://search.dataone.org/view/PPBioAmOc.114.5).
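A typical first step with the fine_root file is to total biomass per plot. A minimal sketch, assuming the records have been read into dicts with the hypothetical keys plot, subsample, section, diameter, and biomass_g (the actual column names are not given here): biomass is summed over depth sections and diameter classes within each soil core, then averaged over the two sub-samples (cores) of each plot.

```python
from collections import defaultdict
from statistics import mean


def plot_biomass(rows):
    """Mean fine-root biomass (g per 1 m core) for each plot.

    `rows` are dicts with hypothetical keys 'plot', 'subsample',
    'section', 'diameter' and 'biomass_g'.
    """
    per_core = defaultdict(float)
    for row in rows:
        # Sum all depth sections and diameter classes within one core.
        per_core[(row["plot"], row["subsample"])] += row["biomass_g"]

    per_plot = defaultdict(list)
    for (plot, _), total in per_core.items():
        per_plot[plot].append(total)
    # Average the core totals (sub-samples) of each plot.
    return {plot: mean(totals) for plot, totals in per_plot.items()}


rows = [
    {"plot": "P01", "subsample": 1, "section": "000-010", "diameter": "2-5", "biomass_g": 1.5},
    {"plot": "P01", "subsample": 1, "section": "010-020", "diameter": "5-10", "biomass_g": 0.5},
    {"plot": "P01", "subsample": 2, "section": "000-010", "diameter": "2-5", "biomass_g": 3.0},
]
print(plot_biomass(rows))  # {'P01': 2.5}
```

If a core with no roots at all has no rows in the file, it would be missing from the average; counting the cores per plot explicitly avoids that bias.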

  9. Magnetic Field Boundaries in Cassini Plasma Spectrometer Data

    • zenodo.org
    • eprints.soton.ac.uk
    png, zip
    Updated Jul 19, 2024
    Cite
    Caitriona Jackman; Michelle Thomson; Michele Dougherty; Ameya Daigavane (2024). Magnetic Field Boundaries in Cassini Plasma Spectrometer Data [Dataset]. http://doi.org/10.5281/zenodo.5004160
    Available download formats: zip, png
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Caitriona Jackman; Michelle Thomson; Michele Dougherty; Ameya Daigavane
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of:

    • get_files.sh - a bash script to download ELS data (.DAT and .LBL) files.
    • crossing_events_urls.txt - a text file containing URLs for each of the ELS
      data files. get_files.sh reads this file directly.
    • crossing_events.txt - a text file containing the list of crossing events
      with associated data, obtained by processing Table S1 in [1].
    • labels.zip - a zip file containing directory with events for each DAT file,
      in YAML format.
    • plotter.py - a plotting script in Python.
    • requirements.txt - a file indicating dependencies for the plotting script.

    Data

    This dataset spans 1259 observations from CAPS ELS, each with a .LBL file and a
    .DAT file. Together, these take 128 GB of space. To download these files, do:

    ./get_files.sh

    in a Bash shell/terminal. The files are stored in the 'data/' folder (created if
    not present) in the current directory.

    ELS data files are obtained from NASA's Planetary Data System at
    https://pds-ppi.igpp.ucla.edu/search/view/?f=yes&id=pds://PPI/CO-E_J_S_SW-CAPS-3-CALIBRATED-V1.0/DATA/CALIBRATED

    For help in understanding the ELS data files, see the CAPS User Guide at
    https://pds-ppi.igpp.ucla.edu/ditdos/download?id=pds://PPI/COCAPS_1SAT/DOCUMENT/CAPS_USER_GUIDE/CAPS_PDS_USER_GUIDE_V1_00.PDF.

    Labels

    Unzipping the labels.zip file will create a labels/ folder in the current
    directory, requiring 28MB of space.

    Within labels/:

    • bs/ - includes bow shock crossing events.
    • mp/ - includes magnetopause crossing events.
    • dg/ - includes known data-gap events. See [1].
    • sc/ - includes unreliable data events. See [1].
    • valid/ - union of all files in bs/ and mp/.
    • all/ - union of all files in bs/, mp/, dg/ and sc/.

    Within the labels/bs/ and labels/mp/ folders:

    • in/ - crossing events with the transition direction as inward.
    • out/ - crossing events with the transition direction as outward.
    • all/ - union of all files in in/ and out/.

    Each label file is a YAML file containing a 'change_points' field. The entries
    under this field each indicate the time of a transition event. The other fields
    ('bimodality' and 'negative_ions') are not relevant for this dataset.
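As the example plotting command later in this description shows, a label file shares its base name with the corresponding DAT file and uses a .yaml extension. A minimal sketch that builds that path, assuming the labels/ folder sits in the current directory; the helper name is hypothetical, and since an observation may not have events of every type, check that the file exists before using it.

```python
from pathlib import PurePosixPath

BOUNDARIES = {"bs", "mp"}          # bow shock, magnetopause
DIRECTIONS = {"in", "out", "all"}  # transition-direction subfolders


def label_path_for(dat_path, boundary="mp", direction="all", root="labels"):
    """Map data/ELS_..._V01.DAT to labels/<boundary>/<direction>/ELS_..._V01.yaml."""
    if boundary not in BOUNDARIES or direction not in DIRECTIONS:
        raise ValueError("boundary must be bs or mp; direction in, out or all")
    stem = PurePosixPath(dat_path).stem  # file name without .DAT
    return PurePosixPath(root, boundary, direction, stem + ".yaml")


print(label_path_for("data/ELS_200418018_V01.DAT"))
# labels/mp/all/ELS_200418018_V01.yaml
```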

    Plotting the Data (and Labels)

    To help visualize the data and labels, we supply a plotting script,
    plotter.py, written in Python 2.7. First, install its dependencies with:

    pip install -r requirements.txt

    and then run:

    ./plotter.py -h

    to see the available options. Example usage (after downloading data and
    unzipping labels):

    ./plotter.py data/ELS_200418018_V01.DAT -l labels/mp/all/ELS_200418018_V01.yaml --interpolated -f max_filter -fsize 100 --title "An Observation from CAPS ELS" 

    will open up a new window with a plot of the data.
    To directly save to a file, use the '-o savefilename' option. For example,

    ./plotter.py data/ELS_200418018_V01.DAT -l labels/mp/all/ELS_200418018_V01.yaml --interpolated -f max_filter -fsize 100 --title "An Observation from CAPS ELS" -o ELS_200418018_V01.png
    

    References

    [1] Jackman, C. M., Thomsen, M. F., & Dougherty, M. K. (2019). Survey of
    Saturn's magnetopause and bow shock positions over the entire Cassini mission:
    Boundary statistical properties and exploration of associated upstream
    conditions. Journal of Geophysical Research: Space Physics, 124, 8865–8883.
    https://doi.org/10.1029/2019JA026628

    The original table of magnetopause and bow shock crossing events can be found at
    https://agupubs.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1029%2F2019JA026628&file=jgra55251-sup-0001-Table_SI-S01.txt

    Grants

    • NASA Contract through JPL with South West Research Institute. Grant Number: 1243218
    • Science and Technology Facilities Council. Grant Number: ST/L004399/1
    • NASA. Grant Number: 1243218
    • Diamond Jubilee Fellowship
    • STFC. Grant Number: ST/L004399/1
  10. S1 Raw data -

    • plos.figshare.com
    zip
    Updated Dec 5, 2024
    Cite
    Zongyong Zhang; Zongqing Zheng; Wenwei Luo; Jiebo Li; Jiushan Liao; Fuxiang Chen; Dengliang Wang; Yuanxiang Lin (2024). S1 Raw data - [Dataset]. http://doi.org/10.1371/journal.pone.0310108.s002
    Available download formats: zip
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Zongyong Zhang; Zongqing Zheng; Wenwei Luo; Jiebo Li; Jiushan Liao; Fuxiang Chen; Dengliang Wang; Yuanxiang Lin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ischemic stroke (IS) is a leading cause of death and disability worldwide. Screening for marker genes in IS is crucial for its early diagnosis and for improving clinical outcomes. In this study, gene expression profiles from the GSE22255 and GSE37587 datasets were extracted from the public Gene Expression Omnibus database. Weighted gene co-expression network analysis (WGCNA) was used to investigate gene sets related to ubiquitination. A total of 33 ubiquitination-related differentially expressed genes (DEGs) were identified using “limma (version 3.50.0)”. Gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) identified multiple enriched pathways closely related to IS. The correlations between the HALLMARK signaling pathways and the DEGs were analyzed. Receiver operating characteristic analysis was used to validate the diagnostic value of the key genes. Among them, 16 genes were identified as hub genes. Single-sample GSEA was performed to evaluate the infiltration status of immune cells in IS. To understand the potential molecular mechanisms of the hub genes in IS, we constructed RBP-mRNA and mRNA–miRNA–lncRNA interaction networks. Additionally, we used the GeneMANIA database to create a PPI network for the signature genes to investigate their functions. As a result, there was a significant difference in the overall infiltration of immune cells between the IS and control groups. Among the 28 types of immune cells, the degree of infiltration of seven types was significantly different between the two groups (p

  11. Lists of Magnetopause and Bow Shock Crossings of Mercury by MESSENGER...

    • data.niaid.nih.gov
    Updated Aug 30, 2023
    Cite
    Sun, Weijie (2023). Lists of Magnetopause and Bow Shock Crossings of Mercury by MESSENGER Spacecraft [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8298646
    Dataset updated
    Aug 30, 2023
    Dataset authored and provided by
    Sun, Weijie
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset titled “Lists of Magnetopause and Bow Shock Crossings of Mercury by MESSENGER Spacecraft” uses measurements from the MESSENGER spacecraft’s Magnetometer (MAG) and Fast Imaging Plasma Spectrometer (FIPS) instruments to identify magnetopause and bow shock crossings during MESSENGER's orbital phase at Mercury. The orbital data were collected between 23-03-2011 and 30-04-2015 and are available from the Planetary Data System’s Planetary Plasma Interactions (PDS/PPI) Node at https://pds-ppi.igpp.ucla.edu.

    The dataset includes four lists:

    a, Bow_Shock_Out_Time_Duration_public_version_WeijieSun_20230829.txt

    b, Bow_Shock_In_Time_Duration_public_version_WeijieSun_20230829.txt

    c, MagPause_In_Time_Duration_public_version_WeijieSun_20230829.txt

    d, MagPause_Out_Time_Duration_public_version_WeijieSun_20230829.txt

    Examples of entries in the lists:

    Example A

    2011 03 23 15 39 10.5 2011 03 23 16 24 02.4 BSO m

    This entry represents multiple bow shock crossings. The first six columns give the time of the first boundary crossing, and the next six columns give the time of the last boundary crossing. “BSO” stands for an outbound crossing of the bow shock, and “m” indicates that MESSENGER made multiple bow shock crossings. Only the first and last boundary crossings were selected; the crossings in between were not identified.

    Example B

    2011 03 25 13 04 24.2 2011 03 25 13 04 24.0 BSI s

    This entry represents a single bow shock crossing. The first six columns and the next six columns are identical, indicating that this is a single event. “BSI” stands for inbound crossing of the bow shock, and “s” indicates that this is a single bow shock crossing made by MESSENGER.
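    The column layout described in the two examples can be parsed with a short Python 3 sketch that uses only the standard library. The sketch makes three assumptions. First, fields are whitespace-separated exactly as shown: year, month, day, hour, minute and fractional seconds, twice, then the crossing code and the m/s flag. Second, the crossing code is kept as a plain string, because only "BSO" and "BSI" are spelled out here. Third, the function name `parse_crossing` is hypothetical.

```python
from datetime import datetime, timedelta


def _timestamp(fields):
    """Six columns (year, month, day, hour, minute, seconds) -> datetime."""
    year, month, day, hour, minute = (int(f) for f in fields[:5])
    # Seconds may be fractional (e.g. "10.5"), so add them as a timedelta.
    return datetime(year, month, day, hour, minute) + timedelta(seconds=float(fields[5]))


def parse_crossing(line):
    """Parse one entry such as '2011 03 23 15 39 10.5 2011 03 23 16 24 02.4 BSO m'."""
    fields = line.split()
    if len(fields) != 14:
        raise ValueError(f"expected 14 columns, got {len(fields)}: {line!r}")
    flag = fields[13]
    if flag not in ("m", "s"):
        raise ValueError(f"unknown multiplicity flag {flag!r}")
    return {
        "first": _timestamp(fields[0:6]),   # first boundary crossing
        "last": _timestamp(fields[6:12]),   # last boundary crossing
        "type": fields[12],                 # e.g. "BSO" (outbound) or "BSI" (inbound)
        "multiple": flag == "m",            # "m" = multiple, "s" = single crossing
    }
```

    For Example A, `entry["last"] - entry["first"]` gives a span of 44 min 51.9 s between the first and last bow shock crossing.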

    The dataset does not include magnetopause and bow shock crossings during the following time intervals:

    a. From 03:02 to 20:00 on 05-04-2011

    b. From 24-05-2011 to 03-06-2011

    c. From 17:50 to 22:53 on 16-04-2012

    d. From 09-06-2012 to 13-06-2012

    e. From 07:30 on 08-01-2013 to 16:00 on 09-01-2013

    f. From 07:55 to 18:33 on 28-02-2013

    g. From 14:22 to 17:38 on 26-12-2014
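    To skip times that fall inside the excluded intervals listed above, a small lookup is enough. This Python 3 sketch reads the dates as DD-MM-YYYY, as in "23-03-2011" above, and treats each interval as half-open, [start, end). Two interpretations are assumptions rather than statements from the dataset. For the two intervals given only as dates (b and d), it takes both end dates as included in full. The name `in_excluded_interval` is hypothetical.

```python
from datetime import datetime

# Intervals a-g above as [start, end); source dates are DD-MM-YYYY.
# Assumption: date-only intervals (b, d) include their end date in full,
# so their end bound is midnight of the following day.
EXCLUDED_INTERVALS = [
    (datetime(2011, 4, 5, 3, 2), datetime(2011, 4, 5, 20, 0)),        # a
    (datetime(2011, 5, 24), datetime(2011, 6, 4)),                    # b
    (datetime(2012, 4, 16, 17, 50), datetime(2012, 4, 16, 22, 53)),   # c
    (datetime(2012, 6, 9), datetime(2012, 6, 14)),                    # d
    (datetime(2013, 1, 8, 7, 30), datetime(2013, 1, 9, 16, 0)),       # e
    (datetime(2013, 2, 28, 7, 55), datetime(2013, 2, 28, 18, 33)),    # f
    (datetime(2014, 12, 26, 14, 22), datetime(2014, 12, 26, 17, 38)), # g
]


def in_excluded_interval(t):
    """True if datetime t lies in a period not covered by the crossing lists."""
    return any(start <= t < end for start, end in EXCLUDED_INTERVALS)
```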

    The current version was updated on 29 August 2023.

    This work was supported by NASA Discovery Data Analysis Program (DDAP) Grant #80NSSC22K1061 (PI Weijie Sun).

  12. TEAMx-PC22 (TEAMx pre-campaign 2022) - ACINN Doppler wind lidar data sets...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 20, 2023
    Cite
    Gohm, Alexander (2023). TEAMx-PC22 (TEAMx pre-campaign 2022) - ACINN Doppler wind lidar data sets (SL88, SLXR142) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7912691
    Dataset updated
    Jun 20, 2023
    Dataset authored and provided by
    Gohm, Alexander
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT

    The data sets found here were collected with ACINN's Doppler wind lidars SL88 and SLXR142 in Innsbruck, Austria, in summer 2022 in the framework of the TEAMx pre-campaign 2022 (TEAMx-PC22). The aim of TEAMx-PC22 was to test new instruments, new instrument configurations and new measurement sites to support the planning of the main TEAMx observational campaign (TOC) in 2024/2025. More details about TEAMx can be found at http://www.teamx-programme.org as well as in Serafin et al. (2020) and in Rotach et al. (2022).

    DATA SET DESCRIPTION

    1. Spatial coverage and locations

    Measurements with the SL88 and SLXR142 lidar were collected during TEAMx-PC22 in Innsbruck, Austria, at the Campus Innrain of the University of Innsbruck. More specifically, the SLXR142 lidar was located on the rooftop of one of the university buildings (Bruno-Sander-Haus) at Innrain 52f. The SL88 lidar was located in the forecourt of the Campus Innrain, the so-called GEIWI-Forum, next to the Bruno-Sander-Haus. The exact lidar locations are:

    SL88: 47.264083°N / 11.384986°E / 575 m MSL

    SLXR142: 47.26431°N / 11.38529°E / 613 m MSL

    2. Temporal coverage

    The TEAMx-PC22 lasted from mid-May 2022 to early October 2022. However, the SL88 data set contains a shorter period from 11 August to 02 October 2022 (1 Hz data, vertical stares). The SLXR142 data set covers an extended period from 01 May to 31 October 2022 (VAD products, 10-min averages) as this lidar was operated in a semi-permanent mode.

    3. Instrument details

    General

    Measurements were taken with two scanning Doppler wind lidars manufactured by HALO Photonics: a Stream Line (SL88) and a Stream Line XR (SLXR142). Both are part of the Innsbruck Atmospheric Observatory (IAO; Karl et al. 2020). This data set provides two products. For the SL88 lidar, it contains vertical profiles of radial velocity and backscatter from 1 Hz vertical stares. For the SLXR142 lidar, it contains vertical profiles of horizontal wind (10-min averages) derived from plan position indicator (PPI) scans using the velocity-azimuth display (VAD) method. PPI scans were performed as continuous motion scans (CSM mode) at an elevation angle of 70°. In continuous motion scans, the scanner moves continuously, changing its azimuth angle, while data are acquired.
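    The VAD step can be illustrated with a minimal Python 3 sketch. It assumes one PPI scan at a fixed elevation angle, with radial velocities sampled at evenly spaced azimuths over a full circle. Azimuth is measured clockwise from north, and radial velocity is positive away from the lidar. Under those assumptions, fitting v_r = u sin(az) cos(el) + v cos(az) cos(el) + w sin(el) reduces to simple Fourier projections. This is not HALO Photonics' processing chain, and it does not reproduce the 10-min averaging over several scans. A least-squares fit is the usual choice when azimuths are unevenly spaced. The function name `vad_wind` is hypothetical.

```python
import math


def vad_wind(azimuths_deg, radial_velocities, elevation_deg):
    """Estimate (u, v, w) in m/s from one PPI scan.

    Model: v_r = u*sin(az)*cos(el) + v*cos(az)*cos(el) + w*sin(el)
    with u eastward, v northward, w upward. Requires at least 3 evenly
    spaced azimuths covering the full circle, so that the constant, sine
    and cosine terms are orthogonal and projections give the exact fit.
    """
    n = len(azimuths_deg)
    if n < 3 or n != len(radial_velocities):
        raise ValueError("need >= 3 azimuths with one radial velocity each")
    el = math.radians(elevation_deg)
    az = [math.radians(a) for a in azimuths_deg]
    pairs = list(zip(radial_velocities, az))
    mean_term = sum(radial_velocities) / n                    # = w*sin(el)
    sin_term = 2 * sum(vr * math.sin(a) for vr, a in pairs) / n  # = u*cos(el)
    cos_term = 2 * sum(vr * math.cos(a) for vr, a in pairs) / n  # = v*cos(el)
    return sin_term / math.cos(el), cos_term / math.cos(el), mean_term / math.sin(el)
```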

    Data correction

    No corrections were applied to the data (level0 data).

    4. Data file structure

    File format

    Provided are data in netCDF format. File names contain date and time information in UTC. The following placeholders are used in the file name examples below: yyyy - year; mm - month; dd - day; HH - hour; MM - minute; SS - second. The netCDF data files are bundled into the following zip files.

    Zip files

    SL88.zip contains netCDF files of SL88 data structured into subdirectories (one subdirectory for each month, yyyymm, and one for each day, yyyymmdd).

    SLXR142.zip contains netCDF files of SLXR142 data structured into subdirectories (one subdirectory for each month, yyyymm).

    NetCDF files for uncorrected SL88 data

    Stare_88_yyyymmdd_HH_l0.nc contains vertical stare measurements aggregated together in one netCDF file for each hour (uncorrected level0 data).

    NetCDF files for SLXR142 data products

    yyyymmdd.nc contains vertical profiles of the horizontal wind vector derived from PPI scans by applying the VAD technique. Each vertical profile is based on several PPI scans conducted at an elevation angle of 70° within 10 minutes. Hence, each profile represents a 10-min average. Profiles are aggregated together for each day in a separate netCDF file.
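    The file-naming scheme above lends itself to a small parser. This Python 3 sketch maps each name to its instrument and UTC start time, returned as a naive datetime. It assumes the names follow the two patterns exactly as written, with "l0" being a lowercase L followed by a zero. The helper `sl88_member_path` makes a further assumption: that the month and day subdirectories sit at the top level of SL88.zip, which the description does not confirm. Both function names are hypothetical.

```python
import re
from datetime import datetime

# Patterns from the description: Stare_88_yyyymmdd_HH_l0.nc and yyyymmdd.nc
STARE_RE = re.compile(r"^Stare_88_(\d{8})_(\d{2})_l0\.nc$")
VAD_RE = re.compile(r"^(\d{8})\.nc$")


def parse_lidar_filename(name):
    """Return (instrument, UTC start time as naive datetime) for one file name."""
    m = STARE_RE.match(name)
    if m:
        # Hourly file of SL88 vertical stares (uncorrected level0 data)
        return "SL88", datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H")
    m = VAD_RE.match(name)
    if m:
        # Daily file of SLXR142 VAD wind profiles
        return "SLXR142", datetime.strptime(m.group(1), "%Y%m%d")
    raise ValueError(f"unrecognized file name: {name!r}")


def sl88_member_path(hour):
    """Assumed path inside SL88.zip: yyyymm/yyyymmdd/Stare_88_yyyymmdd_HH_l0.nc."""
    return hour.strftime("%Y%m/%Y%m%d/Stare_88_%Y%m%d_%H_l0.nc")
```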

    5. Contact

    Contact alexander.gohm(at)uibk.ac.at for any questions regarding the data set.

    6. References

    Karl, T., A. Gohm, M.W. Rotach, H.C. Ward, M. Graus, A. Cede, G. Wohlfahrt, A. Hammerle, M. Haid, M. Tiefengraber, C. Lamprecht, J. Vergeiner, A. Kreuter, J. Wagner, M. Staudinger, 2020: Studying urban climate and air quality in the Alps: The Innsbruck Atmospheric Observatory. Bulletin of the American Meteorological Society, 101, E488–E507, https://doi.org/10.1175/bams-d-19-0270.1

    Serafin, S., M. W. Rotach, M. Arpagaus, I. Colfescu, J. Cuxart, S. F. J. De Wekker, M. Evans, V. Grubišić, N. Kalthoff, T. Karl, D. J. Kirshbaum, M. Lehner, S. Mobbs, A. Paci, E. Palazzi, A. Raudzens Bailey, J. Schmidli, G. Wohlfahrt, D. Zardi, 2020: Multi-scale transport and exchange processes in the atmosphere over mountains: Programme and experiment. Innsbruck University Press. https://doi.org/10.15203/99106-003-1

    Rotach, M. W., S. Serafin, H. C. Ward, M. Arpagaus, I. Colfescu, J. Cuxart, S. F. J. De Wekker, V. Grubišić, N. Kalthoff, T. Karl, D. J. Kirshbaum, M. Lehner, S. Mobbs, A. Paci, E. Palazzi, A. Bailey, J. Schmidli, C. Wittmann, G. Wohlfahrt, D. Zardi, 2022: A collaborative effort to better understand, measure, and model atmospheric exchange processes over mountains. Bulletin of the American Meteorological Society, 103, E1282–E1295. https://doi.org/10.1175/bams-d-21-0232.1

  13. Tests of difference beween degree distributions of essential and...

    • plos.figshare.com
    • figshare.com
    Updated Jun 4, 2023
    Cite
    Joseph Ivanic; Xueping Yu; Anders Wallqvist; Jaques Reifman (2023). Tests of difference beween degree distributions of essential and nonessential proteins in raw and high-confidence yeast PPI data sets. [Dataset]. http://doi.org/10.1371/journal.pone.0005815.t004
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Joseph Ivanic; Xueping Yu; Anders Wallqvist; Jaques Reifman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (a) Average degree of essential proteins. (b) Average degree of nonessential proteins. (c) P-value for the two-sample Kolmogorov-Smirnov test for a difference between the degree distributions of essential and nonessential proteins. The symbol “
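    The two-sample Kolmogorov-Smirnov test named in this table compares two samples through the largest vertical gap between their empirical cumulative distribution functions. A minimal Python 3 sketch of that statistic follows. The degree lists in the usage lines are made up for illustration, and `ks_statistic` is a hypothetical name. For P-values, a library routine such as `scipy.stats.ks_2samp` is the practical choice.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so checking every observed value finds the maximum gap.
    for x in set(a) | set(b):
        f_a = bisect_right(a, x) / len(a)   # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)   # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical protein degrees, for illustration only
essential = [12, 30, 7, 45, 22]
nonessential = [3, 5, 8, 2, 11, 6]
print(ks_statistic(essential, nonessential))
```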

  14. Data_Sheet_2_Revealing the Interactions Between Diabetes, Diabetes-Related...

    • frontiersin.figshare.com
    • figshare.com
    docx
    Updated Jun 4, 2023
    Cite
    Lijuan Zhu; Ju Xiang; Qiuling Wang; Ailan Wang; Chao Li; Geng Tian; Huajun Zhang; Size Chen (2023). Data_Sheet_2_Revealing the Interactions Between Diabetes, Diabetes-Related Diseases, and Cancers Based on the Network Connectivity of Their Related Genes.docx [Dataset]. http://doi.org/10.3389/fgene.2020.617136.s002
    Available download formats: docx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Lijuan Zhu; Ju Xiang; Qiuling Wang; Ailan Wang; Chao Li; Geng Tian; Huajun Zhang; Size Chen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Diabetes-related diseases (DRDs), especially cancers, pose a major threat to public health. Although the pathological pathways of a few common DRDs have been explored, there is a lack of systematic studies on the important biological processes (BPs) connecting diabetes and its related diseases/cancers. We proposed and compared 10 protein–protein interaction (PPI)-based computational methods to study the connections between diabetes and 254 diseases. Among them, a method called DIconnectivity_eDMN performs best, in the sense that the disease ranking it infers (according to each disease's relation with diabetes) is the most consistent with the ranking obtained by literature mining. DIconnectivity_eDMN takes diabetes-related genes, other disease-related genes, a PPI network, and genes in BPs as input. It first maps the genes in a BP onto the PPI network to construct a BP-related subnetwork. This subnetwork is then expanded within the whole PPI network by a random walk with restart (RWR) process, generating a so-called expanded modularized network (eMN). Because only a small number of disease genes are known, an RWR process is also performed to generate an expanded disease-related gene list. For each eMN and disease, the expanded diabetes-related genes and disease-related genes are mapped onto the eMN. The association between diabetes and the disease is measured by the reachability of their genes on all eMNs, where reachability is estimated by a method similar to the Kolmogorov–Smirnov (KS) test. DIconnectivity_eDMN achieves an area under the receiver operating characteristic curve (AUC) of 0.71 for predicting both Type 1 DRDs and Type 2 DRDs. In addition, DIconnectivity_eDMN reveals important BPs connecting diabetes and DRDs. For example, “respiratory system development” and “regulation of mRNA metabolic process” are critical in associating Type 1 diabetes (T1D) with many Type 1 DRDs. We also found that the average proportion of diabetes-related genes interacting with DRDs is higher than that for non-DRDs.
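    Random walk with restart (RWR), used twice in the method above, repeatedly applies p ← (1 − r)·W·p + r·p0. Here W passes each node's probability to its neighbours in equal shares, p0 spreads the restart probability over the seed genes, and r is the restart probability. A minimal Python 3 sketch on an adjacency-list graph follows. It is a generic RWR, not the authors' DIconnectivity_eDMN code. The restart value 0.5, the convergence tolerance, and the function name are assumptions, not parameters reported in this description.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Stationary RWR scores on an undirected graph; the scores sum to 1.

    adj:   dict node -> list of neighbours (every neighbour must also be a key)
    seeds: nodes where the walk restarts, e.g. known disease genes
    """
    seed_set = {s for s in seeds if s in adj}
    if not seed_set:
        raise ValueError("none of the seeds is in the graph")
    # Restart vector p0: uniform over the seeds that are in the graph
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        for n, nbrs in adj.items():
            walk_mass = (1.0 - restart) * p[n]
            if nbrs:
                for m in nbrs:  # split the walking mass equally among neighbours
                    nxt[m] += walk_mass / len(nbrs)
            else:
                nxt[n] += walk_mass  # isolated node: keep its mass in place
        change = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if change < tol:
            break
    return p
```

    On the path graph A-B-C with seed A and restart 0.5, the scores converge to 7/12, 1/3 and 1/12. Nodes close to the seeds score highest, which is what makes RWR suitable for expanding a small gene list into its network neighbourhood.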

  15. Aerospace and electronic cost indices time series (MM19)

    • ons.gov.uk
    • cy.ons.gov.uk
    csdb, csv, xlsx
    Updated Oct 21, 2020
    Cite
    Office for National Statistics (2020). Aerospace and electronic cost indices time series (MM19) [Dataset]. https://www.ons.gov.uk/economy/inflationandpriceindices/datasets/aerospaceandelectronicscostindices
    Available download formats: csdb, xlsx, csv
    Dataset updated
    Oct 21, 2020
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Cost indices (purchase of materials and fuels, average weekly earnings, general expenses and combined costs) relating to four aerospace and electronics industries.

  16. Demographic characteristics of sample patients (N = 42,972).

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Herng-Ching Lin; Sudha Xirasagar; Shiu-Dong Chung; Chung-Chien Huang; Ming-Chieh Tsai; Chao-Hung Chen (2023). Demographic characteristics of sample patients (N = 42,972). [Dataset]. http://doi.org/10.1371/journal.pone.0172436.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Herng-Ching Lin; Sudha Xirasagar; Shiu-Dong Chung; Chung-Chien Huang; Ming-Chieh Tsai; Chao-Hung Chen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Demographic characteristics of sample patients (N = 42,972).

  17. Table2_Identification of mitophagy-related biomarkers in human osteoporosis...

    • frontiersin.figshare.com
    xlsx
    Updated Jan 8, 2024
    Cite
    Yu Su; Gangying Yu; Dongchen Li; Yao Lu; Cheng Ren; Yibo Xu; Yanling Yang; Kun Zhang; Teng Ma; Zhong Li (2024). Table2_Identification of mitophagy-related biomarkers in human osteoporosis based on a machine learning model.XLSX [Dataset]. http://doi.org/10.3389/fphys.2023.1289976.s002
    Available download formats: xlsx
    Dataset updated
    Jan 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Yu Su; Gangying Yu; Dongchen Li; Yao Lu; Cheng Ren; Yibo Xu; Yanling Yang; Kun Zhang; Teng Ma; Zhong Li
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Osteoporosis (OP) is a chronic bone metabolic disease and a serious global public health problem. Several studies have shown that mitophagy plays an important role in bone metabolism disorders; however, its role in osteoporosis remains unclear.

    Methods: GSE56815, a dataset containing samples with low and high bone mineral density (BMD), was downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were analyzed. Mitochondrial autophagy-related genes (MRGs) were collected from the existing literature, and highly correlated MRGs were screened by bioinformatics methods. The results of both analyses were combined to obtain differentially expressed MRGs (DE-MRGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were then performed on the DE-MRGs. Protein-protein interaction (PPI) network analysis, support vector machine recursive feature elimination (SVM-RFE), and the Boruta method were used to identify DE-MRGs. A receiver operating characteristic (ROC) curve was drawn and a nomogram model was constructed to determine their diagnostic value. A variety of bioinformatics methods were used to verify the relationship between these genes and OP, including GO and KEGG analysis, IPA pathway analysis, and single-sample Gene Set Enrichment Analysis (ssGSEA). In addition, a hub gene-related network was constructed, and potential drugs for the treatment of OP were predicted. Finally, the specific genes were verified by real-time quantitative polymerase chain reaction (RT-qPCR).

    Results: In total, 548 DEGs were identified in the GSE56815 dataset. Weighted gene co-expression network analysis (WGCNA) identified 2291 key module genes, and 91 DE-MRGs were obtained by combining the two. The PPI network revealed that the target gene for AKT1 interacted with most proteins. Three MRGs (NELFB, SFSWAP, and MAP3K3) were identified as hub genes, with areas under the curve (AUCs) of 0.75, 0.71, and 0.70, respectively. The nomogram model has high diagnostic value. GO and KEGG analyses showed that the ribosome pathway and the cellular ribosome pathway may regulate the progression of OP. IPA showed that MAP3K3 was associated with six pathways, including GNRH Signaling. The ssGSEA indicated that NELFB was significantly negatively correlated with iDCs (cor = −0.390, p < 0.001). The regulatory network showed a complex relationship among miRNAs, transcription factors (TFs), and hub genes. In addition, four drugs, including vinclozolin, were predicted to be potential therapeutic drugs for OP. In RT-qPCR verification, the hub gene NELFB was consistent with the results of the bioinformatics analysis.

    Conclusion: Mitophagy plays an important role in the development of osteoporosis. The identification of three mitophagy-related genes may contribute to the early diagnosis, mechanistic research, and treatment of OP.
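    The AUC values quoted above summarize how well each gene's expression separates the two groups. One way to compute an AUC uses its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half. The Python 3 sketch below takes that approach. It is not the authors' pipeline, and the function name and toy scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical expression values of one gene in two groups of samples
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

    The pairwise loop is quadratic in the number of samples, which is fine for cohorts of this size. A rank-based (Mann-Whitney) formulation gives the same value more efficiently.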

  18. PPI-R subscale correlations total sample (n = 227).

    • plos.figshare.com
    xls
    Updated Jun 17, 2016
    Cite
    Karolina Sörman; Gustav Nilsonne; Katarina Howner; Sandra Tamm; Shilan Caman; Hui-Xin Wang; Martin Ingvar; John F. Edens; Petter Gustavsson; Scott O Lilienfeld; Predrag Petrovic; Håkan Fischer; Marianne Kristiansson (2016). PPI-R subscale correlations total sample (n = 227). [Dataset]. http://doi.org/10.1371/journal.pone.0156570.t005
    Available download formats: xls
    Dataset updated
    Jun 17, 2016
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Karolina Sörman; Gustav Nilsonne; Katarina Howner; Sandra Tamm; Shilan Caman; Hui-Xin Wang; Martin Ingvar; John F. Edens; Petter Gustavsson; Scott O Lilienfeld; Predrag Petrovic; Håkan Fischer; Marianne Kristiansson
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PPI-R subscale correlations total sample (n = 227).

  19. Data from: An Optimized Miniaturized Filter-Aided Sample Preparation Method...

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated Jul 15, 2024
    Cite
    Yu He; Yang Li; Lili Zhao; Guojin Ying; Gang Lu; Lihua Zhang; Zhenbin Zhang (2024). An Optimized Miniaturized Filter-Aided Sample Preparation Method for Sensitive Cross-Linking Mass Spectrometry Analysis of Microscale Samples [Dataset]. http://doi.org/10.1021/acs.analchem.4c01600.s004
    Available download formats: xlsx
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    ACS Publications
    Authors
    Yu He; Yang Li; Lili Zhao; Guojin Ying; Gang Lu; Lihua Zhang; Zhenbin Zhang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Cross-linking mass spectrometry (XL-MS) is a powerful tool for elucidating protein structures and protein–protein interactions (PPIs) at the global scale. However, sensitive XL-MS analysis of mass-limited samples remains challenging, because low-abundance cross-linked peptides suffer severe losses during sample preparation. Herein, an optimized miniaturized filter-aided sample preparation (O-MICROFASP) method is presented for sensitive XL-MS analysis of microscale samples. By systematically investigating and optimizing crucial experimental factors, this approach dramatically improves cross-link identification from low-microgram and submicrogram samples. Compared with the conventional FASP method, more than 7.4 times as many cross-linked peptides were identified from single-shot analysis of 1 μg of DSS cross-linked HeLa cell lysates (440 vs 59). The number of cross-linked peptides identified from 0.5 μg of HeLa cell lysates increased by 58% when the surface area of the filter in the microreactor was further reduced to 0.058 mm². To deepen the identification coverage of the XL-proteome, five different types of cross-linkers were used, and each μg of cross-linked HeLa cell lysates was processed by O-MICROFASP integrated with tip-based strong cation exchange (SCX) fractionation. Up to 2741 unique cross-linked peptides were identified from the 5 μg of HeLa cell lysates, representing 2579 unique K–K linkages on 1092 proteins. About 96% of intraprotein cross-links were within the maximal distance restraint of 26 Å, and 75% of the identified PPIs reported by the STRING database had high confidence (scores ≥0.9). These results confirm the high validity of the identified cross-links for protein structural mapping and PPI analysis. This study demonstrates that O-MICROFASP is a universal and efficient method for proteome-wide XL-MS analysis of microscale samples with high sensitivity and reliability.
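    The 26 Å check amounts to the fraction of cross-links whose two linked residues lie within the distance cutoff in a structural model. The Python 3.8+ sketch below computes that fraction. It assumes Euclidean distances between one coordinate per residue (for example, C-alpha atoms); which atoms the study measured between is an assumption here. The residue labels and coordinates are made up, and the function name is hypothetical.

```python
import math


def restraint_satisfaction(links, coords, max_distance=26.0):
    """Fraction of cross-links whose residue-residue distance is <= max_distance.

    links:  iterable of (residue_id, residue_id) pairs
    coords: mapping residue_id -> (x, y, z) in angstroms, e.g. C-alpha positions
    """
    links = list(links)
    if not links:
        raise ValueError("no cross-links given")
    within = sum(1 for a, b in links if math.dist(coords[a], coords[b]) <= max_distance)
    return within / len(links)


# Hypothetical residue labels and coordinates
coords = {"K10": (0.0, 0.0, 0.0), "K25": (20.0, 0.0, 0.0), "K40": (0.0, 30.0, 0.0)}
print(restraint_satisfaction([("K10", "K25"), ("K10", "K40")], coords))  # 0.5
```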

  20. Gene expression sample information.

    • plos.figshare.com
    xlsx
    Updated Jun 7, 2023
    Cite
    Mohamed Ali Ghadie; Yu Xia (2023). Gene expression sample information. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010013.s006
    Available download formats: xlsx
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Mohamed Ali Ghadie; Yu Xia
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (A) Sample counts and time points for all 63 datasets from GEO. (B) Names of the 16 tissues in the Illumina Body Map 2.0 dataset. (C) Names of the 183 tissue samples in the Fantom5 dataset. (XLSX)
