74 datasets found
  1. Space Data Analytics Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Space Data Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/space-data-analytics-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Space Data Analytics Market Outlook



    The global space data analytics market size was valued at approximately $3.2 billion in 2023 and is projected to reach around $11.8 billion by 2032, reflecting a robust CAGR of 15.6% over the forecast period. Driven by the increasing deployment of satellites and growing advancements in machine learning and data analytics technologies, the market is poised for substantial growth. The convergence of these technologies allows for more efficient data collection, processing, and utilization, which fuels the demand for space data analytics across various sectors.



    The primary growth factor for the space data analytics market is the exponential increase in satellite deployments. Governments and private entities are launching satellites for diverse purposes such as communication, navigation, earth observation, and scientific research. This surge in satellite launches generates vast amounts of data that require sophisticated analytical tools to process and interpret. Consequently, the need for advanced analytics solutions to convert raw satellite data into actionable insights is driving the market forward. Additionally, advancements in artificial intelligence (AI) and machine learning (ML) are enhancing the capabilities of space data analytics, making them more accurate and efficient.



    Another significant growth driver is the escalating demand for real-time data and analytics in various industries. Sectors such as agriculture, defense, and environmental monitoring increasingly rely on satellite data for applications like precision farming, border surveillance, and climate change assessment. The ability to obtain real-time data from satellites and analyze it promptly allows organizations to make informed decisions swiftly, thereby improving operational efficiency and outcomes. Furthermore, the growing awareness about the advantages of space data analytics in proactive decision-making is expanding its adoption across multiple sectors.



    Moreover, international collaborations and government initiatives aimed at space exploration and satellite launches are propelling the market. Many countries are investing heavily in space missions and satellite projects, creating a fertile ground for the space data analytics market to thrive. These investments are accompanied by supportive regulatory frameworks and funding for research and development, further encouraging innovation and growth in the sector. Additionally, the commercialization of space activities and the emergence of private space enterprises are opening new avenues for market expansion.



    Artificial Intelligence in Space is revolutionizing the way we approach space exploration and data analysis. By integrating AI technologies with space missions, scientists and researchers can process vast amounts of data more efficiently and accurately. This integration allows for real-time decision-making and predictive analytics, which are crucial for successful space missions. AI's ability to learn and adapt makes it an invaluable tool for navigating the complex and unpredictable environment of space. As AI continues to evolve, its applications in space exploration are expected to expand, offering new possibilities for understanding our universe and enhancing the capabilities of space data analytics.



    From a regional perspective, North America holds the largest market share due to the presence of leading space agencies, like NASA, and prominent private space companies, such as SpaceX and Blue Origin. Europe follows closely, driven by robust investments in space research and development by the European Space Agency (ESA). The Asia Pacific region is expected to witness the fastest growth rate, attributed to increasing satellite launches by countries like China and India, alongside growing investments in space technology and analytics within the region.



    Component Analysis



    The space data analytics market can be segmented by component into software, hardware, and services. The software segment commands a significant share of the market due to the development of sophisticated analytics tools and platforms. These software solutions are crucial for processing and interpreting the vast amounts of data collected from satellites. Advanced algorithms and AI-powered analytics enable users to extract meaningful insights from raw data, driving the adoption of these solutions across various sectors. The continuous innovation in software capabilities, such as enhanced visualization t

  2. Replication Package for 'Data-Driven Analysis and Optimization of Machine...

    • zenodo.org
    zip
    Updated Jun 11, 2025
    Cite
    Joel Castaño (2025). Replication Package for 'Data-Driven Analysis and Optimization of Machine Learning Systems Using MLPerf Benchmark Data' [Dataset]. http://doi.org/10.5281/zenodo.15643706
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joel Castaño
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data-Driven Analysis and Optimization of Machine Learning Systems Using MLPerf Benchmark Data

    This repository contains the full replication package for the Master's thesis 'Data-Driven Analysis and Optimization of Machine Learning Systems Using MLPerf Benchmark Data'. The project focuses on leveraging public MLPerf benchmark data to analyze ML system performance and develop a multi-objective optimization framework for recommending optimal hardware configurations.
    The framework considers the trade-offs between three key objectives (see the sketch after this list):
    1. Performance (maximizing throughput)
    2. Energy Efficiency (minimizing estimated energy per unit)
    3. Cost (minimizing estimated hardware cost)
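
    As a rough illustration of what Pareto-optimality means for these three objectives, the sketch below keeps only configurations that no other configuration beats on all three axes at once. The column names are illustrative placeholders, not the dataset's documented schema; the notebook itself builds the front from model predictions rather than raw rows.

    ```python
    import pandas as pd

    def pareto_front(df: pd.DataFrame) -> pd.DataFrame:
        """Keep rows that no other row dominates on all three objectives."""
        keep = []
        for i, row in df.iterrows():
            better_or_equal = (
                (df["throughput"] >= row["throughput"])            # higher is better
                & (df["energy_per_unit"] <= row["energy_per_unit"])  # lower is better
                & (df["cost_usd"] <= row["cost_usd"])                # lower is better
            )
            strictly_better = (
                (df["throughput"] > row["throughput"])
                | (df["energy_per_unit"] < row["energy_per_unit"])
                | (df["cost_usd"] < row["cost_usd"])
            )
            if not (better_or_equal & strictly_better).any():
                keep.append(i)
        return df.loc[keep]

    # Toy configurations: the second row is dominated by the third.
    configs = pd.DataFrame({
        "throughput":      [1000,  1400,  1400,  900],
        "energy_per_unit": [ 2.0,   3.5,   3.0,  1.8],
        "cost_usd":        [8000, 15000, 15000, 6000],
    })
    print(pareto_front(configs))
    ```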

    Repository Structure

    This repository is organized as follows:
    • Data_Analysis.ipynb: A Jupyter Notebook containing the code for the Exploratory Data Analysis (EDA) presented in the thesis. Running this notebook reproduces the plots in the eda_plots/ directory.
    • Dataset_Extension.ipynb: A Jupyter Notebook used for the data enrichment process. It takes the raw `Inference_data.csv` and produces `Inference_data_Extended.csv` by adding detailed hardware specifications, cost estimates, and derived energy metrics.
    • Optimization_Model.ipynb: The main Jupyter Notebook for the core contribution of this thesis. It contains the code to perform the 5-fold cross-validation, train the final predictive models, generate the Pareto-optimal recommendations, and create the final result figures.
    • Inference_data.csv: The raw, unprocessed data collected from the official MLPerf Inference v4.0 results.
    • Inference_data_Extended.csv: The final, enriched dataset used for all analysis and modeling. This is the output of the Dataset_Extension.ipynb notebook.
    • eda_log.txt: A text log file containing summary statistics generated during the exploratory data analysis.
    • requirements.txt: A list of all necessary Python libraries and their versions required to run the code in this repository.
    • eda_plots/: A directory containing all plots (correlation matrices, scatter plots, box plots) generated by the EDA notebook.
    • optimization_models_final/: A directory where the trained and saved final model files (.joblib) are stored after running the optimization notebook.
    • pareto_validation_plot_fold_0.png: The validation plot comparing the true vs. predicted Pareto fronts, as presented in the thesis.
    • shap_waterfall_final_model.png: The SHAP plot used for the model interpretability analysis, as presented in the thesis.

    Requirements and Installation

    To reproduce the results, it is recommended to use a Python virtual environment to avoid conflicts with other projects.
    1. Clone the repository:
        git clone
        cd
    2. Create and activate a virtual environment (optional but recommended):
        python -m venv venv
        source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    3. Install the required packages. All dependencies are listed in the `requirements.txt` file; install them using pip:
        pip install -r requirements.txt

    Step-by-Step Reproduction Workflow

    The notebooks are designed to be run in a logical sequence.

    Step 1: Data Enrichment (Optional)

    The final enriched dataset (`Inference_data_Extended.csv`) is already provided. However, if you wish to reproduce the enrichment process from scratch, you can run the `Dataset_Extension.ipynb` notebook. It will take `Inference_data.csv` as input and generate the extended version.

    Step 2: Exploratory Data Analysis (Optional)

    All plots from the EDA are pre-generated and available in the `eda_plots/` directory. To regenerate them, run the `Data_Analysis.ipynb` notebook. This will overwrite the existing plots and the `eda_log.txt` file.

    Step 3: Main Model Training, Validation, and Recommendation

    This is the core of the thesis. Running the Optimization_Model.ipynb notebook will execute the entire pipeline described in the paper:
    1. It will perform the 5-fold group-aware cross-validation to validate the performance of the predictive models (see the sketch after this list).
    2. It will train the final production models on the entire dataset and save them to the optimization_models_final/ directory.
    3. It will generate the final Pareto front recommendations and single-best recommendations for the Computer Vision task.
    4. It will generate the final figures used in the results section, including pareto_validation_plot_fold_0.png and shap_waterfall_final_model.png.
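
    The exact pipeline lives in the notebook; as a minimal sketch of what "group-aware" means in step 1, scikit-learn's GroupKFold keeps all rows from the same system in the same fold, so performance is measured on unseen hardware. The feature, target, and group column names below are assumptions, not the dataset's documented schema.

    ```python
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GroupKFold

    df = pd.read_csv("Inference_data_Extended.csv")

    # Illustrative columns; the real schema lives in the CSV header.
    X = df[["num_accelerators", "tdp_watts", "cost_usd"]]
    y = df["throughput"]
    groups = df["system_name"]  # all submissions of one system stay in one fold

    for fold, (tr, te) in enumerate(GroupKFold(n_splits=5).split(X, y, groups)):
        model = RandomForestRegressor(random_state=0).fit(X.iloc[tr], y.iloc[tr])
        print(f"fold {fold}: R^2 = {model.score(X.iloc[te], y.iloc[te]):.3f}")
    ```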
  3. Easing into Excellent Excel Practices Learning Series / Série...

    • borealisdata.ca
    • search.dataone.org
    Updated Nov 15, 2023
    Cite
    Julie Marcoux (2023). Easing into Excellent Excel Practices Learning Series / Série d'apprentissages en route vers des excellentes pratiques Excel [Dataset]. http://doi.org/10.5683/SP3/WZYO1F
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Borealis
    Authors
    Julie Marcoux
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    With a step-by-step approach, learn to prepare Excel files, data worksheets, and individual data columns for data analysis; practice conditional formatting and creating pivot tables/charts; go over basic principles of Research Data Management as they might apply to an Excel project. Avec une approche étape par étape, apprenez à préparer pour l’analyse des données des fichiers Excel, des feuilles de calcul de données et des colonnes de données individuelles; pratiquez la mise en forme conditionnelle et la création de tableaux croisés dynamiques ou de graphiques; passez en revue les principes de base de la gestion des données de recherche tels qu’ils pourraient s’appliquer à un projet Excel.

  4. S1 Data -

    • plos.figshare.com
    xls
    Updated Jan 3, 2025
    + more versions
    Cite
    Charanjit Kaur; Pei P. Tan; Nurjannah Nurjannah; Ririn Yuniasih (2025). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0312306.s001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Charanjit Kaur; Pei P. Tan; Nurjannah Nurjannah; Ririn Yuniasih
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data is becoming increasingly ubiquitous today, and data literacy has emerged as an essential skill in the workplace. Therefore, it is necessary to equip high school students with data literacy skills in order to prepare them for further learning and future employment. In Indonesia, there is a growing shift towards integrating data literacy in the high school curriculum. As part of a pilot intervention project, academics from two leading universities organised data literacy boot camps for high school students across various cities in Indonesia. The boot camps aimed at increasing participants’ awareness of the power of analytical and exploration skills, which, in turn, would contribute to creating independent and data-literate students. This paper explores student participants’ self-perception of their data literacy as a result of the skills acquired from the boot camps. Qualitative and quantitative data were collected through student surveys and a focus group discussion, and were used to analyse student perception post-intervention. The findings indicate that students became more aware of the usefulness of data literacy and its application in future studies and work after participating in the boot camp. Of the materials delivered at the boot camps, students found the greatest benefit in learning basic statistical concepts and applying them through the use of Microsoft Excel as a tool for basic data analysis. These findings provide valuable policy recommendations that educators and policymakers can use as guidelines for effective data literacy teaching in high schools.

  5. ‘California Housing Data (1990)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Data (1990)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-data-1990-a0c5/b7389540/?iid=007-628&v=presentation
    Explore at:
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    Analysis of ‘California Housing Data (1990)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/housing on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Source

    This is the dataset used in this book: https://github.com/ageron/handson-ml/tree/master/datasets/housing to illustrate a sample end-to-end ML project workflow (pipeline). This is a great book - I highly recommend!

    The data is based on the 1990 California Census.

    About the Data (from the book):

    "This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

    The following is the description from the book author:

    This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

    The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."

    About the Data (From Luís Torgo's page):

    http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

    This is a dataset obtained from the StatLib repository. Here is the included description:

    "We collected information on the variables using all the block groups in California from the 1990 Cens us. In this sample a block group on average includes 1425.5 individuals living in a geographically co mpact area. Naturally, the geographical area included varies inversely with the population density. W e computed distances among the centroids of each block group as measured in latitude and longitude. W e excluded all the block groups reporting zero entries for the independent and dependent variables. T he final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

    End-to-End ML Project Steps (Chapter 2 of the book)

    1. Look at the big picture
    2. Get the data
    3. Discover and visualize the data to gain insights
    4. Prepare the data for Machine Learning algorithms
    5. Select a model and train it
    6. Fine-tune your model
    7. Present your solution
    8. Launch, monitor, and maintain your system

    The 10-Step Machine Learning Project Workflow (My Version)

    1. Define business objective
    2. Make sense of the data from a high level
      • data types (number, text, object, etc.)
      • continuous/discrete
      • basic stats (min, max, std, median, etc.) using boxplot
      • frequency via histogram
      • scales and distributions of different features
    3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified (see the sketch after this list)
    4. Correlation analysis (pair-wise and attribute combinations)
    5. Data cleaning (missing data, outliers, data errors)
    6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
    7. Train and cross-validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
    8. Fine-tune the model by trying different combinations of hyperparameters
    9. Evaluate the model with the best estimators on the test set
    10. Launch, monitor, and refresh the model and system
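
    A minimal sketch of steps 3, 5, and 6 on this dataset, assuming the standard columns of the Kaggle file (total_bedrooms with its 207 missing values, the categorical ocean_proximity attribute, and median_house_value as the target):

    ```python
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    housing = pd.read_csv("housing.csv")
    X = housing.drop(columns=["median_house_value"])
    y = housing["median_house_value"]

    # Step 3: stratify on binned income so train/test income distributions match.
    income_bins = pd.cut(housing["median_income"], bins=5, labels=False)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=income_bins, random_state=42
    )

    # Steps 5-6: impute the missing total_bedrooms values, scale numeric columns,
    # and one-hot encode the categorical ocean_proximity column.
    numeric_cols = X.select_dtypes("number").columns
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ocean_proximity"]),
    ])
    X_train_prep = preprocess.fit_transform(X_train)  # fit on training data only
    ```

    Fitting the preprocessing on the training split only avoids leaking test-set statistics into the model.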

    --- Original source retains full ownership of the source dataset ---

  6. Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data

    • plos.figshare.com
    docx
    Updated Jun 2, 2023
    Cite
    Timothy Bailey; Pawel Krajewski; Istvan Ladunga; Celine Lefebvre; Qunhua Li; Tao Liu; Pedro Madrigal; Cenny Taslim; Jie Zhang (2023). Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data [Dataset]. http://doi.org/10.1371/journal.pcbi.1003326
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Timothy Bailey; Pawel Krajewski; Istvan Ladunga; Celine Lefebvre; Qunhua Li; Tao Liu; Pedro Madrigal; Cenny Taslim; Jie Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.

  7. Data for "Direct and indirect Rod and Frame effect: A virtual reality study"...

    • data.mendeley.com
    Updated Feb 12, 2025
    Cite
    Michał Adamski (2025). Data for "Direct and indirect Rod and Frame effect: A virtual reality study" [Dataset]. http://doi.org/10.17632/pcf2n8b4rd.1
    Explore at:
    Dataset updated
    Feb 12, 2025
    Authors
    Michał Adamski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the raw experimental data and supplementary materials for the "Asymmetry Effects in Virtual Reality Rod and Frame Test". The materials included are:

    •  Raw Experimental Data: older.csv and young.csv
    •  Mathematica Notebooks: a collection of Mathematica notebooks used for data analysis and visualization. These notebooks provide scripts for processing the experimental data, performing statistical analyses, and generating the figures used in the project.
    •  Unity Package: a Unity package featuring a sample scene related to the project. The scene was built using Unity’s Universal Render Pipeline (URP). To utilize this package, ensure that URP is enabled in your Unity project. Instructions for enabling URP can be found in the Unity URP Documentation.
    

    Requirements:

    •  For Data Files: software capable of opening CSV files (e.g., Microsoft Excel, Google Sheets, or any programming language that can read CSV formats); see the sketch after this list.
    •  For Mathematica Notebooks: Wolfram Mathematica software to run and modify the notebooks.
    •  For Unity Package: Unity Editor version compatible with URP (2019.3 or later recommended). URP must be installed and enabled in your Unity project.
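
    For the data files, for example, a short pandas sketch is enough to start exploring. The file names come from the dataset listing above; the column names are not documented here, so we simply inspect them:

    ```python
    import pandas as pd

    older = pd.read_csv("older.csv")
    young = pd.read_csv("young.csv")
    print(older.shape, young.shape)  # row/column counts per age group
    print(older.head())              # inspect the (undocumented) columns
    ```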
    

    Usage Notes:

    •  The dataset facilitates comparative studies between different age groups based on the collected variables.
    •  Users can modify the Mathematica notebooks to perform additional analyses.
    •  The Unity scene serves as a reference to the project setup and can be expanded or integrated into larger projects.
    

    Citation: Please cite this dataset when using it in your research or publications.

  8. [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User...

    • recerca.uoc.edu
    • data.niaid.nih.gov
    Updated 2022
    Cite
    Krukowski, Simon; Amarasinghe, Ishari; Gutiérrez-Páez, Nicolás Felipe; Hoppe, H. Ulrich (2022). [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects [Dataset]. https://recerca.uoc.edu/documentos/67e11e4a41a307553fc2f26b
    Explore at:
    Dataset updated
    2022
    Authors
    Krukowski, Simon; Amarasinghe, Ishari; Gutiérrez-Páez, Nicolás Felipe; Hoppe, H. Ulrich
    Description

    Explanation/Overview: Corresponding dataset for the analyses and results achieved in the CS Track project in the research line on participation analyses, also reported in the publication "Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects", a conference paper for the conference CollabTech 2022: Collaboration Technologies and Social Computing, published as part of the Lecture Notes in Computer Science book series (LNCS, volume 13632). The usernames have been anonymised.

    Purpose: The purpose of this dataset is to provide the basis to reproduce the results reported in the associated deliverable and in the above-mentioned publication. As such, it does not represent raw data, but rather files that already include certain analysis steps (like calculated degrees or other SNA-related measures), ready for analysis, visualisation and interpretation with R.

    Relatedness: The data of the different projects was derived from the forums of 7 Zooniverse projects with similar discussion board features: 'Galaxy Zoo', 'Gravity Spy', 'Seabirdwatch', 'Snapshot Wisconsin', 'Wildwatch Kenya', 'Galaxy Nurseries', 'Penguin Watch'.

    Content: In this Zenodo entry, several files can be found, structured as follows:
    • corresponding_calculations.html: Quarto notebook to view in the browser
    • corresponding_calculations.qmd: Quarto notebook to view in RStudio
    • assets
      • data
        • annotations/annotations.csv: list of annotations made per day for each of the analysed projects
        • comments/comments.csv: total list of comments with several data fields (i.e., comment id, text, reply_user_id)
        • rolechanges/478_rolechanges.csv, 1104_rolechanges.csv, ...: list of roles per user to determine the number of role changes
        • totalnetworkdata/Edges/478_edges.csv, 1104_edges.csv, ...: network data (edge sets) for the given projects (without time slices)
        • totalnetworkdata/Nodes/478_nodes.csv, 1104_nodes.csv, ...: network data (node sets) for the given projects (without time slices)
        • trajectories/: network data (edge and node sets) for the given projects and all time slices (Q1 2016 - Q4 2021), e.g., 478/Edges/edges_4782016_q1.csv ... edges_4782016_q4.csv and 478/Nodes/nodes_4782016_q1.csv ... nodes_4782016_q4.csv, and likewise for 1104, ...
      • scripts
        • datavizfuncs.R: script for the data visualisation functions, automatically executed from within corresponding_calculations.qmd
        • import.R: script for the import of data, automatically executed from within corresponding_calculations.qmd
    • corresponding_calculations_files: files for the html/qmd view in the browser/RStudio

    Grouping: The data is grouped according to given criteria (e.g., project_title or time). Accordingly, the respective files can be found in the data structure.
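
    The intended analysis environment is R, but as a rough illustration of how one of these edge sets could be explored, a short sketch with networkx (the column names here are assumptions; check the CSV headers):

    ```python
    import networkx as nx
    import pandas as pd

    edges = pd.read_csv("478_edges.csv")  # assumed columns: "from", "to"
    G = nx.from_pandas_edgelist(edges, source="from", target="to")

    # Degree is one of the SNA measures precomputed in the node sets.
    top5 = sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:5]
    print(top5)
    ```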

  9. Data_Sheet_2_STATegra: Multi-Omics Data Integration – A Conceptual Scheme...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    + more versions
    Cite
    Nuria Planell; Vincenzo Lagani; Patricia Sebastian-Leon; Frans van der Kloet; Ewoud Ewing; Nestoras Karathanasis; Arantxa Urdangarin; Imanol Arozarena; Maja Jagodic; Ioannis Tsamardinos; Sonia Tarazona; Ana Conesa; Jesper Tegner; David Gomez-Cabrero (2023). Data_Sheet_2_STATegra: Multi-Omics Data Integration – A Conceptual Scheme With a Bioinformatics Pipeline.pdf [Dataset]. http://doi.org/10.3389/fgene.2021.620453.s002
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Nuria Planell; Vincenzo Lagani; Patricia Sebastian-Leon; Frans van der Kloet; Ewoud Ewing; Nestoras Karathanasis; Arantxa Urdangarin; Imanol Arozarena; Maja Jagodic; Ioannis Tsamardinos; Sonia Tarazona; Ana Conesa; Jesper Tegner; David Gomez-Cabrero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, it is an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming for it to be as generic as possible for multi-omics analysis, combining available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While in several studies we have previously combined those integrative tools, here we provide a systematic description of the STATegra framework and its validation using two The Cancer Genome Atlas (TCGA) case studies. For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (and beyond the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package.

  10. Counts of Pellagra reported in UNITED STATES OF AMERICA: 1923-1932

    • data.niaid.nih.gov
    Updated Jun 3, 2024
    + more versions
    Cite
    Burke, Donald (2024). Counts of Pellagra reported in UNITED STATES OF AMERICA: 1923-1932 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452462
    Explore at:
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Van Panhuis, Willem
    Cross, Anne
    Burke, Donald
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
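
    As an illustrative sketch of both recommended steps in Python, assuming the usual Project Tycho column names (PeriodStartDate, PeriodEndDate, CountValue, PartOfCumulativeCountSeries; verify against the header of your download, and note the file name below is a placeholder):

    ```python
    import pandas as pd

    df = pd.read_csv("tycho_pellagra_us.csv",
                     parse_dates=["PeriodStartDate", "PeriodEndDate"])

    # Step 2: keep only fixed-interval series; cumulative intervals overlap and
    # must be differenced or analysed separately.
    fixed = df[df["PartOfCumulativeCountSeries"] == 0]

    # Step 1: re-index on a regular weekly grid so unreported weeks become NaN
    # (explicitly missing) rather than silently absent.
    weekly = (fixed.set_index("PeriodStartDate")["CountValue"]
                   .resample("W").sum(min_count=1))
    print(weekly.isna().sum(), "weeks with no reported count")
    ```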

  11. ECMWF ERA-40: reduced N80 Gaussian gridded pressure level analysis data...

    • data-search.nerc.ac.uk
    Updated Jul 13, 2021
    + more versions
    Cite
    (2021). ECMWF ERA-40: reduced N80 Gaussian gridded pressure level analysis data (ggap) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=pressure%20level
    Explore at:
    Dataset updated
    Jul 13, 2021
    Description

    This dataset contains N80 Gaussian gridded, pressure level, analysis time step parameters from the European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA) 40 program from September 1957 to August 2002. ERA-40 followed on from the ERA-15 re-analysis project. Access is limited to UK-based academic researchers only. These data are GRIB formatted.
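
    As a hedged sketch, GRIB files like these can be read in Python with xarray's cfgrib engine (the file name is a placeholder; cfgrib and the eccodes system library must be installed):

    ```python
    import xarray as xr

    # Requires: pip install cfgrib (plus the eccodes system library).
    ds = xr.open_dataset("era40_ggap_sample.grib", engine="cfgrib")
    print(ds)  # lists the variables, pressure levels, and grid coordinates
    ```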

  12. Counts of Meningococcal infectious disease reported in UNITED STATES OF...

    • tycho.pitt.edu
    Updated Apr 1, 2018
    + more versions
    Cite
    Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Meningococcal infectious disease reported in UNITED STATES OF AMERICA: 1951-2010 [Dataset]. https://www.tycho.pitt.edu/dataset/US.23511006
    Explore at:
    Dataset updated
    Apr 1, 2018
    Dataset provided by
    Project Tycho, University of Pittsburgh
    Authors
    Willem G Van Panhuis; Anne L Cross; Donald S Burke
    Time period covered
    1951 - 2010
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  13. ECMWF ERA-40: T159 spherical harmonic gridded pressure level analysis data...

    • data-search.nerc.ac.uk
    • catalogue.ceda.ac.uk
    Updated Jul 19, 2021
    + more versions
    Cite
    (2021). ECMWF ERA-40: T159 spherical harmonic gridded pressure level analysis data (spap) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=ERA-40
    Explore at:
    Dataset updated
    Jul 19, 2021
    Description

    This dataset contains T159 spherical harmonics gridded, pressure level, analysis time step data from the European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA) 40 program from September 1957 to August 2002. ERA-40 followed on from the ERA-15 re-analysis project. Access is limited to UK-based academic researchers only. These data are GRIB formatted.

  14. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Oct 29, 2024
    + more versions
    Cite
    Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 10/29/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has evolved.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    Code information:

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the
    `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and
    supporting analyses. This script generates the key figures and summary statistics
    used in the study that then get saved in the manuscript_figures folder. Note that all
    maps were produced using Python code found in the "supporting_code"" folder.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  15. Counts of Smallpox reported in UNITED STATES OF AMERICA: 1888-1952

    • tycho.pitt.edu
    Updated Apr 1, 2018
    + more versions
    Cite
    Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Smallpox reported in UNITED STATES OF AMERICA: 1888-1952 [Dataset]. https://www.tycho.pitt.edu/dataset/US.67924001
    Explore at:
    Dataset updated
    Apr 1, 2018
    Dataset provided by
    Project Tycho, University of Pittsburgh
    Authors
    Willem G Van Panhuis; Anne L Cross; Donald S Burke
    Time period covered
    1888 - 1952
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  16. Ramp Analysis

    • hub.arcgis.com
    • pa-geo-data-pennmap.hub.arcgis.com
    Updated Feb 13, 2025
    Cite
    PennShare (2025). Ramp Analysis [Dataset]. https://hub.arcgis.com/datasets/0f5b33ce24fd45ee975cd711f9c64a24
    Explore at:
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    PennShare
    Area covered
    Description

    Network screening analysis data of ramps in Pennsylvania completed in 2024. Data can be filtered by county, planning partner, and engineering district.

    Notes from the PennDOT HSNS video (https://www.youtube.com/watch?v=liXTnqxZjCg):

    Network screening analysis can be used for safety analysis and decision making to decrease the frequency and severity of crashes in Pennsylvania. Network screening is a method from the Highway Safety Manual (HSM) that compares expected crash frequencies and crash severities to historical crash data based on Part C of the HSM. It helps evaluate facilities and identify and prioritize locations that are likely to respond to safety improvement investments. FHWA states that employing traditional network screening with systemic safety analysis can be an agency’s first step toward a comprehensive safety management program. Network screening is the first step in the Roadway Safety Management Process (Part B of the HSM), and it considers crash history, roadway factors, and traffic characteristics.

    Roadway Safety Management Process (Part B of the HSM) steps:
    1. Network Screening
    2. Diagnosis
    3. Select Countermeasures
    4. Economic Appraisal
    5. Prioritize Projects
    6. Safety Effectiveness Evaluation

    This process parallels the method by which PennDOT selects and evaluates projects for the Federal Highway Safety Improvement Program.

    Definitions and notes:
    • SPF: safety performance function
    • Positive/high excess cost locations are good candidates for safety improvements.
    • Urban: sites within urban boundaries (Census) where the population is more than 5,000 people.
    • Rural: sites outside of urban boundaries (Census) where the population is less than 5,000 people.
    • Crashes within 250 feet of an intersection are assigned to the intersection for analysis.
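
    As a purely illustrative sketch of the screening idea, observed crash history can be compared against the frequency expected from an SPF. The functional form below is a generic HSM-style segment SPF and the coefficients are placeholders, not PennDOT's calibrated values:

    ```python
    import math

    def spf_expected_crashes(aadt: float, length_mi: float,
                             a: float = -5.0, b: float = 0.6) -> float:
        """Generic HSM-style segment SPF: N = L * exp(a) * AADT**b crashes/year.
        Coefficients a and b are placeholders, not calibrated values."""
        return length_mi * math.exp(a) * aadt ** b

    observed = 4.0  # crashes/year from the site's crash history
    expected = spf_expected_crashes(aadt=12_000, length_mi=0.4)
    print(f"expected {expected:.2f}, excess {observed - expected:+.2f} crashes/year")
    ```

    Sites where observed (or empirical-Bayes-adjusted) frequency sits well above the SPF expectation are the "high excess" candidates the notes describe.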

  17. Counts of Dengue hemorrhagic fever reported in VANUATU: 2009-2009

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 3, 2024
    + more versions
    Cite
    Cross, Anne (2024). Counts of Dengue hemorrhagic fever reported in VANUATU: 2009-2009 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452689
    Explore at:
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Van Panhuis, Willem
    Cross, Anne
    Burke, Donald
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  18. Counts of Bacillary dysentery reported in UNITED STATES OF AMERICA:...

    • data.niaid.nih.gov
    Updated Jun 3, 2024
    + more versions
    Cite
    Cross, Anne (2024). Counts of Bacillary dysentery reported in UNITED STATES OF AMERICA: 1942-1948 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452355
    Explore at:
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Van Panhuis, Willem
    Cross, Anne
    Burke, Donald
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  19.

    Themes and sub-themes generated from thematic analysis of the data from pre-boot camp surveys

    • plos.figshare.com
    xls
    Updated Jan 3, 2025
    Cite
    Charanjit Kaur; Pei P. Tan; Nurjannah Nurjannah; Ririn Yuniasih (2025). Themes and sub-themes generated from thematic analysis of the data from pre-boot camp surveys. [Dataset]. http://doi.org/10.1371/journal.pone.0312306.t006
    Explore at:
    xls
    Available download formats
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Charanjit Kaur; Pei P. Tan; Nurjannah Nurjannah; Ririn Yuniasih
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Themes and sub-themes generated from thematic analysis of the data from pre-boot camp surveys.

  20.

    Counts of Acute nonparalytic poliomyelitis reported in UNITED STATES OF AMERICA: 1954-1963

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 3, 2024
    + more versions
    Cite
    Burke, Donald (2024). Counts of Acute nonparalytic poliomyelitis reported in UNITED STATES OF AMERICA: 1954-1963 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452263
    Explore at:
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Van Panhuis, Willem
    Cross, Anne
    Burke, Donald
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format. Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. the United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.

    Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
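    For the second step, a similarly hedged sketch that splits the two series types and de-accumulates the cumulative one. The file name is hypothetical and the column names are assumed, as above.

    ```python
    import pandas as pd

    # Sketch: separate cumulative from fixed-interval series, then
    # recover per-interval counts from the cumulative runs.
    df = pd.read_csv(
        "tycho_nonparalytic_polio_us.csv",  # hypothetical file name
        parse_dates=["PeriodStartDate", "PeriodEndDate"],
    )

    fixed = df[df["PartOfCumulativeCountSeries"] == 0].copy()
    cumulative = df[df["PartOfCumulativeCountSeries"] == 1].copy()

    # Intervals in one cumulative run share a start date and extend to
    # later end dates, so differencing successive counts within each run
    # recovers per-interval counts; the first interval keeps its own count.
    cumulative = cumulative.sort_values(["PeriodStartDate", "PeriodEndDate"])
    diffs = cumulative.groupby("PeriodStartDate")["CountValue"].diff()
    cumulative["IntervalCount"] = diffs.fillna(cumulative["CountValue"])
    ```

    Keeping the fixed-interval and de-accumulated cumulative series separate avoids double counting when both report cases for the same time intervals.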

Space Data Analytics Market Report | Global Forecast From 2025 To 2033



From a regional perspective, North America holds the largest market share due to the presence of leading space agencies, like NASA, and prominent private space companies, such as SpaceX and Blue Origin. Europe follows closely, driven by robust investments in space research and development by the European Space Agency (ESA). The Asia Pacific region is expected to witness the fastest growth rate, attributed to increasing satellite launches by countries like China and India, alongside growing investments in space technology and analytics within the region.



Component Analysis



The space data analytics market can be segmented by component into software, hardware, and services. The software segment commands a significant share of the market due to the development of sophisticated analytics tools and platforms. These software solutions are crucial for processing and interpreting the vast amounts of data collected from satellites. Advanced algorithms and AI-powered analytics enable users to extract meaningful insights from raw data, driving the adoption of these solutions across various sectors. Continuous innovation in software capabilities, such as enhanced visualization tools, further supports the segment's growth.
