100+ datasets found
  1. Raw data from datasets used in SIMON analysis

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Adriana Tomic; Ivan Tomic (2020). Raw data from datasets used in SIMON analysis [Dataset]. http://doi.org/10.5281/zenodo.2580414
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Adriana Tomic; Ivan Tomic
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here you can find raw data and information about each of the 34 datasets generated by the mulset algorithm and used for further analysis in SIMON.
    Each dataset is stored in a separate folder that contains four files:

    json_info: the number of features, their names, and the number of subjects available for the dataset
    data_testing: a data frame with the data used to test the trained models
    data_training: a data frame with the data used to train the models
    results: unfiltered data taken directly from the database

    Files are written in Feather format. An example of the data structure of each file is provided in the repository.

    The files were compressed using 7-Zip, available at https://www.7-zip.org/.
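    The folder layout described above lends itself to a quick sanity check after extracting the archive. The sketch below is illustrative only: it assumes each dataset folder keeps the four file names listed above (with any extension), and `missing_files` and `load_frame` are hypothetical helper names. Reading the Feather files themselves needs pandas with pyarrow installed (`pandas.read_feather`).

```python
from pathlib import Path

# The four per-dataset files described above. They are matched by file stem
# because the exact extensions inside the archive are an assumption here.
EXPECTED_STEMS = ("json_info", "data_testing", "data_training", "results")


def missing_files(dataset_dir) -> list:
    """Return the expected file stems that are absent from one dataset folder."""
    present = {p.stem for p in Path(dataset_dir).iterdir() if p.is_file()}
    return [stem for stem in EXPECTED_STEMS if stem not in present]


def load_frame(path):
    """Load one Feather file into a DataFrame (requires pandas with pyarrow)."""
    import pandas as pd  # imported lazily so the folder check needs only the stdlib

    return pd.read_feather(path)
```

    Calling `missing_files` on each of the 34 extracted folders and reporting any non-empty result is a cheap way to catch an incomplete extraction before loading data.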

  2. Raw data and Analysis

    • figshare.com
    xlsx
    Updated Mar 5, 2023
    Aungkana Boonsem; Anan Malarat; Aditep Na Phatthalung (2023). Raw data and Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.22122374.v4
    Available download formats: xlsx
    Dataset updated
    Mar 5, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aungkana Boonsem; Anan Malarat; Aditep Na Phatthalung
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data on behavior and physical fitness. The behavior data for the sampled workers before joining WE are on the sheets behavior 31 and 62. All behavior and physical fitness data are then shown.

  3. Raw Data for ConfLab: A Data Collection Concept, Dataset, and Benchmark for...

    • data.4tu.nl
    Updated Jun 7, 2022
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). Raw Data for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild [Dataset]. http://doi.org/10.4121/20017748.v2
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
    License

    https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Description

    This file contains raw data for cameras and wearables of the ConfLab dataset.


    ./cameras contains the overhead video recordings from 9 cameras (cam2-10) as MP4 files. These cameras cover the whole interaction floor, with camera 2 capturing the bottom of the scene layout and camera 10 capturing the top. Note that cam5 ran out of battery before the other cameras, so its recordings are cut short. However, cam4 and cam6 overlap significantly with cam5 and can be used to reconstruct any information needed.


    Note that the annotations are made and provided in 2-minute segments. The annotated portions of the video include the last 3 min 38 s of the x2xxx.MP4 video files and the first 12 min of the x3xxx.MP4 files for cameras 2, 4, 6, 8, and 10, where "x" is a placeholder character in the MP4 file names. The "video-splitting.sh" script is provided for anyone who wishes to split the videos into 2-minute segments as we did.


    ./camera-calibration contains the camera intrinsic files obtained from https://github.com/idiap/multicamera-calibration. Camera extrinsic parameters can be calculated from the existing intrinsic parameters by following the instructions in the multicamera-calibration repository. Reference image coordinates are provided by the crosses marked on the floor, which are visible in the video recordings. The crosses are 1 m (100 cm) apart.


    The ./wearables subdirectory includes the IMU, proximity, and audio data from each participant at the ConfLab event (48 in total). Each directory, numbered by participant ID, includes the following data:

    1. Raw audio file
    2. Proximity (Bluetooth) pings (RSSI) file (raw and CSV) and a visualization
    3. Tri-axial accelerometer data (raw and CSV) and a visualization
    4. Tri-axial gyroscope data (raw and CSV) and a visualization
    5. Tri-axial magnetometer data (raw and CSV) and a visualization
    6. Game rotation vector (raw and CSV), recorded as quaternions


    All files are timestamped. The sampling frequencies are:

    - audio: 1250 Hz
    - all other sensors: around 50 Hz. However, this sample rate is not fixed, so the timestamps should be used instead.
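    Because only the timestamps are reliable, one common approach is to estimate the effective rate from the median sample interval and, when a uniform grid is needed, interpolate onto it. The sketch below is a minimal illustration, not the ConfLab tooling: it assumes the timestamps have already been converted to seconds and sorted (the units and CSV column names in the release are not stated here), and `effective_rate_hz` and `resample_linear` are hypothetical helper names. Linear interpolation suits the scalar accelerometer, gyroscope, and magnetometer axes; quaternion data would need spherical interpolation instead.

```python
from bisect import bisect_right
from statistics import median


def effective_rate_hz(timestamps_s):
    """Estimate the sampling rate (Hz) from sorted timestamps given in seconds."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    # The median interval is robust to occasional dropped or delayed samples.
    return 1.0 / median(intervals)


def resample_linear(timestamps_s, values, rate_hz):
    """Linearly interpolate irregular samples onto a uniform grid (seconds)."""
    if len(timestamps_s) != len(values) or len(values) < 2:
        raise ValueError("need matching timestamp/value lists of length >= 2")
    start, end = timestamps_s[0], timestamps_s[-1]
    # Small tolerance guards against floating-point round-down of the count.
    n = int((end - start) * rate_hz + 1e-9) + 1
    grid, out = [], []
    for k in range(n):
        t = start + k / rate_hz
        i = bisect_right(timestamps_s, t)  # first index with timestamp > t
        if i >= len(timestamps_s):
            v = values[-1]  # at (or numerically just past) the last sample
        else:
            t0, t1 = timestamps_s[i - 1], timestamps_s[i]
            v0, v1 = values[i - 1], values[i]
            v = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        grid.append(t)
        out.append(v)
    return grid, out
```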


    For rotation, the game rotation vector's output frequency is limited by the actual sampling frequency of the magnetometer. For more information, please refer to https://invensense.tdk.com/wp-content/uploads/2016/06/DS-000189-ICM-20948-v1.3.pdf


    Audio files in this folder are in raw binary form. The following command can be used to convert them to WAV files (1250 Hz):

    ffmpeg -f s16le -ar 1250 -ac 1 -i /path/to/audio/file /path/to/output.wav
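    If ffmpeg is not available, the same conversion can be sketched with Python's standard-library `wave` module: WAV stores 16-bit PCM in little-endian byte order, so the raw bytes can be copied into the container unchanged. This assumes, as the ffmpeg flags above imply, headerless mono signed 16-bit little-endian samples at 1250 Hz; `raw_s16le_to_wav` is a hypothetical helper name.

```python
import wave
from pathlib import Path


def raw_s16le_to_wav(raw_path, wav_path, rate_hz=1250, channels=1):
    """Wrap headerless signed 16-bit little-endian PCM in a WAV container."""
    pcm = Path(raw_path).read_bytes()
    frame_bytes = 2 * channels  # 2 bytes per 16-bit sample, per channel
    if len(pcm) % frame_bytes:
        raise ValueError("file size is not a whole number of 16-bit frames")
    with wave.open(str(wav_path), "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16 bits per sample
        w.setframerate(rate_hz)
        w.writeframes(pcm)  # WAV PCM is little-endian, so no byte swapping
```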


    Synchronization of cameras and wearables data

    Raw videos contain timecode information that matches the timestamps of the data in the "wearables" folder. The starting timecode of a video can be read with:

    ffprobe -hide_banner -show_streams -i /path/to/video
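    The timecode printed by ffprobe still has to be converted into a time value before it can be compared with the wearable timestamps. The sketch below assumes the timecode appears in the stream output as a `timecode` tag (for example a line such as `TAG:timecode=HH:MM:SS:FF`) and is non-drop-frame; the video frame rate must be supplied by the caller, and how the result maps onto the wearables' time base is not specified here. `find_timecode` and `timecode_to_seconds` are hypothetical helpers.

```python
def find_timecode(ffprobe_output: str):
    """Return the first 'timecode' tag value in ffprobe text output, or None."""
    for line in ffprobe_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.lower().endswith("timecode"):
            return value.strip()
    return None


def timecode_to_seconds(timecode: str, fps: float) -> float:
    """Convert a non-drop-frame SMPTE timecode 'HH:MM:SS:FF' to seconds."""
    if ";" in timecode:
        raise ValueError("drop-frame timecode (';' separator) is not handled here")
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError("frame field exceeds the frame rate")
    return hours * 3600 + minutes * 60 + seconds + frames / fps
```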


    ./audio

    ./sync: contains WAV files for each subject
    ./sync_files: auxiliary CSV files used to sync the audio; these can be used to improve the synchronization.

    The code used for syncing the audio can be found at https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/audio

  4. Table_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 15, 2023
    + more versions
    Florian Loffing (2023). Table_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A Syntax Collection and Tutorial.XLSX [Dataset]. http://doi.org/10.3389/fpsyg.2022.808469.s002
    Available download formats: xlsx
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Florian Loffing
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available; together they allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. A variety of example applications of the syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is intended to provide researchers, students, teachers, and others working with SPSS with a valuable tool to move towards more transparency in data visualization.

  5. Raw data outputs 1-18

    • researchdata.edu.au
    Updated Nov 18, 2022
    Monash University (2022). Raw data outputs 1-18 [Dataset]. https://researchdata.edu.au/raw-outputs-1-18/2089494
    Dataset updated
    Nov 18, 2022
    Dataset provided by
    Monash University
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data outputs 1-18

    Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.

    Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.

    Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.

    Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.

    Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones.

    Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.

    Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.

    Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis.

    Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.

    Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs.

    Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.

    Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.

    Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.

    Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated by PubMed-based literature mining.

    Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.

    Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.

    Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.

    Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.

  6. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study are provided, presenting the tagging results for the measures described in the paper. For each subject, the sheet includes the following columns:

    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the notation used: User Stories or Use Cases
    D. the case the subject was assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see the tagging scheme below)
    P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present
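    The grade-to-category rule for column F is easy to get wrong at the boundaries, so a small sketch of it may help. It assumes an empty grade cell is read as `None`; `grade_category` is a hypothetical helper, not part of the spreadsheet.

```python
from typing import Optional


def grade_category(points: Optional[float]) -> Optional[str]:
    """Map an exam grade (0-100) to the L/M/H categories of column F."""
    if points is None:  # empty cell: the subject did not take the first exam
        return None
    if points >= 80:
        return "H"
    if points >= 65:  # 65 included, 80 excluded
        return "M"
    return "L"
```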

    Tagging scheme:

    - Aligned (AL): A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names.
    - Wrongly represented (WR): A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner").
    - System-oriented (SO): A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate.
    - Omitted (OM): A class in CM-Expert that does not appear in any way in CM-Stud.
    - Missing (MI): A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus of this study is on the number of classes. However, we also provide the size ratio for the number of relationships between the student and expert models.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
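    The two ratios, together with the Sheet 3 size ratio, reduce to simple arithmetic on the per-student counts. The sketch below mirrors the formulas exactly as stated; the `TagCounts` container and the function names are hypothetical, and a student whose relevant counts are all zero would raise `ZeroDivisionError` rather than being handled specially.

```python
from dataclasses import dataclass


@dataclass
class TagCounts:
    """Per-student counts of encountered situations (Sheet 1, columns K-O)."""
    al: int  # aligned
    wr: int  # wrongly represented
    so: int  # system-oriented
    om: int  # omitted
    mi: int  # missing (used by neither ratio below)


def correctness(c: TagCounts) -> float:
    """AL / (AL + OM + SO + WR), as defined for Sheet 4."""
    return c.al / (c.al + c.om + c.so + c.wr)


def completeness(c: TagCounts) -> float:
    """(AL + WR) / (AL + WR + OM), as defined for Sheet 4."""
    return (c.al + c.wr) / (c.al + c.wr + c.om)


def size_ratio(student_classes: int, expert_classes: int) -> float:
    """Sheet 3: classes in the student model divided by classes in the expert model."""
    return student_classes / expert_classes
```

    Note that MI does not enter either ratio, which matches the definitions above: missing classes exist only in the student model, so they affect neither the alignment with nor the coverage of the expert model.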

    For sheet 4 as well as the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated, which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation (UC, US).

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case (SIM, HOS, IFA).

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained (well explained, partially explained, not present).

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
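    For reference, the conventional Hedges' g is Cohen's d computed with the pooled sample standard deviation, multiplied by the small-sample correction J = 1 - 3 / (4(n1 + n2) - 9). The sketch below implements that common textbook form; it is not necessarily the exact variant used by the psychometrica.de tool, so small numerical differences are possible. `hedges_g` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def hedges_g(group_a, group_b):
    """Bias-corrected standardized mean difference (group_a minus group_b)."""
    n1, n2 = len(group_a), len(group_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    s1, s2 = stdev(group_a), stdev(group_b)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (mean(group_a) - mean(group_b)) / pooled_sd  # Cohen's d with pooled SD
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate small-sample correction factor
    return d * j
```

    Applied per sheet, `group_a` and `group_b` would hold, for example, the completeness values of the UC and US students respectively.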

  7. Raw data, analysis and modelling scripts for the article "Bichromatic Rabi...

    • data.4tu.nl
    zip
    Updated Aug 11, 2023
    Valentin John; Francesco Borsoi; Zoltán György; Chien-An Wang; Gábor Széchenyi; Floor van Riggelen; William Lawrie; Nico Hendrickx; Amir Sammak; Giordano Scappucci; András Pályi; M. (Menno) Veldhorst (2023). Raw data, analysis and modelling scripts for the article "Bichromatic Rabi control of semiconductor qubits" [Dataset]. http://doi.org/10.4121/bb43fe1d-f503-49e8-9f17-ce7d734f015d.v1
    Available download formats: zip
    Dataset updated
    Aug 11, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Valentin John; Francesco Borsoi; Zoltán György; Chien-An Wang; Gábor Széchenyi; Floor van Riggelen; William Lawrie; Nico Hendrickx; Amir Sammak; Giordano Scappucci; András Pályi; M. (Menno) Veldhorst
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2023
    Dataset funded by
    Ministry of Culture and Innovation and the National Research, Development and Innovation Office (NKFIH)
    Dutch Research Council
    Dutch Research Council (NWO)
    Hungarian Academy of Sciences
    Ministry for Culture and Innovation
    European Union
    NKFIH
    Description

    The research primarily investigates the challenges associated with electrically-driven spin resonance in controlling semiconductor spin qubits, particularly when scaling up to larger systems. The study introduces and evaluates a coherent bichromatic Rabi control method for quantum dot hole spin qubits, aiming to provide a spatially-selective approach for extensive qubit arrays. The findings are supported through a theoretical framework, emphasizing the significance of interdot motion in bichromatic driving. This research is experimental and theoretical in nature. The data was collected with a digitiser through RF-reflectometry by measuring the charge response of a single-hole transistor.

  8. Raw data in SPSS Software

    • zenodo.org
    Updated Jul 16, 2023
    Esubalew Tesfahun (2023). Raw data in SPSS Software [Dataset]. http://doi.org/10.5281/zenodo.8151987
    Dataset updated
    Jul 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Esubalew Tesfahun
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data used for analysis

  9. Data Cleaning Sample

    • borealisdata.ca
    Updated Jul 13, 2023
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  10. dataset-tsql-data-analysis

    • huggingface.co
    Updated Jan 1, 2020
    + more versions
    David Meldrum (2020). dataset-tsql-data-analysis [Dataset]. https://huggingface.co/datasets/dmeldrum6/dataset-tsql-data-analysis
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 1, 2020
    Authors
    David Meldrum
    Description

    Dataset Card for dataset-tsql-data-analysis

    This dataset has been created with distilabel.

    Dataset Summary

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dmeldrum6/dataset-tsql-data-analysis/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dmeldrum6/dataset-tsql-data-analysis.

  11. Coffee Bean Sales Raw Dataset

    • kaggle.com
    Updated Oct 7, 2023
    Saad Haroon (2023). Coffee Bean Sales Raw Dataset [Dataset]. https://www.kaggle.com/datasets/saadharoon27/coffee-bean-sales-raw-dataset
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saad Haroon
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Coffee Bean Sales Dataset - Comprehensive Insights into Coffee Orders, Customers, and Products

    Description: Elevate your data-driven coffee journey with the Coffee Bean Sales Dataset, a treasure trove of information that delves into the world of coffee orders, customer profiles, and an extensive range of coffee products. Whether you're a coffee enthusiast, a data analyst, or a business owner, this dataset provides valuable insights into the coffee industry.

    Contents:

    Orders Worksheet:

    - Order ID: A unique identifier for each coffee order.
    - Order Date: The date when the order was placed.
    - Customer ID: An identifier linking the order to a specific customer.
    - Product ID: A unique identifier for each coffee product.
    - Quantity: The quantity of the coffee product ordered.

    Customers Worksheet:

    - Customer ID: A unique identifier for each customer.
    - Customer Name: The name of the customer.
    - Email Address: Contact information for customers.
    - Phone Number: Another contact detail for customers.
    - And more: Explore a wide range of customer attributes for segmentation and analysis.

    Products Worksheet:

    - Product ID: A unique identifier for each coffee product.
    - Coffee Type: The type or blend of coffee, such as Arabica or Robusta.
    - Roast Type: The roast level: light, medium, or dark.
    - Size: Information about the product size.
    - Unit Price: The price of a single unit of the coffee product.
    - Price Per 100g: The price per 100 grams, for detailed price comparisons.
    - Profit: Insights into the profitability of each coffee product.
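    Because the Orders worksheet carries Product ID and Quantity while the Products worksheet carries Unit Price, the two can be joined on Product ID to total sales per product. The sketch below assumes the rows have already been read into Python (for example from a CSV export of each worksheet); `revenue_by_product` is a hypothetical helper, and the IDs and prices used to exercise it are made up for illustration.

```python
from collections import defaultdict


def revenue_by_product(orders, unit_price):
    """Sum Quantity x Unit Price per Product ID.

    orders: iterable of (order_id, product_id, quantity) rows from the Orders sheet
    unit_price: mapping of Product ID -> Unit Price from the Products sheet
    """
    totals = defaultdict(float)
    for _order_id, product_id, quantity in orders:
        # A KeyError here flags an order whose Product ID is not in the Products sheet.
        totals[product_id] += quantity * unit_price[product_id]
    return dict(totals)
```

    The best-selling product by revenue is then `max(totals, key=totals.get)`; ranking by the Profit column instead would need that column's exact definition, which the description does not give.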

    Use Cases:

    - Market Analysis: Uncover trends in coffee consumption by analysing order patterns over time.
    - Customer Segmentation: Create customer segments based on demographics and preferences.
    - Product Strategy: Identify the most profitable coffee products and optimize pricing.
    - Inventory Management: Ensure that the right quantities of each coffee product are stocked.
    - Marketing Campaigns: Tailor marketing campaigns to specific customer segments.

    Get your hands on the Coffee Bean Sales Dataset and start brewing insights today!

  12. Solar Data Analysis Center

    • catalog.data.gov
    • data.nasa.gov
    • +2 more
    Updated Apr 11, 2025
    National Aeronautics and Space Administration (2025). Solar Data Analysis Center [Dataset]. https://catalog.data.gov/dataset/solar-data-analysis-center
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Yohkoh Legacy data Archive (YLA) is intended to provide all usable scientific data obtained with the Yohkoh satellite, in convenient forms for research and education. The YLA consists of the whole set of Yohkoh data (from raw data to highly processed catalogs), with the web services of quick look, data search, and a sufficient amount of explanatory materials.

  13. Space Data Analytics Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Dataintelo (2025). Space Data Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/space-data-analytics-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jan 7, 2025
    Authors
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Space Data Analytics Market Outlook



    The global space data analytics market size was valued at approximately $3.2 billion in 2023 and is projected to reach around $11.8 billion by 2032, reflecting a robust CAGR of 15.6% over the forecast period. Driven by the increasing deployment of satellites and growing advancements in machine learning and data analytics technologies, the market is poised for substantial growth. The convergence of these technologies allows for more efficient data collection, processing, and utilization, which fuels the demand for space data analytics across various sectors.
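    The headline figures can be cross-checked with the compound annual growth rate relation end_value = start_value * (1 + CAGR) ** years. The sketch below assumes nine compounding years between the 2023 valuation and the 2032 projection; under that assumption, $3.2 billion growing at 15.6% per year gives roughly $11.8 billion, consistent with the report. `cagr` and `project` are hypothetical helper names.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by a start and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1 + rate) ** years
```

    For example, `project(3.2, 0.156, 9)` is about 11.8 and `cagr(3.2, 11.8, 9)` is about 0.156.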



    The primary growth factor for the space data analytics market is the exponential increase in satellite deployments. Governments and private entities are launching satellites for diverse purposes such as communication, navigation, earth observation, and scientific research. This surge in satellite launches generates vast amounts of data that require sophisticated analytical tools to process and interpret. Consequently, the need for advanced analytics solutions to convert raw satellite data into actionable insights is driving the market forward. Additionally, advancements in artificial intelligence (AI) and machine learning (ML) are enhancing the capabilities of space data analytics, making them more accurate and efficient.



    Another significant growth driver is the escalating demand for real-time data and analytics in various industries. Sectors such as agriculture, defense, and environmental monitoring increasingly rely on satellite data for applications like precision farming, border surveillance, and climate change assessment. The ability to obtain real-time data from satellites and analyze it promptly allows organizations to make informed decisions swiftly, thereby improving operational efficiency and outcomes. Furthermore, the growing awareness about the advantages of space data analytics in proactive decision-making is expanding its adoption across multiple sectors.



    Moreover, international collaborations and government initiatives aimed at space exploration and satellite launches are propelling the market. Many countries are investing heavily in space missions and satellite projects, creating a fertile ground for the space data analytics market to thrive. These investments are accompanied by supportive regulatory frameworks and funding for research and development, further encouraging innovation and growth in the sector. Additionally, the commercialization of space activities and the emergence of private space enterprises are opening new avenues for market expansion.



    Artificial Intelligence in Space is revolutionizing the way we approach space exploration and data analysis. By integrating AI technologies with space missions, scientists and researchers can process vast amounts of data more efficiently and accurately. This integration allows for real-time decision-making and predictive analytics, which are crucial for successful space missions. AI's ability to learn and adapt makes it an invaluable tool for navigating the complex and unpredictable environment of space. As AI continues to evolve, its applications in space exploration are expected to expand, offering new possibilities for understanding our universe and enhancing the capabilities of space data analytics.



    From a regional perspective, North America holds the largest market share due to the presence of leading space agencies, like NASA, and prominent private space companies, such as SpaceX and Blue Origin. Europe follows closely, driven by robust investments in space research and development by the European Space Agency (ESA). The Asia Pacific region is expected to witness the fastest growth rate, attributed to increasing satellite launches by countries like China and India, alongside growing investments in space technology and analytics within the region.



    Component Analysis



    The space data analytics market can be segmented by component into software, hardware, and services. The software segment commands a significant share of the market due to the development of sophisticated analytics tools and platforms. These software solutions are crucial for processing and interpreting the vast amounts of data collected from satellites. Advanced algorithms and AI-powered analytics enable users to extract meaningful insights from raw data, driving the adoption of these solutions across various sectors. The continuous innovation in software capabilities, such as enhanced visualization t

  14. Raw data of survival analysis

    • figshare.com
    xlsx
    Updated Aug 20, 2020
    Li Gao (2020). Raw data of survival analysis [Dataset]. http://doi.org/10.6084/m9.figshare.12751439.v2
    Available download formats: xlsx
    Dataset updated
    Aug 20, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Li Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data of survival analysis

  15. Data from: Designing data science workshops for data-intensive environmental science research

    • data.niaid.nih.gov
    • zenodo.org
    zip
    Updated Dec 8, 2020
    Allison Theobold; Stacey Hancock; Sara Mannheimer (2020). Designing data science workshops for data-intensive environmental science research [Dataset]. http://doi.org/10.5061/dryad.7wm37pvp7
    Available download formats: zip
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    California State Polytechnic University
    Montana State University
    Authors
    Allison Theobold; Stacey Hancock; Sara Mannheimer
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.

    Methods: Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.

    Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Forms.

    The surveys administered during the 2018-2019 academic year (fall 2018 and spring 2019) are included as the pre_workshop_survey and post_workshop_assessment PDF files. The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw: files whose names include survey contain raw data from the pre-workshop surveys, and files whose names include assessment contain raw data from the post-workshop assessment survey.

    The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively. The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean. The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
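    The file-naming convention described above can be illustrated with a small classifier. This is a minimal sketch, not part of the archive: it assumes every Excel data file name contains either survey or assessment and ends in raw or clean before the extension. The function name classify_file and the example file names are hypothetical.

```python
from pathlib import PurePath
from typing import Optional, Tuple


def classify_file(name: str) -> Optional[Tuple[str, str]]:
    """Return (stage, status) for an Excel data file, or None if unrecognized.

    Hypothetical helper; assumes the naming convention described above.
    """
    path = PurePath(name)
    # Only the Excel files follow the survey/assessment + raw/clean convention;
    # the PDF questionnaires are skipped.
    if path.suffix.lower() not in {".xlsx", ".xls"}:
        return None
    stem = path.stem.lower()
    # "assessment" marks post-workshop data; "survey" marks pre-workshop data.
    if "assessment" in stem:
        stage = "post-workshop"
    elif "survey" in stem:
        stage = "pre-workshop"
    else:
        return None
    # The end of the file stem distinguishes raw from cleaned data.
    if stem.endswith("raw"):
        status = "raw"
    elif stem.endswith("clean"):
        status = "clean"
    else:
        return None
    return stage, status
```

    For example, classify_file("intro_r_survey_raw.xlsx") returns ("pre-workshop", "raw"). The assessment check runs first so that a name containing both words is treated as post-workshop data; that ordering is itself an assumption.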
    
  16. Fatality Analysis Reporting System ( FARS ) - FTP Raw Data

    • catalog.data.gov
    • data.transportation.gov
    Updated May 1, 2024
    National Highway Traffic Safety Administration (2024). Fatality Analysis Reporting System ( FARS ) - FTP Raw Data [Dataset]. https://catalog.data.gov/dataset/fatality-analysis-reporting-system-fars-ftp-raw-data
    Dataset updated
    May 1, 2024
    Description

    The program collects data for the analysis of traffic crashes in order to identify traffic safety problems and evaluate countermeasures aimed at reducing the injuries and property damage that result from motor vehicle crashes. The FARS dataset contains descriptions, in a standard format, of each fatal crash reported. To qualify for inclusion, a crash must involve a motor vehicle traveling on a trafficway customarily open to the public and must result in the death of a person (a vehicle occupant or a non-motorist) within 30 days of the crash. Each crash has more than 100 coded data elements that characterize the crash, the vehicles, and the people involved. The specific data elements may change slightly each year to reflect changing user needs, vehicle characteristics, and highway safety emphasis areas. FARS, a major application, therefore processes motor vehicle crash data.
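    The inclusion criterion above can be expressed as a simple predicate. This is a minimal sketch under stated assumptions: the Person and Crash records and their field names are hypothetical and do not correspond to actual FARS data elements, and "within 30 days" is interpreted here as no more than 30 days after the crash date (day 30 inclusive), which may differ from NHTSA's official counting convention.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical field; occupants and non-motorists are treated alike here.
    date_of_death: Optional[date] = None  # None if the person survived


@dataclass
class Crash:
    # Hypothetical fields, not actual FARS data element names.
    crash_date: date
    motor_vehicle_in_transport: bool  # a motor vehicle traveling on a trafficway
    trafficway_open_to_public: bool   # trafficway customarily open to the public
    persons: List[Person] = field(default_factory=list)


FATALITY_WINDOW_DAYS = 30  # assumed: death no later than 30 days after the crash


def qualifies_for_fars(crash: Crash) -> bool:
    """Return True if the crash meets the inclusion rule described above."""
    if not (crash.motor_vehicle_in_transport and crash.trafficway_open_to_public):
        return False
    # At least one person must die within the window after the crash date.
    return any(
        p.date_of_death is not None
        and 0 <= (p.date_of_death - crash.crash_date).days <= FATALITY_WINDOW_DAYS
        for p in crash.persons
    )
```

    Under this sketch, a crash on January 1 with a death on January 31 qualifies (30 days), while a death on February 1 (31 days) does not.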

  17. BCG Data Science Simulation

    • kaggle.com
    Updated Feb 12, 2025
    PAVITR KUMAR SWAIN (2025). BCG Data Science Simulation [Dataset]. https://www.kaggle.com/datasets/pavitrkumar/bcg-data-science-simulation
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PAVITR KUMAR SWAIN
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    Feature Engineering for Churn Prediction

    🚀 BCG Data Science Job Simulation | Forage: This notebook focuses on feature engineering techniques to enhance a dataset for churn prediction modeling. As part of the BCG Data Science Job Simulation, I transformed raw customer data into new features intended to improve predictive performance.

    📊 What’s Inside?
    ✅ Data Cleaning: Removing irrelevant columns to reduce noise
    ✅ Date-Based Feature Extraction: Converting raw dates into useful features such as activation year, contract length, and renewal month
    ✅ New Predictive Features:
        consumption_trend → Measures whether a customer’s last-month usage is increasing or decreasing
        total_gas_and_elec → Aggregates total energy consumption
    ✅ Final Processed Dataset: Ready for churn prediction modeling
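    The date-based and consumption features listed above can be sketched for a single customer record as follows. This is a minimal standard-library sketch, not the notebook's code. The input column names (date_activ, date_end, date_renewal, cons_12m, cons_gas_12m, cons_last_month) are assumptions about the simulation data; contract length is computed in days; and consumption_trend is assumed to compare last-month usage with the average monthly usage over the past 12 months.

```python
from datetime import date


def engineer_features(row: dict) -> dict:
    """Return a copy of one customer record with engineered features added.

    Column names and feature definitions are assumptions, not confirmed.
    """
    out = dict(row)  # leave the input record unchanged
    activ = date.fromisoformat(row["date_activ"])
    end = date.fromisoformat(row["date_end"])
    renewal = date.fromisoformat(row["date_renewal"])

    # Date-based features
    out["activation_year"] = activ.year
    out["contract_length_days"] = (end - activ).days
    out["renewal_month"] = renewal.month

    # Assumed definition: last month's usage minus the average monthly usage
    # over 12 months; positive means rising usage, negative means falling.
    out["consumption_trend"] = row["cons_last_month"] - row["cons_12m"] / 12

    # Assumed: 12-month electricity plus 12-month gas consumption.
    out["total_gas_and_elec"] = row["cons_12m"] + row["cons_gas_12m"]
    return out
```

    For a record activated on 2012-11-07 and ending on 2016-11-06, with cons_12m = 1200, cons_gas_12m = 300, and cons_last_month = 150, this sketch yields a contract length of 1460 days, a consumption_trend of 50.0, and a total_gas_and_elec of 1500. The column-dropping step is omitted because the description does not say which columns were removed.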

    📂 Dataset Used:
    📌 clean_data_after_eda.csv → Original dataset after Exploratory Data Analysis (EDA)
    📌 clean_data_with_new_features.csv → Final dataset after feature engineering

    🛠 Technologies Used:
    🔹 Python (Pandas, NumPy)
    🔹 Data Preprocessing & Feature Engineering

    🌟 Why Feature Engineering? Feature engineering is one of the most critical steps in machine learning. Well-engineered features improve model accuracy and uncover deeper insights into customer behavior.

    🚀 This notebook is a great reference for anyone learning data preprocessing, feature selection, and predictive modeling in Data Science!

    📩 Connect with Me:
    🔗 GitHub Repo: https://github.com/Pavitr-Swain/BCG-Data-Science-Job-Simulation
    💼 LinkedIn: https://www.linkedin.com/in/pavitr-kumar-swain-ab708b227/

    🔍 Let’s explore churn prediction insights together! 🎯

  18. Data Analysis and Reporting Service Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 9, 2025
    Market Research Forecast (2025). Data Analysis and Reporting Service Report [Dataset]. https://www.marketresearchforecast.com/reports/data-analysis-and-reporting-service-30436
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 9, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered: 2025-2033
    Area covered: Global
    Variables measured: Market Size
    Description

    The global market for Data Analysis and Reporting Services is experiencing robust growth, driven by the increasing need for data-driven decision-making across diverse industries. The market's expansion is fueled by several key factors: the proliferation of big data, advancements in artificial intelligence (AI) and machine learning (ML) capabilities within analytics platforms, and a rising demand for real-time insights. Businesses across sectors, from finance and healthcare to retail and manufacturing, are increasingly leveraging data analysis and reporting services to optimize operations, improve customer experience, and gain a competitive edge. The adoption of cloud-based solutions is further accelerating market growth, offering scalability, cost-effectiveness, and enhanced accessibility.

    While the market shows significant promise, challenges remain, including the need for skilled data analysts and the complexity of integrating disparate data sources. Data security and privacy concerns also pose significant hurdles that must be addressed for continued, sustainable growth.

    By segment, Business Intelligence (BI) platforms and data visualization tools dominate the market, fueled by their ability to transform raw data into actionable insights. The healthcare and life sciences sectors are particularly strong adopters, leveraging data analysis to improve patient care and to support drug discovery and research. Geographically, North America currently holds a significant market share, owing to its advanced technological infrastructure and high adoption rate of data analytics solutions. However, Asia Pacific is projected to see substantial growth in the coming years, driven by increasing digitalization and a burgeoning middle class. Competitive intensity is high, with established players like Tableau, Microsoft Power BI, and Qlik facing competition from emerging cloud-based solutions and specialized analytics firms. The market is expected to continue its upward trajectory, with consistent growth projected throughout the forecast period, albeit at a potentially moderating rate as the market matures.

  19. Data analysis method test raw data

    • figshare.com
    • search.datacite.org
    pdf
    Updated May 25, 2021
    Jorge Miguel Carona Ferreira; Robert Huhle (2021). Data analysis method test raw data [Dataset]. http://doi.org/10.6084/m9.figshare.14672148.v1
    Available download formats: pdf
    Dataset updated
    May 25, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jorge Miguel Carona Ferreira; Robert Huhle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data analysis raw data in a PDF file

  20. Data Analysis and Reporting Service Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 9, 2025
    Market Research Forecast (2025). Data Analysis and Reporting Service Report [Dataset]. https://www.marketresearchforecast.com/reports/data-analysis-and-reporting-service-30438
    Available download formats: ppt, doc, pdf
    Dataset updated
    Mar 9, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered: 2025-2033
    Area covered: Global
    Variables measured: Market Size
    Description

    The Data Analysis and Reporting Services market is experiencing robust growth, driven by the increasing volume and complexity of data generated across various industries. The market's expansion is fueled by the rising adoption of cloud-based solutions, advanced analytics techniques like machine learning and AI, and the growing demand for real-time data insights to support better decision-making. Key segments within this market include Business Intelligence (BI) platforms, data visualization tools, and specialized applications across sectors such as business and finance, healthcare, retail, and manufacturing. The competitive landscape is characterized by a mix of established players like Tableau, Microsoft Power BI, and Qlik, alongside emerging niche providers. While North America currently holds a significant market share, regions like Asia Pacific are exhibiting rapid growth, driven by increasing digitalization and technological advancements. The market's trajectory is expected to remain positive throughout the forecast period, with continued innovation in data analysis technologies and expanding adoption across diverse industries contributing to its expansion.

    The sustained growth is further amplified by the increasing need for data-driven strategies across organizations of all sizes. Businesses are increasingly recognizing the value of converting raw data into actionable insights for improved operational efficiency, enhanced customer experience, and strategic planning. This necessitates investments in sophisticated data analysis and reporting services, fueling the demand for both software and services. However, challenges such as data security concerns, the need for skilled data analysts, and the complexity of integrating diverse data sources represent potential restraints to market growth. Nevertheless, ongoing technological advancements and the development of user-friendly tools are mitigating these challenges, ensuring the continued expansion of this vital market segment. This market will continue its upward trajectory, driven by factors such as big data proliferation, cloud computing adoption, and the ever-increasing need for data-driven decision-making across all sectors.
