100+ datasets found
  1. D

    Data Preparation Tools Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    AMA Research & Media LLP (2025). Data Preparation Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/data-preparation-tools-51852
    Explore at:
    pdf, doc, pptAvailable download formats
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Preparation Tools market is experiencing robust growth, projected to reach a market size of $3 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 17.7% from 2025 to 2033. This significant expansion is driven by several key factors. The increasing volume and velocity of data generated across industries necessitates efficient and effective data preparation processes to ensure data quality and usability for analytics and machine learning initiatives. The rising adoption of cloud-based solutions, coupled with the growing demand for self-service data preparation tools, is further fueling market growth. Businesses across various sectors, including IT and Telecom, Retail and E-commerce, BFSI (Banking, Financial Services, and Insurance), and Manufacturing, are actively seeking solutions to streamline their data pipelines and improve data governance. The diverse range of applications, from simple data cleansing to complex data transformation tasks, underscores the versatility and broad appeal of these tools. Leading vendors like Microsoft, Tableau, and Alteryx are continuously innovating and expanding their product offerings to meet the evolving needs of the market, fostering competition and driving further advancements in data preparation technology. This rapid growth is expected to continue, driven by ongoing digital transformation initiatives and the increasing reliance on data-driven decision-making. The segmentation of the market into self-service and data integration tools, alongside the varied applications across different industries, indicates a multifaceted and dynamic landscape. While challenges such as data security concerns and the need for skilled professionals exist, the overall market outlook remains positive, projecting substantial expansion throughout the forecast period. The adoption of advanced technologies like artificial intelligence (AI) and machine learning (ML) within data preparation tools promises to further automate and enhance the process, contributing to increased efficiency and reduced costs for businesses. The competitive landscape is dynamic, with established players alongside emerging innovators vying for market share, leading to continuous improvement and innovation within the industry.

  2. D

    Data Preparation Tool Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Feb 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pro Market Reports (2025). Data Preparation Tool Market Report [Dataset]. https://www.promarketreports.com/reports/data-preparation-tool-market-18555
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policyhttps://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data preparation tool market is estimated to be valued at $674.52 million in 2025, with a compound annual growth rate (CAGR) of 16.46% from 2025 to 2033. The rising need to manage and analyze large volumes of complex data from various sources is driving the growth of the market. Additionally, the increasing adoption of cloud-based data management solutions and the growing demand for data-driven decision-making are contributing to the market's expansion. Key market trends include the growing adoption of artificial intelligence (AI) and machine learning (ML) technologies for data preparation automation, the increasing use of data visualization tools for data analysis, and the growing popularity of data fabric architectures for data integration and management. The market is segmented by deployment (on-premises, cloud, hybrid), data volume (small data, big data), data type (structured data, unstructured data, semi-structured data), industry vertical (BFSI, healthcare, retail, manufacturing), and use case (data integration, data cleansing, data transformation, data enrichment). North America is the largest regional market, followed by Europe and Asia Pacific. IBM, Collibra, Talend, Microsoft, Informatica, SAP, SAS Institute, and Denodo are some of the key players in the market. Key drivers for this market are: Cloud-based deployment AIML integration Self-service capabilities Real-time data processing Data governance and compliance. Potential restraints include: Increasing cloud adoption Growing volume of data Advancements in artificial intelligence (AI) and machine learning (ML) Stringent regulatory compliance Rising demand for self-service data preparation.

  3. D

    Data Preparation Software Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Archive Market Research (2025). Data Preparation Software Report [Dataset]. https://www.archivemarketresearch.com/reports/data-preparation-software-50803
    Explore at:
    ppt, doc, pdfAvailable download formats
    Dataset updated
    Feb 23, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data preparation software market is estimated at USD 579.3 million in 2025 and is expected to witness a compound annual growth rate (CAGR) of 8.1% from 2025 to 2033. Factors such as increasing data volumes, growing demand for data-driven insights, and the adoption of artificial intelligence (AI) and machine learning (ML) technologies are driving the growth of the market. Additionally, the rising need for data privacy and security regulations is also contributing to the demand for data preparation software. The market is segmented by application into large enterprises and SMEs, and by type into cloud-based and web-based. The cloud-based segment is expected to hold the largest market share during the forecast period due to its benefits such as ease of use, scalability, and cost-effectiveness. The market is also segmented by region into North America, South America, Europe, the Middle East and Africa, and Asia Pacific. North America is expected to account for the largest market share, followed by Europe. The Asia Pacific region is expected to witness the fastest growth during the forecast period. Key players in the market include Alteryx, Altair Monarch, Tableau Prep, Datameer, IBM, Oracle, Palantir Foundry, Podium, SAP, Talend, Trifacta, Unifi, and others. Data preparation software tools assist organizations in transforming raw data into a usable format for analysis, reporting, and storage. In 2023, the market size is expected to exceed $10 billion, driven by the growing adoption of AI, cloud computing, and machine learning technologies.

  4. D

    Data Preparation Platform Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Feb 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Forecast (2025). Data Preparation Platform Report [Dataset]. https://www.marketresearchforecast.com/reports/data-preparation-platform-21037
    Explore at:
    ppt, pdf, docAvailable download formats
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Market Analysis: The global data preparation platform market size was valued at USD XXX million in 2025 and is projected to reach USD XX million by 2033, exhibiting a CAGR of XX% during the forecast period. This growth is primarily driven by the increasing demand for data analytics and the need for efficient data preparation processes. The adoption of cloud-based deployments, advancements in artificial intelligence and machine learning, and the growing adoption of data preparation self-service tools are also contributing to market expansion. Key Market Trends: The market is segmented by type (cloud-based and on-premise) and application (large enterprises and small & medium enterprises). Cloud-based solutions are expected to dominate the market due to their scalability, flexibility, and cost-effectiveness. Large enterprises are expected to be the primary users of data preparation platforms due to their extensive data volumes and need for data integration and analysis. Leading vendors in the market include Microsoft, Tableau, Trifacta, and Alteryx. The competitive landscape is expected to intensify as new entrants emerge and established players enhance their offerings. Regional markets, including North America, Europe, Asia Pacific, and the Middle East & Africa, are expected to offer significant growth opportunities.

  5. Data Science Platform Market Analysis North America, Europe, APAC, South...

    • technavio.com
    Updated Feb 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Data Science Platform Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, Germany, China, Canada, UK, India, France, Japan, Brazil, UAE - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
    Explore at:
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global, United States, United Kingdom
    Description

    Snapshot img

    Data Science Platform Market Size 2025-2029

    The data science platform market size is forecast to increase by USD 763.9 million at a CAGR of 40.2% between 2024 and 2029.

    The market is experiencing significant growth, driven by the integration of artificial intelligence (AI) and machine learning (ML). This enhancement enables more advanced data analysis and prediction capabilities, making data science platforms an essential tool for businesses seeking to gain insights from their data. Another trend shaping the market is the emergence of containerization and microservices in platforms. This development offers increased flexibility and scalability, allowing organizations to efficiently manage their projects. 
    However, the use of platforms also presents challenges, particularly In the area of data privacy and security. Ensuring the protection of sensitive data is crucial for businesses, and platforms must provide strong security measures to mitigate risks. In summary, the market is witnessing substantial growth due to the integration of AI and ML technologies, containerization, and microservices, while data privacy and security remain key challenges.
    

    What will be the Size of the Data Science Platform Market During the Forecast Period?

    Request Free Sample

    The market is experiencing significant growth due to the increasing demand for advanced data analysis capabilities in various industries. Cloud-based solutions are gaining popularity as they offer scalability, flexibility, and cost savings. The market encompasses the entire project life cycle, from data acquisition and preparation to model development, training, and distribution. Big data, IoT, multimedia, machine data, consumer data, and business data are prime sources fueling this market's expansion. Unstructured data, previously challenging to process, is now being effectively managed through tools and software. Relational databases and machine learning models are integral components of platforms, enabling data exploration, preprocessing, and visualization.
    Moreover, Artificial intelligence (AI) and machine learning (ML) technologies are essential for handling complex workflows, including data cleaning, model development, and model distribution. Data scientists benefit from these platforms by streamlining their tasks, improving productivity, and ensuring accurate and efficient model training. The market is expected to continue its growth trajectory as businesses increasingly recognize the value of data-driven insights.
    

    How is this Data Science Platform Industry segmented and which is the largest segment?

    The industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      On-premises
      Cloud
    
    
    Component
    
      Platform
      Services
    
    
    End-user
    
      BFSI
      Retail and e-commerce
      Manufacturing
      Media and entertainment
      Others
    
    
    Sector
    
      Large enterprises
      SMEs
    
    
    Geography
    
      North America
    
        Canada
        US
    
    
      Europe
    
        Germany
        UK
        France
    
    
      APAC
    
        China
        India
        Japan
    
    
      South America
    
        Brazil
    
    
      Middle East and Africa
    

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period.
    

    On-premises deployment is a traditional method for implementing technology solutions within an organization. This approach involves purchasing software with a one-time license fee and a service contract. On-premises solutions offer enhanced security, as they keep user credentials and data within the company's premises. They can be customized to meet specific business requirements, allowing for quick adaptation. On-premises deployment eliminates the need for third-party providers to manage and secure data, ensuring data privacy and confidentiality. Additionally, it enables rapid and easy data access, and keeps IP addresses and data confidential. This deployment model is particularly beneficial for businesses dealing with sensitive data, such as those in manufacturing and large enterprises. While cloud-based solutions offer flexibility and cost savings, on-premises deployment remains a popular choice for organizations prioritizing data security and control.

    Get a glance at the Data Science Platform Industry report of share of various segments. Request Free Sample

    The on-premises segment was valued at USD 38.70 million in 2019 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 48% to the growth of the global market during the forecast period.
    

    Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

    For more insights on the market share of various regions, Request F

  6. Global Data Prep Market By Platform (Self-Service Data Prep, Data...

    • verifiedmarketresearch.com
    Updated Sep 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Data Prep Market By Platform (Self-Service Data Prep, Data Integration), By Tools (Data Curation, Data Cataloging, Data Quality, Data Ingestion, Data Governance), By Geographic Scope and Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/data-prep-market/
    Explore at:
    Dataset updated
    Sep 29, 2024
    Dataset provided by
    Verified Market Researchhttps://www.verifiedmarketresearch.com/
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Data Prep Market size was valued at USD 4.02 Billion in 2024 and is projected to reach USD 16.12 Billion by 2031, growing at a CAGR of 19% from 2024 to 2031.

    Global Data Prep Market Drivers

    Increasing Demand for Data Analytics: Businesses across all industries are increasingly relying on data-driven decision-making, necessitating the need for clean, reliable, and useful information. This rising reliance on data increases the demand for better data preparation technologies, which are required to transform raw data into meaningful insights.
    Growing Volume and Complexity of Data: The increase in data generation continues unabated, with information streaming in from a variety of sources. This data frequently lacks consistency or organization, therefore effective data preparation is critical for accurate analysis. To assure quality and coherence while dealing with such a large and complicated data landscape, powerful technologies are required.
    Increased Use of Self-Service Data Preparation Tools: User-friendly, self-service data preparation solutions are gaining popularity because they enable non-technical users to access, clean, and prepare data. independently. This democratizes data access, decreases reliance on IT departments, and speeds up the data analysis process, making data-driven insights more available to all business units.
    Integration of AI and ML: Advanced data preparation technologies are progressively using AI and machine learning capabilities to improve their effectiveness. These technologies automate repetitive activities, detect data quality issues, and recommend data transformations, increasing productivity and accuracy. The use of AI and ML streamlines the data preparation process, making it faster and more reliable.
    Regulatory Compliance Requirements: Many businesses are subject to tight regulations governing data security and privacy. Data preparation technologies play an important role in ensuring that data meets these compliance requirements. By giving functions that help manage and protect sensitive information these technologies help firms negotiate complex regulatory climates.
    Cloud-based Data Management: The transition to cloud-based data storage and analytics platforms needs data preparation solutions that can work smoothly with cloud-based data sources. These solutions must be able to integrate with a variety of cloud settings to assist effective data administration and preparation while also supporting modern data infrastructure.

  7. D

    Data Prep Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data Insights Market (2025). Data Prep Report [Dataset]. https://www.datainsightsmarket.com/reports/data-prep-1977265
    Explore at:
    ppt, pdf, docAvailable download formats
    Dataset updated
    Jan 21, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data preparation market is anticipated to escalate by 14.3% CAGR from 2023 to 2033, amassing a value of USD 2210.8 million by 2033. With enterprises generating massive volumes of data, data preparation has become crucial for effective data analysis and decision-making. Driving this market growth are the increasing adoption of cloud-based data storage and processing platforms, the need for data privacy and governance, and the growing use of artificial intelligence (AI) and machine learning (ML) in data analysis. Market segmentation includes different applications such as hosted and on-premises, and types such as data curation, cataloging, quality, ingestion, and governance. Key market players include Alteryx, Inc., Informatica, IBM, Tibco Software Inc., Microsoft, and SAS Institute. Regionally, the market is segmented into North America, South America, Europe, the Middle East & Africa, and Asia Pacific. Factors restraining market growth include data privacy concerns and the lack of skilled professionals in data preparation. However, technological advancements, such as the integration of AI and ML in data preparation tools, are expected to create growth opportunities in the future.

  8. Data from: Enriching time series datasets using Nonparametric kernel...

    • figshare.com
    pdf
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohamad Ivan Fanany (2023). Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.1609661.v1
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mohamad Ivan Fanany
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Improving the accuracy of prediction on future values based on the past and current observations has been pursued by enhancing the prediction's methods, combining those methods or performing data pre-processing. In this paper, another approach is taken, namely by increasing the number of input in the dataset. This approach would be useful especially for a shorter time series data. By filling the in-between values in the time series, the number of training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make prediction is Neural Network as it is widely used in literature for time series tasks. For comparison, Support Vector Regression is also employed. The dataset used in the experiment is the frequency of USPTO's patents and PubMed's scientific publications on the field of health, namely on Apnea, Arrhythmia, and Sleep Stages. Another time series data designated for NN3 Competition in the field of transportation is also used for benchmarking. The experimental result shows that the prediction performance can be significantly increased by filling in-between data in the time series. Furthermore, the use of detrend and deseasonalization which separates the data into trend, seasonal and stationary time series also improve the prediction performance both on original and filled dataset. The optimal number of increase on the dataset in this experiment is about five times of the length of original dataset.

  9. Data from: Benchmarking Machine Learning Methods for Synthetic Lethality...

    • zenodo.org
    bin
    Updated Sep 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Feng Yimiao; Long Yahui; Wang He; Ouyang Yang; Li Quan; Wu Min; Zheng Jie; Feng Yimiao; Long Yahui; Wang He; Ouyang Yang; Li Quan; Wu Min; Zheng Jie (2024). Benchmarking Machine Learning Methods for Synthetic Lethality Prediction in Cancer [Dataset]. http://doi.org/10.5281/zenodo.13691648
    Explore at:
    binAvailable download formats
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Feng Yimiao; Long Yahui; Wang He; Ouyang Yang; Li Quan; Wu Min; Zheng Jie; Feng Yimiao; Long Yahui; Wang He; Ouyang Yang; Li Quan; Wu Min; Zheng Jie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Benchmarking of Machine Learning Methods for Predicting Synthetic Lethality Interactions

    Data Preparation and Download Instructions

    After downloading data from this project, follow these steps to prepare the training data:

    Step 1: Download all the data parts from the [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13691648.svg)](https://doi.org/10.5281/zenodo.13691648) provided in the repository.

    Matters needing attention:

    • The actual command will depend on how you're downloading files from Google Drive. For the convenience of downloading, the files have been compressed and split.
    • We have prepared two versions of data, namely the complete version (`data_large.tar.gz`) and the version without PiLSL database (`data_small.tar.gz`).
    • The decompressed size of the complete version is about 90GB, while the decompressed size of the version without PiLSL database is about 22GB.
    • (Due to the extremely time-consuming database construction process, it is recommended to download the complete version of the data. However, the data version without PiLSL database can successfully run all 10 models except PiLSL.)

    Step 2: Combine the parts into a single archive.

    cat data_large.tar.gz.part* > data_large.tar.gz   # Complete version, the size after extracted is about 100GB.
    # cat data_small.tar.gz.part* > data_small.tar.gz  # The version without PiLSL database, the size after extracted is about 25GB.

    Step 3: Verify the integrity of the downloaded files.

    md5sum -c data_large.tar.gz.md5
    # md5sum -c data_small.tar.gz.md5 # The version without PiLSL database

    Step 4: Extract the dataset.

    tar -xzvf data_large.tar.gz
    # tar -xzvf data_small.tar.gz # The version without PiLSL database
  10. m

    A dataset for machine learning research in the field of stress analyses of...

    • data.mendeley.com
    • narcis.nl
    Updated Jul 25, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jaroslav Matej (2020). A dataset for machine learning research in the field of stress analyses of mechanical structures [Dataset]. http://doi.org/10.17632/wzbzznk8z3.2
    Explore at:
    Dataset updated
    Jul 25, 2020
    Authors
    Jaroslav Matej
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is prepared and intended as a data source for development of a stress analysis method based on machine learning. It consists of finite element stress analyses of randomly generated mechanical structures. The dataset contains more than 270,794 pairs of stress analyses images (von Mises stress) of randomly generated 2D structures with predefined thickness and material properties. All the structures are fixed at their bottom edges and loaded with gravity force only. See PREVIEW directory with some examples. The zip file contains all the files in the dataset.

  11. i

    Dataset of article: Synthetic Datasets Generator for Testing Information...

    • ieee-dataport.org
    Updated Mar 13, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sandro Mendonça (2020). Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools [Dataset]. http://doi.org/10.21227/5aeq-rr34
    Explore at:
    Dataset updated
    Mar 13, 2020
    Dataset provided by
    IEEE Dataport
    Authors
    Sandro Mendonça
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.

  12. d

    Summary report of the 4th IAEA Technical Meeting on Fusion Data Processing,...

    • dataone.org
    Updated Sep 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege (2024). Summary report of the 4th IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis (FDPVA) [Dataset]. https://dataone.org/datasets/sha256%3A77e8a13f897db6e70537f800ceea9a2c8228a4eb1b33289b7e2c056845298a12
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege
    Description

    The objective of the fourth Technical Meeting on Fusion Data Processing, Validation and Analysis was to provide a platform during which a set of topics relevant to fusion data processing, validation and analysis are discussed with the view of extrapolating needs to next step fusion devices such as ITER. The validation and analysis of experimental data obtained from diagnostics used to characterize fusion plasmas are crucial for a knowledge-based understanding of the physical processes governing the dynamics of these plasmas. This paper presents the recent progress and achievements in the domain of plasma diagnostics and synthetic diagnostics data analysis (including image processing, regression analysis, inverse problems, deep learning, machine learning, big data and physics-based models for control) reported at the meeting. The progress in these areas highlight trends observed in current major fusion confinement devices. A special focus is dedicated on data analysis requirements for ITER and DEMO with a particular attention paid to Artificial Intelligence for automatization and improving reliability of control processes.

  13. Demo dataset for: SPACEc, a streamlined, interactive Python workflow for...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Jul 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yuqi Tan; Tim Kempchen (2024). Demo dataset for: SPACEc, a streamlined, interactive Python workflow for multiplexed image processing and analysis [Dataset]. http://doi.org/10.5061/dryad.brv15dvj1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Stanford University School of Medicine
    Authors
    Yuqi Tan; Tim Kempchen
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Multiplexed imaging technologies provide insights into complex tissue architectures. However, challenges arise due to software fragmentation with cumbersome data handoffs, inefficiencies in processing large images (8 to 40 gigabytes per image), and limited spatial analysis capabilities. To efficiently analyze multiplexed imaging data, we developed SPACEc, a scalable end-to-end Python solution, that handles image extraction, cell segmentation, and data preprocessing and incorporates machine-learning-enabled, multi-scaled, spatial analysis, operated through a user-friendly and interactive interface. The demonstration dataset was derived from a previous analysis and contains TMA cores from a human tonsil and tonsillitis sample that were acquired with the Akoya PhenocyclerFusion platform. The dataset can be used to test the workflow and establish it on a user’s system or to familiarize oneself with the pipeline. Methods Tissue samples: Tonsil cores were extracted from a larger multi-tumor tissue microarray (TMA), which included a total of 66 unique tissues (51 malignant and semi-malignant tissues, as well as 15 non-malignant tissues). Representative tissue regions were annotated on corresponding hematoxylin and eosin (H&E)-stained sections by a board-certified surgical pathologist (S.Z.). Annotations were used to generate the 66 cores each with cores of 1mm diameter. FFPE tissue blocks were retrieved from the tissue archives of the Institute of Pathology, University Medical Center Mainz, Germany, and the Department of Dermatology, University Medical Center Mainz, Germany. The multi-tumor-TMA block was sectioned at 3µm thickness onto SuperFrost Plus microscopy slides before being processed for CODEX multiplex imaging as previously described. CODEX multiplexed imaging and processing To run the CODEX machine, the slide was taken from the storage buffer and placed in PBS for 10 minutes to equilibrate. After drying the PBS with a tissue, a flow cell was sealed onto the tissue slide. The assembled slide and flow cell were then placed in a PhenoCycler Buffer made from 10X PhenoCycler Buffer & Additive for at least 10 minutes before starting the experiment. A 96-well reporter plate was prepared with each reporter corresponding to the correct barcoded antibody for each cycle, with up to 3 reporters per cycle per well. The fluorescence reporters were mixed with 1X PhenoCycler Buffer, Additive, nuclear-staining reagent, and assay reagent according to the manufacturer's instructions. With the reporter plate and assembled slide and flow cell placed into the CODEX machine, the automated multiplexed imaging experiment was initiated. Each imaging cycle included steps for reporter binding, imaging of three fluorescent channels, and reporter stripping to prepare for the next cycle and set of markers. This was repeated until all markers were imaged. After the experiment, a .qptiff image file containing individual antibody channels and the DAPI channel was obtained. Image stitching, drift compensation, deconvolution, and cycle concatenation are performed within the Akoya PhenoCycler software. The raw imaging data output (tiff, 377.442nm per pixel for 20x CODEX) is first examined with QuPath software (https://qupath.github.io/) for inspection of staining quality. Any markers that produce unexpected patterns or low signal-to-noise ratios should be excluded from the ensuing analysis. The qptiff files must be converted into tiff files for input into SPACEc. Data preprocessing includes image stitching, drift compensation, deconvolution, and cycle concatenation performed using the Akoya Phenocycler software. The raw imaging data (qptiff, 377.442 nm/pixel for 20x CODEX) files from the Akoya PhenoCycler technology were first examined with QuPath software (https://qupath.github.io/) to inspect staining qualities. Markers with untenable patterns or low signal-to-noise ratios were excluded from further analysis. A custom CODEX analysis pipeline was used to process all acquired CODEX data (scripts available upon request). The qptiff files were converted into tiff files for tissue detection (watershed algorithm) and cell segmentation.

  14. I

    Words_Selected_by_Information_Gain

    • databank.illinois.edu
    Updated Jan 2, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xiaoru Dong; Jingyi Xie; Linh Hoang (2019). Words_Selected_by_Information_Gain [Dataset]. http://doi.org/10.13012/B2IDB-9837167_V1
    Explore at:
    Dataset updated
    Jan 2, 2019
    Authors
    Xiaoru Dong; Jingyi Xie; Linh Hoang
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Institutes of Health (NIH)
    Description

    File Name: WordsSelectedByInformationGain.csv Data Preparation: Xiaoru Dong, Linh Hoang Date of Preparation: 2018-12-12 Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks. Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider. Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews. Description: the file contains a list of 1655 informative words selected by applying information gain feature selection strategy. Information gain is one of the methods commonly used for feature selection, which tells us how many bits of information the presence of the word are helpful for us to predict the classes, and can be computed in a specific formula [Jurafsky D, Martin JH. Speech and language processing. London: Pearson; 2014 Dec 30].We ran Information Gain feature selection on Weka -- a machine learning tool. Notes: In order to reproduce the data in this file, please get the code of the project published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run the code following the instruction provided.

  15. m

    Austin_Survey_for_MDCOR_Analyses

    • data.mendeley.com
    Updated Nov 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Manuel Gonzalez Canche (2022). Austin_Survey_for_MDCOR_Analyses [Dataset]. http://doi.org/10.17632/nb7yvhjvzk.1
    Explore at:
    Dataset updated
    Nov 14, 2022
    Authors
    Manuel Gonzalez Canche
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Austin
    Description

    The city of Austin has administered a community survey for the 2015, 2016, 2017, 2018 and 2019 years (https://data.austintexas.gov/City-Government/Community-Survey/s2py-ceb7), to “assess satisfaction with the delivery of the major City Services and to help determine priorities for the community as part of the City’s ongoing planning process.” To directly access this dataset from the city of Austin’s website, you can follow this link https://cutt.ly/VNqq5Kd. Although we downloaded the dataset analyzed in this study from the former link, given that the city of Austin is interested in continuing administering this survey, there is a chance that the data we used for this analysis and the data hosted in the city of Austin’s website may differ in the following years. Accordingly, to ensure the replication of our findings, we recommend researchers to download and analyze the dataset we employed in our analyses, which can be accessed at the following link https://github.com/democratizing-data-science/MDCOR/blob/main/Community_Survey.csv. Replication Features or Variables The community survey data has 10,684 rows and 251 columns. Of these columns, our analyses will rely on the following three indicators that are taken verbatim from the survey: “ID”, “Q25 - If there was one thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?", and “Do you own or rent your home?”

  16. Dollar street 10 - 64x64x3

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Apr 14, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sven van der burg; Sven van der burg (2024). Dollar street 10 - 64x64x3 [Dataset]. http://doi.org/10.5281/zenodo.10970014
    Explore at:
    binAvailable download formats
    Dataset updated
    Apr 14, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sven van der burg; Sven van der burg
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MLCommons Dollar Street Dataset is a collection of images of everyday household items from homes around the world that visually captures socioeconomic diversity of traditionally underrepresented populations. It consists of public domain data, licensed for academic, commercial and non-commercial usage, under CC-BY and CC-BY-SA 4.0. The dataset was developed because similar datasets lack socioeconomic metadata and are not representative of global diversity.

    This is a subset of the original dataset that can be used for multiclass classification with 10 categories. It is designed to be used in teaching, similar to the widely used, but unlicensed CIFAR-10 dataset.

    These are the preprocessing steps that were performed:

    1. Only take examples with one imagenet_synonym label
    2. Use only examples with the 10 most frequently occuring labels
    3. Downscale images to 64 x 64 pixels
    4. Split data in train and test
    5. Store as numpy array

    This is the label mapping:

    Categorylabel
    day bed0
    dishrag1
    plate2
    running shoe3
    soap dispenser4
    street sign5
    table lamp6
    tile roof7
    toilet seat8
    washing machine9

    Checkout this notebook to see how the subset was created.

    The original dataset was downloaded from https://www.kaggle.com/datasets/mlcommons/the-dollar-street-dataset. See https://mlcommons.org/datasets/dollar-street/ for more information.

  17. A

    Artificial Intelligence (AI) in Manufacturing Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Feb 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Forecast (2025). Artificial Intelligence (AI) in Manufacturing Market Report [Dataset]. https://www.marketresearchforecast.com/reports/artificial-intelligence-ai-in-manufacturing-market-2215
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Feb 7, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Artificial Intelligence (AI) in Manufacturing Market size was valued at USD 1.82 USD billion in 2023 and is projected to reach USD 6.64 USD billion by 2032, exhibiting a CAGR of 20.3 % during the forecast period. AI in manufacturing is the technology using intelligent systems and algorithms in industrial settings for the improvement of productivity and decision making. It uses machine learning, robotics, and analytics to optimize manufacturing operations. Industrial areas of applications are supplying chain management (SCM), predictive maintenance (PM), quality control (QC), and autonomous robotics (AR). AI systems in manufacturing can be classified the following ways: supervised learning for predictive maintenance, unsupervised learning for anomaly detection, reinforcement learning for autonomous robotics, and natural language processing for human-machine interaction. A crucial part of this system includes sensors for data gathering, data processing systems, machine learning systems, robotics, and human-machine interfaces. Right now, trendsetting technologies such as AI with IoT for real-time monitoring, explainable AI for transparency, and AI-driven generative design for product innovation are the most important ingredients for the progress of the technology. Companies experiment with AI enabled replicas of the manufacturing process and AI based supply chains that enables them to be more efficient and resilient. Recent developments include: Microsoft and Siemens announce partnership to develop AI-powered manufacturing solutions

    Google and ABB collaborate on AI-based cloud solutions for industrial robotics

    IBM and Samsung join forces to advance AI for semiconductor manufacturing. Key drivers for this market are: Rising Demand from the Automotive and Construction Sectors to Aid Market Growth. Potential restraints include: The Change in International Policies is Expected to Impact the Market Growth .

  18. i

    Data from: HUMAN4D: A Human-Centric Multimodal Dataset for Motions &...

    • ieee-dataport.org
    Updated Aug 20, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anargyros Chatzitofis (2020). HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media [Dataset]. http://doi.org/10.21227/xjzb-4y45
    Explore at:
    Dataset updated
    Aug 20, 2020
    Dataset provided by
    IEEE Dataport
    Authors
    Anargyros Chatzitofis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets captured with the use of hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision due to the use of intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned scanned and rigged 3D character complements HUMAN4D to enable joint research on time-varying and high-quality dynamic meshes. We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods. For the former, we apply 2D and 3D pose estimation algorithms both on single- and multi-view data cues. For the latter, we benchmark open-source 3D codecs on volumetric data respecting online volumetric video encoding and steady bit-rates. Furthermore, qualitative and quantitative visual comparison between mesh-based volumetric data reconstructed in different qualities showcases the available options with respect to 4D representations. HUMAN4D is introduced to the computer vision and graphics research communities to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD and audio data cues.The dataset and its code are available online.

  19. D

    Data Labeling Solution and Services Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    AMA Research & Media LLP (2025). Data Labeling Solution and Services Report [Dataset]. https://www.archivemarketresearch.com/reports/data-labeling-solution-and-services-52815
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Mar 7, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Data Labeling Solution and Services market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) and machine learning (ML) across diverse sectors. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an estimated market value of $70 billion by 2033. This significant expansion is fueled by the burgeoning need for high-quality training data to enhance the accuracy and performance of AI models. Key growth drivers include the expanding application of AI in various industries like automotive (autonomous vehicles), healthcare (medical image analysis), and financial services (fraud detection). The increasing availability of diverse data types (text, image/video, audio) further contributes to market growth. However, challenges such as the high cost of data labeling, data privacy concerns, and the need for skilled professionals to manage and execute labeling projects pose certain restraints on market expansion. Segmentation by application (automotive, government, healthcare, financial services, others) and data type (text, image/video, audio) reveals distinct growth trajectories within the market. The automotive and healthcare sectors currently dominate, but the government and financial services segments are showing promising growth potential. The competitive landscape is marked by a mix of established players and emerging startups. Companies like Amazon Mechanical Turk, Appen, and Labelbox are leading the market, leveraging their expertise in crowdsourcing, automation, and specialized data labeling solutions. However, the market shows strong potential for innovation, particularly in the development of automated data labeling tools and the expansion of services into niche areas. Regional analysis indicates strong market penetration in North America and Europe, driven by early adoption of AI technologies and robust research and development efforts. However, Asia-Pacific is expected to witness significant growth in the coming years fueled by rapid technological advancements and a rising demand for AI solutions. Further investment in R&D focused on automation, improved data security, and the development of more effective data labeling methodologies will be crucial for unlocking the full potential of this rapidly expanding market.

  20. UniToBrain Dataset

    • zenodo.org
    • ieee-dataport.org
    bin, csv, pdf +1
    Updated Jul 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Umberto Gava; Umberto Gava; Federico D'Agata; Federico D'Agata; Edwin Bennink; Edwin Bennink; Enzo Tartaglione; Enzo Tartaglione; Annamaria Vernone; Annamaria Vernone; Francesca Bertolino; Eleonora Ficiarà; Eleonora Ficiarà; Alessandro Cicerale; Alessandro Cicerale; Fabrizio Pizzagalli; Fabrizio Pizzagalli; Caterina Guiot; Caterina Guiot; Marco Grangetto; Marco Grangetto; Mauro Bergui; Mauro Bergui; Francesca Bertolino (2024). UniToBrain Dataset [Dataset]. http://doi.org/10.5281/zenodo.5046076
    Explore at:
    text/x-python, bin, pdf, csvAvailable download formats
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Umberto Gava; Umberto Gava; Federico D'Agata; Federico D'Agata; Edwin Bennink; Edwin Bennink; Enzo Tartaglione; Enzo Tartaglione; Annamaria Vernone; Annamaria Vernone; Francesca Bertolino; Eleonora Ficiarà; Eleonora Ficiarà; Alessandro Cicerale; Alessandro Cicerale; Fabrizio Pizzagalli; Fabrizio Pizzagalli; Caterina Guiot; Caterina Guiot; Marco Grangetto; Marco Grangetto; Mauro Bergui; Mauro Bergui; Francesca Bertolino
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The University of Turin (UniTO) released the open-access dataset UniTOBrain collected for the homonymous Use Case 3 in the DeepHealth project (https://deephealth-project.eu/). UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP). The dataset includes 100 training subjects and 15 testing subjects used in a submitted publication for the training and the testing of a Convolutional Neural Network (CNN, see for details: https://arxiv.org/abs/2101.05992, https://paperswithcode.com/paper/neural-network-derived-perfusion-maps-a-model, https://www.medrxiv.org/content/10.1101/2021.01.13.21249757v1). At this stage, the UniTO team released this dataset privately, but soon it will be public. This is a subsample of a greater dataset of 258 subjects that will be soon available for download at https://ieee-dataport.org/.
    CTP data from 258 consecutive patients were retrospectively obtained from the hospital PACS of Città della Salute e della Scienza di Torino (Molinette). CTP acquisition parameters were as follows: Scanner GE, 64 slices, 80 kV, 150 mAs, 44.5 sec duration, 89 volumes (40 mm axial coverage), injection of 40 ml of Iodine contrast agent (300 mg/ml) at 4 ml/s speed.

    Along with the dataset, we provide some utility files.

    dicomtonpy.py: It converts the dicom files in the dataset to numpy arrays. These are 3D arrays, where CT slices at the same height are piled-up over the temporal acquisition.

    dataloader_pytorch.py: Dataloader for the pytorch deep learning framework. It converts the numpy arrays in normalized tensors, which can be provided as input to standard deep learning models.

    dataloader_pyeddl.py: Dataloader for the pyeddl deep learning framework. It converts the numpy arrays in normalized tensors, which can be provided as input to standard deep learning models using the european library EDDL. Visit https://github.com/EIDOSlab/UC3-UNITOBrain to have a full companion code where a U-Net model is trained over the dataset.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
AMA Research & Media LLP (2025). Data Preparation Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/data-preparation-tools-51852

Data Preparation Tools Report

Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Mar 6, 2025
Dataset provided by
AMA Research & Media LLP
License

https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description

The Data Preparation Tools market is experiencing robust growth, projected to reach a market size of $3 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 17.7% from 2025 to 2033. This significant expansion is driven by several key factors. The increasing volume and velocity of data generated across industries necessitates efficient and effective data preparation processes to ensure data quality and usability for analytics and machine learning initiatives. The rising adoption of cloud-based solutions, coupled with the growing demand for self-service data preparation tools, is further fueling market growth. Businesses across various sectors, including IT and Telecom, Retail and E-commerce, BFSI (Banking, Financial Services, and Insurance), and Manufacturing, are actively seeking solutions to streamline their data pipelines and improve data governance. The diverse range of applications, from simple data cleansing to complex data transformation tasks, underscores the versatility and broad appeal of these tools. Leading vendors like Microsoft, Tableau, and Alteryx are continuously innovating and expanding their product offerings to meet the evolving needs of the market, fostering competition and driving further advancements in data preparation technology. This rapid growth is expected to continue, driven by ongoing digital transformation initiatives and the increasing reliance on data-driven decision-making. The segmentation of the market into self-service and data integration tools, alongside the varied applications across different industries, indicates a multifaceted and dynamic landscape. While challenges such as data security concerns and the need for skilled professionals exist, the overall market outlook remains positive, projecting substantial expansion throughout the forecast period. The adoption of advanced technologies like artificial intelligence (AI) and machine learning (ML) within data preparation tools promises to further automate and enhance the process, contributing to increased efficiency and reduced costs for businesses. The competitive landscape is dynamic, with established players alongside emerging innovators vying for market share, leading to continuous improvement and innovation within the industry.

Search
Clear search
Close search
Google apps
Main menu