License: https://data.gov.tw/license
This project aims to use artificial intelligence to identify potential risk factors for subsurface damage beneath asphalt pavements, explore the preprocessing procedures and steps for ground-penetrating radar (GPR) data, and propose initial solutions or recommendations for the difficulties encountered during preprocessing.
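The project description does not list its preprocessing steps. Background removal is one common GPR preprocessing step; it is shown here as an illustration and is not necessarily what this project used. It subtracts the average trace from every trace in a B-scan, which suppresses horizontally continuous energy such as the direct wave. The sketch assumes a B-scan stored as a list of equal-length traces, with samples ordered by two-way travel time.

```python
from typing import List


def remove_background(bscan: List[List[float]]) -> List[List[float]]:
    """Subtract the mean trace (averaged across all traces) from each trace.

    Assumes `bscan` is a list of traces (one per antenna position), each a
    list of amplitude samples of the same length.
    """
    if not bscan:
        return []
    n_traces = len(bscan)
    n_samples = len(bscan[0])
    if any(len(trace) != n_samples for trace in bscan):
        raise ValueError("all traces must have the same number of samples")
    # Mean amplitude at each time sample, taken across all traces
    mean_trace = [sum(trace[i] for trace in bscan) / n_traces for i in range(n_samples)]
    # Energy common to every trace (e.g. flat bands) is removed; local reflections remain
    return [[trace[i] - mean_trace[i] for i in range(n_samples)] for trace in bscan]


if __name__ == "__main__":
    # Two traces sharing a flat band (5.0, 1.0) plus one local reflection
    scan = [[5.0, 1.0, 0.0], [5.0, 1.0, 4.0]]
    print(remove_background(scan))  # [[0.0, 0.0, -2.0], [0.0, 0.0, 2.0]]
```

Mean-trace subtraction can also weaken genuinely flat reflectors, such as layer interfaces. For that reason, it is often applied over a moving window rather than the whole scan.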
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Improving the accuracy of predictions of future values based on past and current observations has been pursued by enhancing prediction methods, combining those methods, or performing data preprocessing. In this paper, another approach is taken: increasing the number of inputs in the dataset. This approach is especially useful for shorter time series. By filling in values between existing observations, the training set can be enlarged, which increases the generalization capability of the predictor. Predictions are made with a neural network (NN), as NNs are widely used in the literature for time series tasks; for comparison, support vector regression (SVR) is also employed. The experiment uses the frequencies of USPTO patents and PubMed scientific publications in the field of health, namely on apnea, arrhythmia, and sleep stages. A time series from the NN3 competition, in the field of transportation, is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in-between data in the time series. Furthermore, detrending and deseasonalization, which separate the data into trend, seasonal, and stationary components, also improve prediction performance on both the original and the filled datasets. The optimal increase in this experiment is about five times the length of the original dataset.
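The abstract does not say how the in-between values are computed. The sketch below assumes simple linear interpolation; spline or other schemes would fit the description equally well. With `factor=5`, the filled series is roughly five times the original length, matching the optimum reported above.

```python
from typing import List


def fill_in_between(series: List[float], factor: int) -> List[float]:
    """Densify a series by linearly interpolating `factor - 1` points
    between each pair of consecutive observations.

    The result has (len(series) - 1) * factor + 1 points, i.e. roughly
    `factor` times the original length. Original observations are kept.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if len(series) < 2:
        return list(series)
    filled: List[float] = []
    for a, b in zip(series, series[1:]):
        # Emit a, a + (b - a)/factor, ...; b itself is emitted by the next pair
        for step in range(factor):
            filled.append(a + (b - a) * step / factor)
    filled.append(series[-1])  # keep the final observation
    return filled


if __name__ == "__main__":
    print(fill_in_between([0.0, 10.0, 20.0], 5))
    # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```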
License: https://www.wiseguyreports.com/pages/privacy-policy
| Attribute | Details |
|---|---|
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 22.1 (USD billion) |
| MARKET SIZE 2025 | 25.8 (USD billion) |
| MARKET SIZE 2035 | 120.5 (USD billion) |
| SEGMENTS COVERED | Service Type, Deployment Model, End User, Application, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Growing demand for data integration, Increasing focus on automation, Rapid advancements in machine learning, Rising importance of data security, Expanding applications across industries |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | IBM, Palantir Technologies, ServiceNow, Oracle, Zoho, NVIDIA, Salesforce, SAP, H2O.ai, Microsoft, Intel, Amazon, Google, C3.ai, Alteryx, DataRobot |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Increased demand for data management, Growth in machine learning applications, Expansion of IoT analytics, Rising need for predictive insights, Adoption of personalized marketing strategies |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 16.7% (2025 - 2035) |
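The table's CAGR can be cross-checked from its own market-size rows using CAGR = (end / start)^(1 / years) - 1. This sketch assumes ten compounding periods from the 2025 value to the 2035 value.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` over `years` periods."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Table figures: USD 25.8 billion (2025) to USD 120.5 billion (2035)
    rate = cagr(25.8, 120.5, 10)
    print(f"{rate:.1%}")  # 16.7%
```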
License: https://www.technavio.com/content/privacy-notice
US Deep Learning Market Size 2025-2029
The deep learning market size in the US is forecast to increase by USD 5.02 billion at a CAGR of 30.1% between 2024 and 2029.
The deep learning market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) across industries to deliver advanced solutions. This trend is fueled by the availability of vast amounts of data, a key requirement for deep learning algorithms to function effectively. Industry-specific solutions are gaining traction as businesses seek to apply deep learning to specific use cases such as image and speech recognition, fraud detection, and predictive maintenance. In addition, intuitive data visualization tools are simplifying complex neural network outputs, helping stakeholders understand and validate insights.
However, challenges remain, including the need for powerful computing resources, data privacy concerns, and the high cost of implementing and maintaining deep learning systems. Despite these hurdles, the market's potential for innovation and disruption is immense, making it an exciting space for businesses to explore further. Semi-supervised learning, data labeling, and data cleaning facilitate efficient training of deep learning models. Cloud analytics is another significant trend, as companies seek to leverage cloud computing for cost savings and scalability.
What will be the size of the market during the forecast period?
Deep learning, a subset of machine learning, continues to shape industries by enabling advanced applications such as image and speech recognition, text generation, and pattern recognition. Reinforcement learning is also gaining traction, with deep reinforcement learning, which combines reinforcement learning with deep neural networks, leading the charge. Anomaly detection, a key application of unsupervised learning, helps safeguard systems against security threats. Ethical implications and fairness are increasingly important considerations in deep learning, with emphasis on explainable AI and model interpretability. Graph neural networks extend deep learning to graph-structured data, while attention mechanisms are advancing sequential data modeling and object detection. Time series forecasting and dataset creation further expand deep learning's reach, while privacy preservation and bias mitigation support its responsible use.
In summary, deep learning's market dynamics reflect a constant pursuit of innovation, efficiency, and ethical considerations. The deep learning market in the US is flourishing as organizations embrace intelligent systems powered by supervised learning and emerging self-supervised learning techniques. These methods refine predictive capabilities, and self-supervised techniques in particular reduce reliance on labeled data, boosting scalability. BFSI firms use AI image recognition for applications including personalizing customer communication, maintaining a competitive edge, and automating repetitive tasks to boost productivity. Sophisticated feature extraction algorithms now enable models to isolate patterns with high precision, particularly in image classification for healthcare, security, and retail.
How is this market segmented and which is the largest segment?
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
- Application
  - Image recognition
  - Voice recognition
  - Video surveillance and diagnostics
  - Data mining
- Type
  - Software
  - Services
  - Hardware
- End-user
  - Security
  - Automotive
  - Healthcare
  - Retail and commerce
  - Others
- Geography
  - North America
    - US
By Application Insights
The image recognition segment is estimated to witness significant growth during the forecast period. Within artificial intelligence (AI) and machine learning, image recognition, a subset of computer vision, is gaining significant traction. This technology uses neural networks, deep learning models, and other machine learning algorithms to interpret visual data from images and videos. Image recognition is instrumental in numerous applications, including visual search, product recommendations, and inventory management. Consumers can photograph products to discover similar items, enhancing the online shopping experience. In the automotive sector, image recognition is indispensable for advanced driver assistance systems (ADAS) and autonomous vehicles, enabling the identification of pedestrians, other vehicles, road signs, and lane markings.
Furthermore, image recognition plays a pivotal role in augmented reality (AR) and virtual reality (VR) applications, where it tracks physical objects and overlays digital content onto real-world scenarios. The model training process involves the backpropagation algorithm, which calculates the loss fu
License: https://paper.erudition.co.in/terms
Question paper solutions for the chapter "Data Pre-processing and Clean-up" of Data Mining, 6th Semester, B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)
License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
By Hugging Face Hub [source]
HelpSteer is an open-source dataset designed to support AI alignment through fair, team-oriented annotation. It provides 37,120 samples, each containing a prompt and a response along with five human-annotated attributes scored from 0 to 4, with higher scores indicating more of the respective attribute. By combining machine learning and natural language processing methods with annotation by data experts, HelpSteer aims to establish a set of standardized values for measuring alignment between human and machine interactions. With responses rated for correctness, coherence, complexity, helpfulness, and verbosity, HelpSteer sets out to help organizations build reliable AI models that produce more accurate results and, in turn, a better user experience.
How to Use HelpSteer: An Open-Source AI Alignment Dataset
HelpSteer is an open-source dataset designed to help researchers build aligned AI models. The dataset consists of 37,120 samples, each containing a prompt, a response, and five human-annotated attributes used to evaluate the response. This guide gives a step-by-step introduction to using HelpSteer in your own projects.
## Step 1: Choosing the Data File
HelpSteer contains two data files, one for training and one for validation. To start exploring the dataset, download train.csv and validation.csv from the Kaggle page linked above or from the Google Drive repository attached here: [link]. Each row in either file describes a single response with 7 columns: prompt (given), response (submitted), helpfulness, correctness, coherence, complexity, and verbosity. The five attribute columns take values from 0 to 4, where a higher value means more of the respective attribute.
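A minimal loading sketch using only the standard library. It assumes the CSV header uses exactly the seven column names listed above; check this against the file you download. The sample data is made up for illustration.

```python
import csv
import io
from typing import Dict, IO, List

# Column names as listed in the description (an assumption to verify)
EXPECTED_COLUMNS = ["prompt", "response", "helpfulness", "correctness",
                    "coherence", "complexity", "verbosity"]


def load_helpsteer(handle: IO[str]) -> List[Dict[str, str]]:
    """Read a HelpSteer-style CSV and fail early if expected columns are missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


if __name__ == "__main__":
    # With the real file: open("train.csv", newline="", encoding="utf-8")
    sample = io.StringIO(
        "prompt,response,helpfulness,correctness,coherence,complexity,verbosity\n"
        "What is 2+2?,4,4,4,4,0,0\n"
    )
    rows = load_helpsteer(sample)
    print(len(rows), rows[0]["helpfulness"])  # 1 4
```

Note that all values come back as strings. Converting the score columns to integers is left to the cleaning step.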
## Step 2: Exploratory Data Analysis (EDA)
Once you have loaded your file into your workspace or preferred environment (for example, Pandas/NumPy or even Microsoft Excel), it's time to explore it further. Run some basic EDA commands that summarize each feature's distribution, and note potential trends or points of interest. For example, which traits most polarize the responses? Are there any outliers that might signal something interesting? Plotting these results often reveals patterns across the dataset that can be used later in the modeling phase, particularly during feature engineering.
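As a starting point for EDA, the sketch below reports each attribute's mean and how often each score occurs. It assumes rows shaped like those produced by `csv.DictReader`, with the five attribute names from Step 1. Blank cells are skipped here rather than treated as errors.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List

SCORE_COLUMNS = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]


def summarize_scores(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, object]]:
    """For each attribute, report its mean and the frequency of each score value.

    Blank or missing cells are skipped; non-numeric cells raise ValueError.
    """
    summary: Dict[str, Dict[str, object]] = {}
    for col in SCORE_COLUMNS:
        values = [int(row[col]) for row in rows if (row.get(col) or "").strip()]
        summary[col] = {
            "mean": mean(values) if values else None,
            "counts": dict(sorted(Counter(values).items())),  # score -> frequency
        }
    return summary


if __name__ == "__main__":
    rows = [
        {"helpfulness": "4", "correctness": "3", "coherence": "4", "complexity": "1", "verbosity": "2"},
        {"helpfulness": "2", "correctness": "3", "coherence": "4", "complexity": "1", "verbosity": ""},
    ]
    for col, stats in summarize_scores(rows).items():
        print(col, stats)
```

A heavily skewed `counts` dictionary is one quick sign of an attribute whose scores are concentrated on a few values.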
## Step 3: Data Preprocessing
The hypotheses you formed during EDA about which features matter most for estimating the attribute scores of unseen responses should guide preprocessing. Cleaning up missing entries and handling outliers before starting any modeling is highly recommended. If you are unsure about an attribute's domain or allowed values, refer back to the description on the Kaggle page. Having clean, correctly ranged numeric values ready makes building predictive models easier later on. Don't rush this stage: low-quality data can lead to poor results when the model is deployed, however quickly you aim for high accuracy.
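One possible cleaning pass, sketched under the assumption that the documented 0 to 4 range is authoritative. It drops rows whose scores are blank, non-integer, or outside that range. Dropping is only one choice; imputing missing scores would be an alternative.

```python
from typing import Dict, List, Tuple

SCORE_COLUMNS = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]
MIN_SCORE, MAX_SCORE = 0, 4  # documented range of each attribute


def clean_rows(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, object]], int]:
    """Drop rows with blank, non-integer, or out-of-range scores.

    Returns the cleaned rows (scores converted to int) and the number dropped.
    """
    cleaned: List[Dict[str, object]] = []
    dropped = 0
    for row in rows:
        try:
            scores = {col: int((row.get(col) or "").strip()) for col in SCORE_COLUMNS}
        except ValueError:
            dropped += 1  # blank or non-numeric score
            continue
        if not all(MIN_SCORE <= v <= MAX_SCORE for v in scores.values()):
            dropped += 1  # outside the documented range
            continue
        cleaned.append({"prompt": row.get("prompt", ""),
                        "response": row.get("response", ""), **scores})
    return cleaned, dropped


if __name__ == "__main__":
    rows = [
        {"prompt": "p1", "response": "r1", "helpfulness": "4", "correctness": "3",
         "coherence": "4", "complexity": "1", "verbosity": "2"},
        {"prompt": "p2", "response": "r2", "helpfulness": "", "correctness": "3",
         "coherence": "4", "complexity": "1", "verbosity": "2"},
        {"prompt": "p3", "response": "r3", "helpfulness": "3", "correctness": "3",
         "coherence": "4", "complexity": "1", "verbosity": "7"},
    ]
    kept, n_dropped = clean_rows(rows)
    print(len(kept), n_dropped)  # 1 2
```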
- Designating and measuring conversational AI engagement goals: Researchers can utilize the HelpSteer dataset to design evaluation metrics for AI engagement systems.
- Identifying conversational trends: By analyzing the annotations and data in HelpSteer, organizations can gain insights into what makes conversations more helpful, cohesive, complex or consistent across datasets or audiences.
- Training Virtual Assistants: Train artificial intelligence algorithms on this dataset to develop virtual assistants that respond effectively to customer queries with helpful answers.
If you use this dataset in your research, please credit the original authors. Data Source
**License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons.org/pu...
License: https://dataintelo.com/privacy-and-policy
According to our latest research, the AI Data Versioning Platform market size reached USD 1.42 billion in 2024 globally, demonstrating robust expansion driven by the surging adoption of artificial intelligence and machine learning initiatives across industries. The market is exhibiting a strong compound annual growth rate (CAGR) of 22.8% from 2025 to 2033. By the end of 2033, the global AI Data Versioning Platform market is forecasted to attain a value of USD 11.84 billion. This remarkable growth is primarily fueled by the increasing complexity and scale of AI projects, necessitating advanced data management solutions that ensure data integrity, reproducibility, and collaborative workflows in enterprise environments.
The primary growth factor propelling the AI Data Versioning Platform market is the exponential increase in data generated by organizations leveraging artificial intelligence and machine learning. As enterprises deploy more sophisticated AI models, the need to track, manage, and reproduce datasets and model versions becomes critical. This has led to a surge in demand for platforms that can provide granular version control, ensuring that data scientists and engineers can collaborate efficiently without risking data inconsistencies or loss. Additionally, regulatory compliance requirements across sectors such as healthcare, BFSI, and manufacturing are pushing organizations to adopt robust data versioning practices, further bolstering market growth.
Another significant driver is the rising complexity of AI model development and deployment pipelines. Modern AI workflows often involve multiple teams working on various aspects of data preprocessing, feature engineering, model training, and validation. This complexity necessitates seamless collaboration and traceability, which AI Data Versioning Platforms offer by enabling users to track changes, roll back to previous versions, and maintain a comprehensive audit trail. The integration capabilities of these platforms with popular machine learning frameworks and DevOps tools have also made them indispensable in enterprise AI strategies, accelerating their adoption across industries.
The proliferation of cloud computing and the growing trend towards hybrid and multi-cloud environments have further augmented the adoption of AI Data Versioning Platforms. Cloud-based solutions offer scalability, flexibility, and cost-effectiveness, allowing organizations to manage vast volumes of data and model artifacts efficiently. Moreover, the increasing focus on data governance, security, and privacy in the wake of stringent data protection regulations worldwide has underscored the importance of data versioning as a foundational element of enterprise AI infrastructure. As organizations strive to derive actionable insights from their data assets while maintaining compliance, the AI Data Versioning Platform market is poised for sustained growth.
Regionally, North America continues to dominate the AI Data Versioning Platform market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The presence of leading technology companies, advanced research institutions, and a mature AI ecosystem in North America has fostered early adoption of data versioning solutions. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid digital transformation, increased investments in AI research, and the emergence of technology startups. Europe, with its strong regulatory framework and focus on data privacy, also represents a significant market, particularly in sectors such as healthcare and BFSI. Latin America and the Middle East & Africa are gradually catching up, supported by growing awareness and digitalization initiatives across industries.
The AI Data Versioning Platform market is segmented by component into software and services, each playing a crucial role in enabling organizations to manage their data assets effectively. Software solutions constitute the backbone of this market, offering comprehensive functionalities such as data tracking, version control, metadata management, and integration with popular machine learning frameworks. These platforms are designed to cater to the diverse needs of data scientists, engineers, and business analysts, providing intuitive interfaces and automation capabilities that streamline the data lifecycle.
License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
An intentionally messy synthetic personal finance dataset designed for practicing real-world data preprocessing before building AI-based expense forecasting models.
Created for BudgetWise - an AI expense forecasting tool. This dataset simulates real-world financial transaction data with all the messiness data scientists encounter in production: inconsistent formats, typos, duplicates, outliers, and missing values.
Perfect for practicing:
- Data cleaning & normalization
- Handling missing values
- Date parsing & time-series analysis
- Currency extraction & conversion
- Outlier detection
- Feature engineering
- Class balancing (SMOTE)
- Text standardization
- Duplicate detection
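Several of these cleaning tasks can be sketched with the standard library alone. The date formats, field names (`date`, `amount`, `category`), and duplicate key below are hypothetical, because the dataset's actual formats are not documented here. Formats like `05/03/2024` are ambiguous between day-first and month-first, so the order of `DATE_FORMATS` is a real decision, not a detail.

```python
import re
from datetime import date, datetime
from typing import Dict, List, Optional

# Hypothetical formats; day-first is tried before any month-first variant
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y", "%d-%m-%Y"]
AMOUNT_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_date(text: str) -> Optional[date]:
    """Try each known format in order; return None if none matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text: str) -> Optional[float]:
    """Extract the first number from strings such as '$1,200.50' or '1200 USD'."""
    match = AMOUNT_RE.search(text)
    return float(match.group().replace(",", "")) if match else None


def normalize_category(text: str) -> str:
    """Lower-case, trim, and collapse internal whitespace."""
    return " ".join(text.lower().split())


def deduplicate(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep the first record for each (date, amount, category) combination."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["date"], rec["amount"], rec["category"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    raw = [("2024-03-05", "$1,200.50", " Rent "), ("05/03/2024", "1200.50 USD", "rent")]
    records = [{"date": parse_date(d), "amount": parse_amount(a),
                "category": normalize_category(c)} for d, a, c in raw]
    print(len(deduplicate(records)))  # 1
```

Currency conversion is omitted because it needs exchange rates that the description does not provide.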
License: Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Hello! This dataset has been fully preprocessed and is easy to understand.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article examines the opportunities and benefits of artificial intelligence (AI)-enabled social media listening (SML) in supporting successful patient-focused drug development (PFDD). PFDD aims to incorporate the patient perspective to improve the quality, relevance, safety, and efficiency of drug development and evaluation. Gathering patient perspectives to support PFDD is aided by patient groups that communicate their treatment experiences, needs, preferences, and priorities through online platforms. SML is a method of gathering feedback directly from patients; however, distilling the quantity of data into actionable insights is challenging. AI-enabled methods, such as natural language processing (NLP), can facilitate data processing in SML studies. Herein, we describe a novel, trainable, AI-enabled SML workflow that classifies posts made by patients or caregivers and uses NLP to provide data on their experiences. Our approach is an iterative process that balances human expert-led milestones and AI-enabled processes to support data preprocessing, patient and caregiver classification, and NLP methods that produce qualitative data. We explored the applicability of this workflow in two studies: one in patients with head and neck cancers and another in patients with esophageal cancer. Continuous refinement of the AI-enabled algorithms was essential for collecting accurate and valuable results. This approach and workflow contribute to establishing well-defined standards for SML studies and advance the methodologic quality and rigor of researchers contributing to, conducting, and evaluating SML studies in a PFDD context.
License: https://dataintelo.com/privacy-and-policy
According to our latest research, the global Data Balance Optimization AI market size in 2024 stands at USD 2.18 billion, with a robust compound annual growth rate (CAGR) of 23.7% projected from 2025 to 2033. By the end of 2033, the market is forecasted to reach an impressive USD 17.3 billion. This substantial growth is driven by the surging demand for AI-powered analytics and increasing adoption of data-intensive applications across industries, establishing Data Balance Optimization AI as a critical enabler for enterprise digital transformation.
One of the primary growth factors fueling the Data Balance Optimization AI market is the exponential surge in data generation across various sectors. Organizations are increasingly leveraging digital technologies, IoT devices, and cloud platforms, resulting in vast, complex, and often imbalanced datasets. The need for advanced AI solutions that can optimize, balance, and manage these datasets has become paramount to ensure high-quality analytics, accurate machine learning models, and improved business decision-making. Enterprises recognize that imbalanced data can severely skew AI outcomes, leading to biases and reduced operational efficiency. Consequently, the demand for Data Balance Optimization AI tools is accelerating as businesses strive to extract actionable insights from diverse and voluminous data sources.
Another critical driver is the rapid evolution of AI and machine learning algorithms, which require balanced and high-integrity datasets for optimal performance. As industries such as healthcare, finance, and retail increasingly rely on predictive analytics and automation, the integrity of underlying data becomes a focal point. Data Balance Optimization AI technologies are being integrated into data pipelines to automatically detect and correct imbalances, ensuring that AI models are trained on representative and unbiased data. This not only enhances model accuracy but also helps organizations comply with stringent regulatory requirements related to data fairness and transparency, further reinforcing the market's upward trajectory.
The proliferation of cloud computing and the shift toward hybrid IT infrastructures are also significant contributors to market growth. Cloud-based Data Balance Optimization AI solutions offer scalability, flexibility, and cost-effectiveness, making them attractive to both large enterprises and small and medium-sized businesses. These solutions facilitate seamless integration with existing data management systems, enabling real-time optimization and balancing of data across distributed environments. Furthermore, the rise of data-centric business models in sectors such as e-commerce, telecommunications, and manufacturing is amplifying the need for robust data optimization frameworks, propelling further adoption of Data Balance Optimization AI technologies globally.
From a regional perspective, North America currently dominates the Data Balance Optimization AI market, accounting for the largest share due to its advanced technological infrastructure, high investment in AI research, and the presence of leading technology firms. However, the Asia Pacific region is poised to experience the fastest growth during the forecast period, driven by rapid digitalization, expanding IT ecosystems, and increasing adoption of AI-powered solutions in emerging economies such as China, India, and Southeast Asia. Europe also presents significant opportunities, particularly in regulated industries such as finance and healthcare, where data integrity and compliance are paramount. Collectively, these regional trends underscore the global momentum behind Data Balance Optimization AI adoption.
The Data Balance Optimization AI market by component is segmented into software, hardware, and services, each playing a pivotal role in the overall ecosystem. The software segment commands the largest market share, driven by the continuous evolution of AI algorithms, data preprocessing tools, and machine learning frameworks designed to address data imbalance challenges. Organizations are increasingly investing in advanced software solutions that automate data balancing, cleansing, and augmentation processes, ensuring the reliability of AI-driven analytics. These software platforms often integrate seamlessly with existing data management systems, providing us
License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
- Contraction Expansion (e.g., "can't" → "cannot")
- Lemmatization for Verbs, Adverbs, and Adjectives (except preserved words like "going")
- Apostrophe Space Fixes (e.g., "don 't" → "don't")
- Preserving Important Words (e.g., "us", "they", "there")
- Plural Preservation (e.g., "beers" remains "beers")
- Handling Informal Language & Slang (e.g., "gonna" → "going to")
- Light Grammar Correction
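Three of these steps can be sketched with regular expressions: apostrophe spacing fixes, slang replacement, and contraction expansion. The word maps are small hypothetical examples, not the lists used to build this dataset. Lemmatization and grammar correction normally rely on an NLP library and are not attempted here. Apostrophe fixing runs first so that split contractions can then be expanded.

```python
import re
from typing import Dict

# Illustrative maps only; a real pipeline would use larger, curated lists
CONTRACTIONS = {"can't": "cannot", "don't": "do not", "won't": "will not", "it's": "it is"}
SLANG = {"gonna": "going to", "wanna": "want to", "gotta": "got to"}

# A space before a common contraction suffix, e.g. "don 't" or "we 're"
SPLIT_CONTRACTION_RE = re.compile(r"(\w) '(t|s|re|ve|ll|d|m)\b")


def fix_apostrophe_spacing(text: str) -> str:
    """Join split contractions such as "don 't" -> "don't"."""
    return SPLIT_CONTRACTION_RE.sub(r"\1'\2", text)


def replace_words(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole words case-insensitively; replacements are lower-case."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in mapping) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


def normalize(text: str) -> str:
    text = fix_apostrophe_spacing(text)
    text = replace_words(text, SLANG)
    return replace_words(text, CONTRACTIONS)


if __name__ == "__main__":
    print(normalize("I don 't think we gonna win"))  # I do not think we going to win
```

This sketch only handles the straight apostrophe ('). Typographic apostrophes (’) would need to be mapped to it first.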
License: https://cubig.ai/store/terms-of-service
1) Data Introduction • The Fruit Classification Dataset is a beginner classification dataset for classifying fruit types (fruit name) based on color and weight information.
2) Data Utilization
(1) Characteristics of the Fruit Classification Dataset:
• It consists of three columns: the categorical variable Color, the continuous variable Weight, and the target class Fruit, so you can practice preprocessing both categorical and numerical variables when training classification models.
(2) Uses of the Fruit Classification Dataset:
• Model training and evaluation: as educational and research experimental data for comparing the performance of various machine learning classification algorithms using color and weight features.
• Data preprocessing practice: as hands-on data for basic data preprocessing and feature engineering, such as categorical variable encoding and continuous variable scaling.
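The two preprocessing tasks named above can be sketched directly: one-hot encoding of the categorical Color column and min-max scaling of the continuous Weight column to [0, 1]. The column names come from the description. The sample rows are made up for illustration.

```python
from typing import Dict, List, Tuple


def encode_features(rows: List[Dict[str, object]]) -> Tuple[List[List[float]], List[str], List[str]]:
    """One-hot encode Color and min-max scale Weight.

    Returns (feature_matrix, feature_names, labels), where labels are the Fruit values.
    """
    colors = sorted({str(r["Color"]) for r in rows})  # fixed column order
    weights = [float(r["Weight"]) for r in rows]
    w_min, w_max = min(weights), max(weights)
    span = (w_max - w_min) or 1.0  # avoid division by zero if all weights are equal
    features = []
    for r, w in zip(rows, weights):
        one_hot = [1.0 if str(r["Color"]) == c else 0.0 for c in colors]
        features.append(one_hot + [(w - w_min) / span])
    names = [f"Color={c}" for c in colors] + ["Weight_scaled"]
    labels = [str(r["Fruit"]) for r in rows]
    return features, names, labels


if __name__ == "__main__":
    data = [
        {"Color": "red", "Weight": 150, "Fruit": "apple"},
        {"Color": "yellow", "Weight": 120, "Fruit": "banana"},
        {"Color": "red", "Weight": 180, "Fruit": "apple"},
    ]
    X, names, y = encode_features(data)
    print(names)  # ['Color=red', 'Color=yellow', 'Weight_scaled']
    print(X)      # [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

When a train/test split is used, the color vocabulary and the min and max weight should be computed on the training split only and then reused for the test split.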
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Deep learning, a state-of-the-art machine learning approach, has shown outstanding performance over traditional machine learning in identifying intricate structures in complex high-dimensional data, especially in the domain of computer vision. The application of deep learning to early detection and automated classification of Alzheimer's disease (AD) has recently gained considerable attention, as rapid progress in neuroimaging techniques has generated large-scale multimodal neuroimaging data. A systematic review of publications using deep learning approaches and neuroimaging data for diagnostic classification of AD was performed. A PubMed and Google Scholar search was used to identify deep learning papers on AD published between January 2013 and July 2018. These papers were reviewed, evaluated, and classified by algorithm and neuroimaging type, and the findings were summarized. Of 16 studies meeting full inclusion criteria, 4 used a combination of deep learning and traditional machine learning approaches, and 12 used only deep learning approaches. The combination of traditional machine learning for classification and stacked auto-encoder (SAE) for feature selection produced accuracies of up to 98.8% for AD classification and 83.7% for prediction of conversion from mild cognitive impairment (MCI), a prodromal stage of AD, to AD. Deep learning approaches, such as convolutional neural network (CNN) or recurrent neural network (RNN), that use neuroimaging data without pre-processing for feature selection have yielded accuracies of up to 96.0% for AD classification and 84.2% for MCI conversion prediction. The best classification performance was obtained when multimodal neuroimaging and fluid biomarkers were combined. Deep learning approaches continue to improve in performance and appear to hold promise for diagnostic classification of AD using multimodal neuroimaging data. 
AD research that uses deep learning is still evolving: performance is improving through the incorporation of additional hybrid data types, such as -omics data, and transparency is increasing through explainable approaches that add knowledge of specific disease-related features and mechanisms.
NOTE: The manuscript associated with this data package is currently in review. The data may be revised based on reviewer feedback. Upon manuscript acceptance, this data package will be updated with the final dataset and additional metadata.

This data package is associated with the manuscript "Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions" (Malhotra et al., in prep). This effort was designed following ICON (integrated, coordinated, open, and networked) principles to facilitate a model-experiment (ModEx) iteration approach, leveraging crowdsourced sampling across the contiguous United States (CONUS). New machine learning models were created every month to guide sampling locations. Data from the resulting samples were used to test and rebuild the machine learning models for the next round of sampling guidance. Associated sediment and water geochemistry and in situ sensor data can be found at https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1923689, https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1729719, and https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1603775. This data package is associated with two GitHub repositories found at https://github.com/parallelworks/dynamic-learning-rivers and https://github.com/WHONDRS-Hub/ICON-ModEx_Open_Manuscript. In addition to this readme, this data package includes two file-level metadata (FLMD) files that describe each file and two data dictionaries (DD) that describe all column/row headers and variable definitions. This data package consists of two main folders, (1) dynamic-learning-rivers and (2) ICON-ModEx_Open_Manuscript, which contain snapshots of the associated GitHub repositories. The input data, output data, and machine learning models used to guide sampling locations are within dynamic-learning-rivers.
The dynamic-learning-rivers folder is organized into five top-level directories: (1) "input_data" holds the training data for the ML models; (2) "ml_models" holds machine learning (ML) models trained on the data in "input_data"; (3) "examples" contains files for direct experimentation with the machine learning model, including scripts for setting up a "hindcast" run; (4) "scripts" contains data preprocessing and postprocessing scripts and intermediate results specific to this data set that bookend the ML workflow; and (5) "output_data" holds the overall results of the ML model on that branch. Each trained ML model resides on its own branch in the repository, so inputs and outputs can differ from branch to branch. There is also one hidden directory, ".github/workflows", which contains information on how to run the ML workflow as an end-to-end automated GitHub Action; it is not needed for reusing the ML models archived here. Please see the top-level README.md in the GitHub repository for more details on the automation.

The scripts and data used to create figures in the manuscript are within ICON-ModEx_Open_Manuscript. This folder is organized into four folders, which contain the scripts, data, and PDF for each figure. Within the "fig-model-score-evolution" folder, there is a folder called "intermediate_branch_data", which contains some intermediate files pulled from dynamic-learning-rivers and reorganized to integrate easily into the workflows. NOTE: THIS FOLDER INCLUDES THE FILES AT THE POINT OF PAPER SUBMISSION. IT WILL BE UPDATED ONCE THE PAPER IS ACCEPTED WITH ANY REVISIONS AND WILL INCLUDE A DD/FLMD AT THAT POINT.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset supports a study examining how students perceive the usefulness of artificial intelligence (AI) in educational settings. The project involved analyzing an open-access survey dataset that captures a wide range of student responses on AI tools in learning.
The data underwent cleaning and preprocessing, followed by an exploratory data analysis (EDA) to identify key trends and insights. Visualizations were created to support interpretation, and the results were summarized in a digital poster format to communicate findings effectively.
This resource may be useful for researchers, educators, and technologists interested in the evolving role of AI in education.
Keywords: Artificial Intelligence, Education, Student Perception, Survey, Data Analysis, EDA
Subject: Computer and Information Science
License: CC0 1.0 Universal Public Domain Dedication
DOI: https://doi.org/10.18738/T8/RXUCHK
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the baseline characteristics and supplementary data from a study of ICU patients with type 2 diabetes mellitus (T2DM), aiming to predict ventilator-associated pneumonia (VAP) using machine learning. The baseline characteristics table summarizes patient demographics, vital signs, and laboratory measurements. Supplementary figures illustrate the data preprocessing steps (histograms and boxplots before and after interquartile range cleaning), missing-value imputation using the Random Forest method, variable correlation analysis (Spearman correlation heatmap), and model evaluation (confusion matrices of four predictive models). In addition, the dataset includes a file summarizing the TRIPOD-AI guideline used for model reporting. These data provide a detailed overview of feature selection, data cleaning procedures, and model performance assessment.
Fig. S1. Histograms and boxplots of Glucose_max and SBP_max in original and cleaned datasets. Glusco_max, maximum blood glucose; SBP_max, maximum systolic blood pressure. (A) Original Glusco_max; (B) cleaned Glusco_max; (C) original SBP_max; (D) cleaned SBP_max.
Fig. S2. Histograms and boxplots of Temp_min and WBC_min in original and cleaned datasets. Temp_min, minimum body temperature; WBC_min, minimum white blood cell count. (A) Original Temp_min; (B) cleaned Temp_min; (C) original WBC_min; (D) cleaned WBC_min.
Fig. S3. Histograms of PH_max and PH_min in original and Random Forest-imputed datasets. PH_max, maximum pH; PH_min, minimum pH.
Fig. S4. Histograms of PO2_max and PO2_min in original and Random Forest-imputed datasets. PO2_max, maximum partial pressure of oxygen; PO2_min, minimum partial pressure of oxygen.
Fig. S5. Histograms of PT_max and PT_min in original and Random Forest-imputed datasets. PT_max, maximum prothrombin time; PT_min, minimum prothrombin time.
Fig. S6. Spearman correlation heatmap of variables selected by both the Boruta algorithm and LASSO regression. Hypertension, history of hypertension; Temp_min, minimum body temperature; Glusco_max, maximum blood glucose; Scr_max, maximum serum creatinine; WBC_min, minimum white blood cell count; CNS, SOFA neurological subscore; Renal, SOFA renal subscore; GCS, Glasgow Coma Scale.
Fig. S7. Confusion matrices of four predictive models: (A) Logistic Regression, (B) Random Forest, (C) XGBoost, and (D) Gradient Boosting Machine (GBM). Each matrix presents the counts of true positives, true negatives, false positives, and false negatives, facilitating model performance comparison.
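Two of the steps named in these captions, interquartile range (IQR) cleaning and the true/false positive/negative counts in a confusion matrix, can be sketched in plain Python. This is a minimal sketch under stated assumptions: the 1.5 times IQR fences, removing (rather than clipping) values outside them, the inclusive quartile method, and coding VAP as 1 are conventional choices, not details confirmed by the dataset description; `iqr_filter` and `confusion_counts` are hypothetical names.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the surviving values.
    return [v for v in values if low <= v <= high]


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classifier (positive class assumed 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts
```

For example, `iqr_filter([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` computes Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5 and only the value 100 is removed.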
License: https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents financial data incorporating a diverse range of statistical features, intended as a foundation for advanced research and modeling in finance. The data include:
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price-to-earnings (P/E) ratio, dividend yield, earnings per share (EPS), price-to-earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price-to-sales ratio, credit rating)
Technical indicators (e.g., moving averages, relative strength index (RSI), moving average convergence divergence (MACD), average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution (A/D) line, parabolic SAR, Bollinger Bands, Fibonacci, Williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
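Two of the technical indicators listed above, the simple moving average and the relative strength index (RSI), can be computed from a list of closing prices as follows. This is a minimal sketch: it uses Wilder's smoothing for RSI with a default 14-period window, a common convention that the dataset description does not specify, and it returns 50 when prices are flat over the window, which is an assumed convention.

```python
def sma(prices, window):
    """Simple moving average; one value per complete window."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    running = sum(prices[:window])
    out = [running / window]
    for i in range(window, len(prices)):
        # Slide the window: add the newest price, drop the oldest.
        running += prices[i] - prices[i - window]
        out.append(running / window)
    return out


def _rsi_value(avg_gain, avg_loss):
    """Convert average gain/loss into an RSI value on a 0-100 scale."""
    if avg_loss == 0:
        return 100.0 if avg_gain > 0 else 50.0  # flat-price case is a convention
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(prices, period=14):
    """RSI with Wilder's smoothing; first value uses the first `period` changes."""
    if len(prices) <= period:
        raise ValueError("need more than `period` prices")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    values = [_rsi_value(avg_gain, avg_loss)]
    for g, l in zip(gains[period:], losses[period:]):
        # Wilder smoothing: exponential average with alpha = 1 / period.
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        values.append(_rsi_value(avg_gain, avg_loss))
    return values
```

For instance, `sma([1, 2, 3, 4, 5], 3)` gives `[2.0, 3.0, 4.0]`, and a strictly rising price series yields an RSI of 100 at every step.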
Potential applications:
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Intended users:
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative buy/sell trading strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
Notes:
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SalmonScan dataset is a collection of images of salmon fish, including healthy fish and infected fish. The dataset consists of two classes of images:
Fresh salmon
Infected salmon
This dataset is ideal for various computer vision tasks in machine learning and deep learning applications. Whether you are a researcher, developer, or student, the SalmonScan dataset offers a rich and diverse data source to support your projects and experiments.
So, dive in and explore the fascinating world of salmon health and disease!
The raw SalmonScan dataset consists of images of 24 fresh fish and 91 infected fish. (Some of the raw data were deleted during a past server cleanup.)
The augmented SalmonScan dataset consists of approximately 1,208 images of salmon, divided into the same two classes (fresh and infected).
Each class contains a representative and diverse collection of images, capturing a range of different perspectives, scales, and lighting conditions. The images have been carefully curated to ensure that they are of high quality and suitable for use in a variety of computer vision tasks.
Data Preprocessing
The input images were preprocessed to enhance their quality and suitability for further analysis. The following steps were taken:
Resizing: All images were resized to a uniform 600 pixels wide by 250 pixels high to ensure compatibility with the learning algorithm.
Image augmentation: To compensate for the small number of images, the following augmentation techniques were applied to the input images:
Horizontal flip: images were flipped horizontally to create additional samples.
Vertical flip: images were flipped vertically to create additional samples.
Rotation: images were rotated to create additional samples.
Cropping: a portion of each image was randomly cropped to create additional samples.
Gaussian noise: Gaussian noise was added to the images to create additional samples.
Shearing: images were sheared to create additional samples.
Contrast adjustment (gamma): gamma correction was applied to adjust image contrast.
Contrast adjustment (sigmoid): a sigmoid function was applied to adjust image contrast.
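Several of these preprocessing steps can be sketched with NumPy on images stored as float arrays scaled to [0, 1]. This is a minimal, hypothetical sketch rather than the dataset authors' pipeline: nearest-neighbour resizing, the noise level, the crop size, and the sigmoid gain and cutoff are assumptions, and rotation and shearing are left out because they need interpolation that an imaging library would normally provide.

```python
import numpy as np

# Target size stated in the dataset description: 600 px wide, 250 px high.
TARGET_H, TARGET_W = 250, 600


def resize_nearest(img, out_h=TARGET_H, out_w=TARGET_W):
    """Nearest-neighbour resize of an H x W or H x W x C array."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols]


def hflip(img):
    """Mirror the image left to right."""
    return img[:, ::-1]


def vflip(img):
    """Mirror the image top to bottom."""
    return img[::-1]


def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def adjust_gamma(img, gamma):
    """Gamma correction: gamma > 1 darkens, gamma < 1 brightens."""
    return img ** gamma


def adjust_sigmoid(img, cutoff=0.5, gain=10.0):
    """Sigmoid contrast curve: values near `cutoff` are pushed apart."""
    return 1.0 / (1.0 + np.exp(gain * (cutoff - img)))
```

Each function returns a new sample from one input image, so applying several of them to every raw image multiplies the size of the training set, which is the purpose the description gives for augmentation.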
Usage
To use the SalmonScan dataset in your ML and DL projects, follow these steps:
License: https://www.verifiedmarketresearch.com/privacy-policy/
AI And Machine Learning Operationalization Software Market size was estimated at USD 6.12 Billion in 2024 and is projected to reach USD 36.25 Billion by 2032, growing at a CAGR of 35.2% from 2026 to 2032.
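Market projections like this rest on the compound annual growth rate, CAGR = (end / start)^(1 / years) - 1. A minimal sketch of that arithmetic follows; `cagr` and `project` are illustrative names, not part of any cited report.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


def project(value, rate, years):
    """Grow `value` at a constant annual `rate` for `years` years."""
    return value * (1.0 + rate) ** years
```

Applied to the 2024 and 2032 figures (8 years apart), `cagr(6.12, 36.25, 8)` gives roughly 0.249, about 24.9% per year; the quoted 35.2% is stated for the 2026 to 2032 window, whose starting value is not given here.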
Key Market Drivers
Surging Adoption of AI & ML: Demand for MLOps software is driven primarily by the widespread adoption of artificial intelligence (AI) and machine learning (ML) across industries. As organizations increasingly use AI and ML for tasks such as automation, decision-making, and process optimization, they need MLOps software to manage and operationalize these models effectively.