100+ datasets found
  1. d

    Python Script for Cleaning Alum Dataset

    • search.dataone.org
    • hydroshare.org
    Updated Oct 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    saikumar payyavula; Jeff Sadler (2025). Python Script for Cleaning Alum Dataset [Dataset]. https://search.dataone.org/view/sha256%3A9df1a010044e2d50d741d5671b755351813450f4331dd7b0cc2f0a527750b30e
    Explore at:
    Dataset updated
    Oct 18, 2025
    Dataset provided by
    Hydroshare
    Authors
    saikumar payyavula; Jeff Sadler
    Description

    This resource contains a Python script used to clean and preprocess the alum dosage dataset from a small Oklahoma water treatment plant. The script handles missing values, removes outliers, merges historical water quality and weather data, and prepares the dataset for AI model training.

  2. Data Cleaning - Feature Imputation

    • kaggle.com
    zip
    Updated Aug 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mr.Machine (2022). Data Cleaning - Feature Imputation [Dataset]. https://www.kaggle.com/datasets/ilayaraja07/data-cleaning-feature-imputation
    Explore at:
    zip(116097 bytes)Available download formats
    Dataset updated
    Aug 13, 2022
    Authors
    Mr.Machine
    Description

    Data Cleaning or Data cleansing is to clean the data by imputing missing values, smoothing noisy data, and identifying or removing outliers. In general, the missing values are found due to collection error or data is corrupted.

    Here some info in details :Feature Engineering - Handling Missing Value

    Wine_Quality.csv dataset have the numerical missing data, and students_Performance.mv.csv dataset have Numerical and categorical missing data's.

  3. r

    Data from: Data Cleaning and AutoML: Would an Optimizer Choose to Clean?

    • resodate.org
    Updated Aug 5, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Felix Neutatz; Binger Chen; Yazan Alkhatib; Jingwen Ye; Ziawasch Abedjan (2022). Data Cleaning and AutoML: Would an Optimizer Choose to Clean? [Dataset]. http://doi.org/10.14279/depositonce-15981
    Explore at:
    Dataset updated
    Aug 5, 2022
    Dataset provided by
    DepositOnce
    Technische Universität Berlin
    Authors
    Felix Neutatz; Binger Chen; Yazan Alkhatib; Jingwen Ye; Ziawasch Abedjan
    Description

    Data cleaning is widely acknowledged as an important yet tedious task when dealing with large amounts of data. Thus, there is always a cost-benefit trade-off to consider. In particular, it is important to assess this trade-off when not every data point and data error is equally important for a task. This is often the case when statistical analysis or machine learning (ML) models derive knowledge about data. If we only care about maximizing the utility score of the applications, such as accuracy or F1 scores, many tasks can afford some degree of data quality problems. Recent studies analyzed the impact of various data error types on vanilla ML tasks, showing that missing values and outliers significantly impact the outcome of such models. In this paper, we expand the setting to one where data cleaning is not considered in isolation but as an equal parameter among many other hyper-parameters that influence feature selection, regularization, and model selection. In particular, we use state-of-the-art AutoML frameworks to automatically learn the parameters that benefit a particular ML binary classification task. In our study, we see that specific cleaning routines still play a significant role but can also be entirely avoided if the choice of a specific model or the filtering of specific features diminishes the overall impact.

  4. Medical Clean Dataset

    • kaggle.com
    zip
    Updated Jul 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aamir Shahzad (2025). Medical Clean Dataset [Dataset]. https://www.kaggle.com/datasets/aamir5659/medical-clean-dataset
    Explore at:
    zip(1262 bytes)Available download formats
    Dataset updated
    Jul 6, 2025
    Authors
    Aamir Shahzad
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the cleaned version of a real-world medical dataset that was originally noisy, incomplete, and contained various inconsistencies. The dataset was cleaned through a structured and well-documented data preprocessing pipeline using Python and Pandas. Key steps in the cleaning process included:

    • Handling missing values using statistical techniques such as median imputation and mode replacement
    • Converting categorical values to consistent formats (e.g., gender formatting, yes/no standardization)
    • Removing duplicate entries to ensure data accuracy
    • Parsing and standardizing date fields
    • Creating new derived features such as age groups
    • Detecting and reviewing outliers based on IQR
    • Removing irrelevant or redundant columns

    The purpose of cleaning this dataset was to prepare it for further exploratory data analysis (EDA), data visualization, and machine learning modeling.

    This cleaned dataset is now ready for training predictive models, generating visual insights, or conducting healthcare-related research. It provides a high-quality foundation for anyone interested in medical analytics or data science practice.

  5. D

    Autonomous Data Cleaning With AI Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2025). Autonomous Data Cleaning With AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/autonomous-data-cleaning-with-ai-market
    Explore at:
    pptx, pdf, csvAvailable download formats
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Autonomous Data Cleaning with AI Market Outlook




    According to our latest research, the global Autonomous Data Cleaning with AI market size in 2024 reached USD 1.82 billion, reflecting a robust expansion driven by rapid digital transformation across industries. The market is experiencing a CAGR of 25.7% from 2025 to 2033, with forecasts indicating that the market will reach USD 14.4 billion by 2033. This remarkable growth is primarily attributed to the increasing demand for high-quality, reliable data to power advanced analytics and artificial intelligence initiatives, as well as the escalating complexity and volume of data in modern enterprises.




    The surge in the adoption of artificial intelligence and machine learning technologies is a critical growth factor propelling the Autonomous Data Cleaning with AI market. Organizations are increasingly recognizing the importance of clean, accurate data as a foundational asset for digital transformation, predictive analytics, and data-driven decision-making. As data volumes continue to explode, manual data cleaning processes have become unsustainable, leading enterprises to seek autonomous solutions powered by AI algorithms. These solutions not only automate error detection and correction but also enhance data consistency, integrity, and usability across disparate systems, reducing operational costs and improving business agility.




    Another significant driver for the Autonomous Data Cleaning with AI market is the rising regulatory pressure around data governance and compliance. Industries such as banking, finance, and healthcare are subject to stringent data quality requirements, necessitating robust mechanisms to ensure data accuracy and traceability. AI-powered autonomous data cleaning tools are increasingly being integrated into enterprise data management strategies to address these regulatory challenges. These tools help organizations maintain compliance, minimize the risk of data breaches, and avoid costly penalties, further fueling market growth as regulatory frameworks become more complex and widespread across global markets.




    The proliferation of cloud computing and the shift towards hybrid and multi-cloud environments are also accelerating the adoption of Autonomous Data Cleaning with AI solutions. As organizations migrate workloads and data assets to the cloud, ensuring data quality across distributed environments becomes paramount. Cloud-based autonomous data cleaning platforms offer scalability, flexibility, and integration capabilities that are well-suited to dynamic enterprise needs. The growing ecosystem of cloud-native AI tools, combined with the increasing sophistication of data integration and orchestration platforms, is enabling businesses to deploy autonomous data cleaning at scale, driving substantial market expansion.




    From a regional perspective, North America continues to dominate the Autonomous Data Cleaning with AI market, accounting for the largest revenue share in 2024. The region’s advanced technological infrastructure, high concentration of AI innovators, and early adoption by large enterprises are key factors supporting its leadership position. However, Asia Pacific is emerging as the fastest-growing regional market, fueled by rapid digitalization, expanding IT investments, and strong government initiatives supporting AI and data-driven innovation. Europe also remains a significant contributor, with increasing adoption in sectors such as banking, healthcare, and manufacturing. Overall, the global market exhibits a broadening geographic footprint, with opportunities emerging across both developed and developing economies.



    Component Analysis




    The Autonomous Data Cleaning with AI market is segmented by component into Software and Services. The software segment currently holds the largest share of the market, driven by the rapid advancement and deployment of AI-powered data cleaning platforms. These software solutions leverage sophisticated algorithms for anomaly detection, deduplication, data enrichment, and validation, providing organizations with automated tools to ensure data quality at scale. The increasing integration of machine learning and natural language processing (NLP) capabilities further enhances the effectiveness of these platforms, enabling them to address a wide range of data quality issues across structured and unstructured datasets.




    The

  6. d

    Prediction data from: Machine learning predicts which rivers, streams, and...

    • datadryad.org
    • dataone.org
    • +1more
    zip
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro (2023). Prediction data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates [Dataset]. http://doi.org/10.5061/dryad.z34tmpgm7
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Dryad
    Authors
    Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro
    Time period covered
    Sep 27, 2023
    Description

    This dataset contains model outputs that were analyzed to produce the main results of the paper.

  7. G

    Autonomous Data Cleaning with AI Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Growth Market Reports (2025). Autonomous Data Cleaning with AI Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/autonomous-data-cleaning-with-ai-market
    Explore at:
    pdf, csv, pptxAvailable download formats
    Dataset updated
    Oct 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Autonomous Data Cleaning with AI Market Outlook



    According to our latest research, the global Autonomous Data Cleaning with AI market size reached USD 1.68 billion in 2024, with a robust year-on-year growth driven by the surge in enterprise data volumes and the mounting demand for high-quality, actionable insights. The market is projected to expand at a CAGR of 24.2% from 2025 to 2033, which will take the overall market value to approximately USD 13.1 billion by 2033. This rapid growth is fueled by the increasing adoption of artificial intelligence (AI) and machine learning (ML) technologies across industries, aiming to automate and optimize the data cleaning process for improved operational efficiency and decision-making.




    The primary growth driver for the Autonomous Data Cleaning with AI market is the exponential increase in data generation across various industries such as BFSI, healthcare, retail, and manufacturing. Organizations are grappling with massive amounts of structured and unstructured data, much of which is riddled with inconsistencies, duplicates, and inaccuracies. Manual data cleaning is both time-consuming and error-prone, leading businesses to seek automated AI-driven solutions that can intelligently detect, correct, and prevent data quality issues. The integration of AI not only accelerates the data cleaning process but also ensures higher accuracy, enabling organizations to leverage clean, reliable data for analytics, compliance, and digital transformation initiatives. This, in turn, translates into enhanced business agility and competitive advantage.




    Another significant factor propelling the market is the increasing regulatory scrutiny and compliance requirements in sectors such as banking, healthcare, and government. Regulations such as GDPR, HIPAA, and others mandate strict data governance and quality standards. Autonomous Data Cleaning with AI solutions help organizations maintain compliance by ensuring data integrity, traceability, and auditability. Additionally, the evolution of cloud computing and the proliferation of big data analytics platforms have made it easier for organizations of all sizes to deploy and scale AI-powered data cleaning tools. These advancements are making autonomous data cleaning more accessible, cost-effective, and scalable, further driving market adoption.




    The growing emphasis on digital transformation and real-time decision-making is also a crucial growth factor for the Autonomous Data Cleaning with AI market. As enterprises increasingly rely on analytics, machine learning, and artificial intelligence for business insights, the quality of input data becomes paramount. Automated, AI-driven data cleaning solutions enable organizations to process, cleanse, and prepare data in real-time, ensuring that downstream analytics and AI models are fed with high-quality inputs. This not only improves the accuracy of business predictions but also reduces the time-to-insight, helping organizations stay ahead in highly competitive markets.




    From a regional perspective, North America currently dominates the Autonomous Data Cleaning with AI market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The presence of leading technology companies, early adopters of AI, and a mature regulatory environment are key factors contributing to North America’s leadership. However, Asia Pacific is expected to witness the highest CAGR over the forecast period, driven by rapid digitalization, expanding IT infrastructure, and increasing investments in AI and data analytics, particularly in countries such as China, India, and Japan. Latin America and the Middle East & Africa are also gradually emerging as promising markets, supported by growing awareness and adoption of AI-driven data management solutions.





    Component Analysis



    The Autonomous Data Cleaning with AI market is segmented by component into Software and Services. The software segment currently holds the largest market share, driven

  8. e

    Data pre-processing and clean-up

    • paper.erudition.co.in
    html
    Updated Dec 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Einetic (2025). Data pre-processing and clean-up [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
    Explore at:
    htmlAvailable download formats
    Dataset updated
    Dec 3, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of chapter Data pre-processing and clean-up of Data Mining, 6th Semester , B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)

  9. Credit Card Approvals (Clean Data)

    • kaggle.com
    zip
    Updated Apr 25, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Samuel Cortinhas (2022). Credit Card Approvals (Clean Data) [Dataset]. https://www.kaggle.com/datasets/samuelcortinhas/credit-card-approval-clean-data
    Explore at:
    zip(19448 bytes)Available download formats
    Dataset updated
    Apr 25, 2022
    Authors
    Samuel Cortinhas
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a cleaned version of this dataset from UCI machine learning repository on credit card approvals.

    Missing values have been filled and feature names and categorical names have been inferred, resulting in more context and it being easier to use.

    Your task is to predict which people in the dataset are successful in applying for a credit card.

  10. Credit Score Classification Cleaned Dataset

    • kaggle.com
    zip
    Updated Nov 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    irem nur tokuroglu (2024). Credit Score Classification Cleaned Dataset [Dataset]. https://www.kaggle.com/datasets/iremnurtokuroglu/credit-score-classification-cleaned-dataset
    Explore at:
    zip(4159334 bytes)Available download formats
    Dataset updated
    Nov 26, 2024
    Authors
    irem nur tokuroglu
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This notebook is a cleaned version of the Credit Score classification dataset, with comprehensive EDA (Exploratory Data Analysis) already performed. You can apply preprocessing steps to any columns as needed and use it as a training dataset for machine learning and deep learning models

    Original dataset is available on: https://www.kaggle.com/datasets/parisrohan/credit-score-classification

  11. Data from: ManyTypes4Py: A Benchmark Python Dataset for Machine...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zenodo (2025). ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-Based Type Inference [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-5244636?locale=lv
    Explore at:
    unknown(1052407809)Available download formats
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is gathered on Sep. 17th 2020 from GitHub. It has clean and complete versions (from v0.7): The clean version has 5.1K type-checked Python repositories and 1.2M type annotations. The complete version has 5.2K Python repositories and 3.3M type annotations. The dataset's source files are type-checked using mypy (clean version). The dataset is also de-duplicated using the CD4Py tool. Check out the README.MD file for the description of the dataset. Notable changes to each version of the dataset are documented in CHANGELOG.md. The dataset's scripts and utilities are available on its GitHub repository.

  12. d

    Training data from: Machine learning predicts which rivers, streams, and...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jun 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro (2024). Training data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates [Dataset]. http://doi.org/10.5061/dryad.m63xsj47s
    Explore at:
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro
    Time period covered
    Jan 1, 2023
    Description

    We assess which waters the Clean Water Act protects and how Supreme Court and White House rules change this regulation. We train a deep learning model using aerial imagery and geophysical data to predict 150,000 jurisdictional determinations from the Army Corps of Engineers, each deciding regulation for one water resource. Under a 2006 Supreme Court ruling, the Clean Water Act protects two-thirds of US streams and over half of wetlands; under a 2020 White House rule, it protects under half of streams and a fourth of wetlands, implying deregulation of 690,000 stream miles, 35 million wetland acres, and 30% of waters around drinking water sources. Our framework can support permitting, policy design, and use of machine learning in regulatory implementation problems. , This dataset contains data used to train the models., , # Training data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates

    This dataset contains data used to train the models in Greenhill et al. (2023). All data are publicly available and can be accessed either through Google Earth Engine or directly from the data providers, as described in Table S3 of the Supplementary Material. In addition, we are providing access to the full set of pre-processed inputs for model training via this repository. We are also providing access to a subset of the data used for prediction, as well as all data needed for reproducing the results of the paper, in another Dryad repository: . All code written for the project is available at .

    Description of the data and file structure

    The files here include:

    • Trained models, saved in PyTorch Checkpoint format: wotus_model.pth.tar, resource_type_model.pth.tar, cowardin_code_model.pth.tar, ajd_model.pth.tar.
    • ...
  13. D

    Data Clean Room For AI Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2025). Data Clean Room For AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/data-clean-room-for-ai-market
    Explore at:
    pdf, csv, pptxAvailable download formats
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Clean Room for AI Market Outlook



    According to our latest research, the global Data Clean Room for AI market size is valued at USD 1.42 billion in 2024, with a robust compound annual growth rate (CAGR) of 27.6% projected from 2025 to 2033. By the end of 2033, the market is forecasted to reach USD 13.34 billion. The primary growth factor driving this market is the surge in privacy-centric data collaboration solutions, fueled by regulatory tightening and the need for secure, compliant AI-driven analytics across industries.




    The exponential expansion of digital data, coupled with increasing consumer privacy concerns and regulatory frameworks such as GDPR and CCPA, are propelling the adoption of Data Clean Rooms for AI. Organizations are seeking innovative ways to harness data-driven insights without compromising user privacy. Data Clean Rooms address this imperative by enabling secure, privacy-preserving collaboration between multiple parties, facilitating advanced analytics and machine learning without exposing raw data. As a result, enterprises across sectors, including advertising, healthcare, and finance, are rapidly integrating Data Clean Room solutions to unlock the full potential of AI while ensuring compliance and trust.




    Another significant growth driver for the Data Clean Room for AI market is the escalating complexity and volume of data generated by digital transformation initiatives. As businesses transition to omnichannel customer engagement and leverage AI for hyper-personalization, the need for secure environments to aggregate and analyze sensitive datasets becomes paramount. Data Clean Rooms enable seamless data interoperability between partners, advertisers, and platforms, which is especially critical for industries like retail and e-commerce where first-party data collaboration can yield substantial competitive advantages. This secure data sharing model not only mitigates risks associated with data breaches but also enhances the accuracy and effectiveness of AI models, further fueling market expansion.




    The proliferation of cloud-based solutions and the integration of advanced AI and machine learning capabilities within Data Clean Rooms are catalyzing market growth. Cloud deployment offers scalability, flexibility, and cost-efficiency, making Data Clean Rooms accessible to organizations of all sizes. Moreover, the rapid evolution of AI algorithms and privacy-enhancing technologies, such as federated learning and differential privacy, are enhancing the utility and adoption of Data Clean Rooms. This technological convergence is enabling organizations to extract actionable insights from siloed data sources while maintaining stringent privacy controls, thus accelerating the market's upward trajectory.




    Regionally, North America dominates the Data Clean Room for AI market, accounting for the largest revenue share in 2024. This leadership is attributed to the early adoption of privacy-centric technologies, a mature digital ecosystem, and the presence of major technology providers. Europe follows closely, driven by strict data protection regulations and a strong emphasis on ethical AI practices. The Asia Pacific region is expected to exhibit the fastest growth during the forecast period, propelled by rapid digitalization, burgeoning e-commerce, and increasing investments in AI infrastructure. Latin America and the Middle East & Africa are also witnessing steady adoption, particularly in sectors like banking, retail, and healthcare, as organizations seek secure data collaboration frameworks to drive innovation and operational efficiency.



    Component Analysis



    The Data Clean Room for AI market is segmented by component into Software and Services, each playing a pivotal role in shaping the industry landscape. Software solutions form the backbone of Data Clean Room offerings, providing secure environments for data collaboration, privacy-preserving analytics, and AI model training. These platforms are increasingly leveraging sophisticated cryptographic techniques, access controls, and machine learning algorithms to ensure data security and compliance. The growing demand for customizable, scalable, and interoperable software solutions is driving continuous innovation, with vendors focusing on enhancing user experience, integration capabilities, and automation features to cater to diverse industry requirements.



    <p&

  14. f

    Data from: Leveraging Supervised Machine Learning Algorithms for System...

    • acs.figshare.com
    zip
    Updated Sep 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Russell R. Kibbe; Alexandria L. Sohn; David C. Muddiman (2024). Leveraging Supervised Machine Learning Algorithms for System Suitability Testing of Mass Spectrometry Imaging Platforms [Dataset]. http://doi.org/10.1021/acs.jproteome.4c00360.s001
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    ACS Publications
    Authors
    Russell R. Kibbe; Alexandria L. Sohn; David C. Muddiman
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Quality control and system suitability testing are vital protocols implemented to ensure the repeatability and reproducibility of data in mass spectrometry investigations. However, mass spectrometry imaging (MSI) analyses present added complexity since both chemical and spatial information are measured. Herein, we employ various machine learning algorithms and a novel quality control mixture to classify the working conditions of an MSI platform. Each algorithm was evaluated in terms of its performance on unseen data, validated with negative control data sets to rule out confounding variables or chance agreement, and utilized to determine the necessary sample size to achieve a high level of accurate classifications. In this work, a robust machine learning workflow was established where models could accurately classify the instrument condition as clean or compromised based on data metrics extracted from the analyzed quality control sample. This work highlights the power of machine learning to recognize complex patterns in MSI data and use those relationships to perform a system suitability test for MSI platforms.

  15. (Cleaned) Credit Score Dataset for Classification

    • kaggle.com
    zip
    Updated Dec 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    muhammed (2023). (Cleaned) Credit Score Dataset for Classification [Dataset]. https://www.kaggle.com/clkmuhammed/creditscoreclassification
    Explore at:
    zip(8917124 bytes)Available download formats
    Dataset updated
    Dec 8, 2023
    Authors
    muhammed
    Description

    Original Dataset: https://www.kaggle.com/datasets/parisrohan/credit-score-classification

    Data Cleaning: https://www.kaggle.com/code/clkmuhammed/credit-score-classification-data-cleaning-project

    data cleaned and made ready for machine learning

    Task Given a person’s credit-related information, build a machine learning model that can classify the credit score.

  16. i

    A Dataset with Adversarial Attacks on Deep Learning in Wireless Modulation...

    • ieee-dataport.org
    Updated Sep 23, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Antonios Argyriou (2023). A Dataset with Adversarial Attacks on Deep Learning in Wireless Modulation Classification [Dataset]. https://ieee-dataport.org/documents/dataset-adversarial-attacks-deep-learning-wireless-modulation-classification
    Explore at:
    Dataset updated
    Sep 23, 2023
    Authors
    Antonios Argyriou
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains adversarial attacks on Deep Learning (DL) when it is employed for the classification of wireless modulated communication signals. The attack is executed with an obfuscating waveform that is embedded in the transmitted signal in such a way that prevents the extraction of clean data for training from a wireless eavesdropper. At the same time it allows a legitimate receiver (LRx) to demodulate the data.

  17. o

    Data from: ManyTypes4Py: A benchmark Python Dataset for Machine...

    • explore.openaire.eu
    • data.europa.eu
    Updated Apr 26, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amir M. Mir; Evaldas Latoskinas; Georgios Gousios (2021). ManyTypes4Py: A benchmark Python Dataset for Machine Learning-Based Type Inference [Dataset]. http://doi.org/10.5281/zenodo.4044635
    Explore at:
    Dataset updated
    Apr 26, 2021
    Authors
    Amir M. Mir; Evaldas Latoskinas; Georgios Gousios
    Description

    The dataset is gathered on Sep. 17th 2020 from GitHub. It has more than 5.2K Python repositories and 4.2M type annotations. The dataset is also de-duplicated using the CD4Py tool. Check out the README.MD file for the description of the dataset. Notable changes to each version of the dataset are documented in CHANGELOG.md. The dataset's scripts and utilities are available on its GitHub repository.

  18. Data Wrangling Market Size, Share, Growth, Forecast, By Component...

    • verifiedmarketresearch.com
    Updated Jun 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    VERIFIED MARKET RESEARCH (2025). Data Wrangling Market Size, Share, Growth, Forecast, By Component (Solutions, Services), By Deployment Mode (On-premises, Cloud-based), By End-user Industry (Banking, Financial Services, and Insurance (BFSI), Healthcare & Life Sciences, Retail & E-commerce, IT & Telecom, Government & Public Sector, Manufacturing) [Dataset]. https://www.verifiedmarketresearch.com/product/data-wrangling-market/
    Explore at:
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    Verified Market Researchhttps://www.verifiedmarketresearch.com/
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    Data Wrangling Market size was valued at USD 1.99 Billion in 2024 and is projected to reach USD 4.07 Billion by 2032, growing at a CAGR of 9.4% during the forecast period 2026-2032.• Big Data Analytics Growth: Organizations are generating massive volumes of unstructured and semi-structured data from diverse sources including social media, IoT devices, and digital transactions. Data wrangling tools become essential for cleaning, transforming, and preparing this complex data for meaningful analytics and business intelligence applications.• Machine Learning and AI Adoption: The rapid expansion of artificial intelligence and machine learning initiatives requires high-quality, properly formatted training datasets. Data wrangling solutions enable data scientists to efficiently prepare, clean, and structure raw data for model training, driving sustained market demand across AI-focused organizations.

  19. g

    Salt and Pepper Noise Dataset: Clean vs Noisy Image

    • gts.ai
    json
    Updated Jan 15, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GTS (2025). Salt and Pepper Noise Dataset: Clean vs Noisy Image [Dataset]. https://gts.ai/dataset-download/salt-and-pepper-noise-datasetclean-vs-noisy-image/
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Globose Technology Solutions Private Limited
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore the Salt and Pepper Noise Dataset with clean and noisy images for image processing and computer vision research.

  20. d

    FileMarket | Dataset for Face Anti-Spoofing (Videos) in Computer Vision...

    • datarade.ai
    Updated Jul 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FileMarket (2024). FileMarket | Dataset for Face Anti-Spoofing (Videos) in Computer Vision Applications | Machine Learning (ML) Data | Deep Learning (DL) Data [Dataset]. https://datarade.ai/data-products/filemarket-dataset-for-face-anti-spoofing-videos-in-compu-filemarket
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset updated
    Jul 10, 2024
    Dataset authored and provided by
    FileMarket
    Area covered
    Mali, Malawi, Belarus, Zimbabwe, Sierra Leone, South Africa, Central African Republic, Ukraine, Chad, Congo (Democratic Republic of the)
    Description

    Live Face Anti-Spoof Dataset

    A live face dataset is crucial for advancing computer vision tasks such as face detection, anti-spoofing detection, and face recognition. The Live Face Anti-Spoof Dataset offered by Ainnotate is specifically designed to train algorithms for anti-spoofing purposes, ensuring that AI systems can accurately differentiate between real and fake faces in various scenarios.

    Key Features:

    Comprehensive Video Collection: The dataset features thousands of videos showcasing a diverse range of individuals, including males and females, with and without glasses. It also includes men with beards, mustaches, and clean-shaven faces. Lighting Conditions: Videos are captured in both indoor and outdoor environments, ensuring that the data covers a wide range of lighting conditions, making it highly applicable for real-world use. Data Collection Method: Our datasets are gathered through a community-driven approach, leveraging our extensive network of over 700k users across various Telegram apps. This method ensures that the data is not only diverse but also ethically sourced with full consent from participants, providing reliable and real-world applicable data for training AI models. Versatility: This dataset is ideal for training models in face detection, anti-spoofing, and face recognition tasks, offering robust support for these essential computer vision applications. In addition to the Live Face Anti-Spoof Dataset, FileMarket provides specialized datasets across various categories to support a wide range of AI and machine learning projects:

    Object Detection Data: Perfect for training AI in image and video analysis. Machine Learning (ML) Data: Offers a broad spectrum of applications, from predictive analytics to natural language processing (NLP). Large Language Model (LLM) Data: Designed to support text generation, chatbots, and machine translation models. Deep Learning (DL) Data: Essential for developing complex neural networks and deep learning models. Biometric Data: Includes diverse datasets for facial recognition, fingerprint analysis, and other biometric applications. This live face dataset, alongside our other specialized data categories, empowers your AI projects by providing high-quality, diverse, and comprehensive datasets. Whether your focus is on anti-spoofing detection, face recognition, or other biometric and machine learning tasks, our data offerings are tailored to meet your specific needs.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
saikumar payyavula; Jeff Sadler (2025). Python Script for Cleaning Alum Dataset [Dataset]. https://search.dataone.org/view/sha256%3A9df1a010044e2d50d741d5671b755351813450f4331dd7b0cc2f0a527750b30e

Python Script for Cleaning Alum Dataset

Explore at:
Dataset updated
Oct 18, 2025
Dataset provided by
Hydroshare
Authors
saikumar payyavula; Jeff Sadler
Description

This resource contains a Python script used to clean and preprocess the alum dosage dataset from a small Oklahoma water treatment plant. The script handles missing values, removes outliers, merges historical water quality and weather data, and prepares the dataset for AI model training.

Search
Clear search
Close search
Google apps
Main menu