100+ datasets found
  1. Identifying Missing Data Handling Methods with Text Mining

    • openicpsr.org
    delimited
    Updated Mar 8, 2023
    Krisztián Boros; Zoltán Kmetty (2023). Identifying Missing Data Handling Methods with Text Mining [Dataset]. http://doi.org/10.3886/E185961V1
    Available download formats: delimited
    Dataset updated
    Mar 8, 2023
    Dataset provided by
    Hungarian Academy of Sciences
    Authors
    Krisztián Boros; Zoltán Kmetty
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1999 - Dec 31, 2016
    Description

    Missing data is an inevitable aspect of all empirical research. Researchers have developed several techniques for handling missing data to avoid information loss and bias. Over the past 50 years, these methods have become increasingly efficient, but also more complex. Building on previous review studies, this paper analyzes which missing data handling methods are used across scientific disciplines. For the analysis, we used nearly 50,000 scientific articles published between 1999 and 2016, which JSTOR provided in text format, and we applied a text-mining approach to extract the necessary information from our corpus. Our results show that the use of advanced missing data handling methods, such as Multiple Imputation and Full Information Maximum Likelihood estimation, grew steadily over the examination period. At the same time, simpler methods such as listwise and pairwise deletion are still in widespread use.
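    The contrast between listwise and pairwise deletion mentioned above can be illustrated with pandas. This is a minimal sketch on a small, made-up table (the columns `x`, `y`, `z` are hypothetical): listwise deletion keeps only rows with no missing values, while pairwise deletion uses, for each pair of variables, every row in which both are observed. pandas `DataFrame.corr()` excludes NA values pairwise, so it behaves like pairwise deletion.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with scattered missing values.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan],
    "y": [2.0, np.nan, 6.0, 8.0, 10.0, 12.0],
    "z": [1.0, 1.0, np.nan, 2.0, 3.0, 5.0],
})

def listwise(frame: pd.DataFrame) -> pd.DataFrame:
    """Listwise deletion: keep only rows with no missing values."""
    return frame.dropna(how="any")

def pairwise_n(frame: pd.DataFrame) -> pd.DataFrame:
    """Rows usable for each pair of columns under pairwise deletion."""
    observed = frame.notna().astype(int)
    return observed.T @ observed

complete = listwise(df)
print(len(complete))      # rows surviving listwise deletion: 3
print(pairwise_n(df))     # per-pair sample sizes (4 for each pair here)
print(df.corr())          # correlations computed pairwise by pandas
print(complete.corr())    # correlations on complete cases only
```

    Listwise deletion discards half of this toy table, whereas each pairwise correlation still uses four rows; the trade-off is that pairwise estimates rest on different subsets of cases.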

  2. Handling of missing values in python

    • kaggle.com
    zip
    Updated Jul 3, 2022
    xodeum (2022). Handling of missing values in python [Dataset]. https://www.kaggle.com/datasets/xodeum/handling-of-missing-values-in-python
    Available download formats: zip (2634 bytes)
    Dataset updated
    Jul 3, 2022
    Authors
    xodeum
    Description

    In this dataset, I show how to handle missing values in your data with the Python libraries NumPy and pandas, including the use of NaN and None values: detecting, dropping, and filling null values.
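    The three steps named above (detecting, dropping, and filling) can be sketched with pandas on a small made-up frame. The column names and the fill choices (median for numbers, "Unknown" for text) are illustrative assumptions, not taken from the dataset. Note that `isna()` treats both `None` and `np.nan` as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical example: None and np.nan both mark missing entries.
df = pd.DataFrame({
    "age": [25, None, 31, np.nan],
    "city": ["Oslo", None, "Lima", "Pune"],
})

# Detecting: count missing values per column.
missing_per_column = df.isna().sum()

# Dropping: remove rows that contain any missing value.
dropped = df.dropna()

# Filling: numeric column with its median, text column with a placeholder.
filled = df.fillna({"age": df["age"].median(), "city": "Unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```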

  3. Results of the ML models were obtained by deleting missing values from the...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jan 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Turki Aljrees (2024). Results of the ML models were obtained by deleting missing values from the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t004
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of the ML models were obtained by deleting missing values from the dataset.

  4. Deep learning based Missing Data Imputation

    • scidb.cn
    Updated Mar 4, 2024
    Mahjabeen Tahir (2024). Deep learning based Missing Data Imputation [Dataset]. http://doi.org/10.57760/sciencedb.16599
    Croissant: a format for machine-learning datasets (mlcommons.org/croissant)
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Mahjabeen Tahir
    Description

    The code provided trains an autoencoder, evaluates its performance, and uses it to impute missing values in a dataset. Each part is described below.

    Training the autoencoder (train_autoencoder function):
    • This function takes an autoencoder model and the input features as input.
    • It trains the autoencoder using the input features as both input and target output (hence features, features).
    • The autoencoder is trained for a specified number of epochs (epochs) with a given batch size (batch_size).
    • The shuffle=True argument shuffles the data before each epoch to prevent the model from memorizing the input order.
    • After training, it returns the trained autoencoder model and the training history.

    Evaluating the autoencoder (evaluate_autoencoder function):
    • This function takes a trained autoencoder model and the input features as input.
    • It uses the trained autoencoder to predict the reconstructed features from the input features.
    • It calculates the Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) scores between the original and reconstructed features.
    • These metrics show how well the autoencoder reconstructs the input features.

    Imputing with the autoencoder (impute_with_autoencoder function):
    • This function takes a trained autoencoder model and the input features as input.
    • It identifies missing values (e.g., -9999) in the input features.
    • For each row with missing values, it predicts the missing values using the trained autoencoder.
    • It replaces the missing values with the predicted values.
    • It returns the imputed features.

    To reuse this code:
    • Load your dataset and preprocess it as necessary.
    • Build an autoencoder model using the build_autoencoder function.
    • Train the autoencoder using the train_autoencoder function with your input features.
    • Evaluate the performance of the autoencoder using the evaluate_autoencoder function.
    • If your dataset contains missing values, use the impute_with_autoencoder function to impute them with the trained autoencoder.
    • Use the trained autoencoder for any other relevant tasks, such as feature extraction or anomaly detection.
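    The imputation and evaluation steps described for this dataset can be sketched without a deep-learning framework. In this NumPy sketch, `impute_with_model` and `reconstruction_scores` are hypothetical stand-ins for the dataset's impute_with_autoencoder and evaluate_autoencoder, not its actual code. `model` is assumed to be any object with a Keras-style `predict(X)` returning an array of the input's shape, and pre-filling missing cells with column means before prediction is an assumption the description does not state. R² here is pooled over all cells, which differs from scikit-learn's default per-column average.

```python
import numpy as np

SENTINEL = -9999.0  # missing-value marker, following the description's example

def impute_with_model(model, features, sentinel=SENTINEL):
    """Replace sentinel cells with the model's reconstruction of those cells."""
    X = np.asarray(features, dtype=float).copy()
    mask = X == sentinel
    if not mask.any():
        return X
    # Assumption: pre-fill missing cells with each column's observed mean
    # so the model never sees the raw sentinel value.
    col_means = np.nanmean(np.where(mask, np.nan, X), axis=0)
    X[mask] = col_means[np.nonzero(mask)[1]]
    # Reconstruct only rows with missing cells; observed cells stay unchanged.
    rows = mask.any(axis=1)
    recon = np.asarray(model.predict(X[rows]), dtype=float)
    X[rows] = np.where(mask[rows], recon, X[rows])
    return X

def reconstruction_scores(original, reconstructed):
    """Pooled MSE, MAE and R^2 between original and reconstructed features."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    err = original - reconstructed
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((original - original.mean()) ** 2)
    return {
        "mse": float(np.mean(err ** 2)),
        "mae": float(np.mean(np.abs(err))),
        "r2": float(1.0 - ss_res / ss_tot),
    }

class ConstantModel:
    """Hypothetical stand-in for a trained autoencoder: predicts fixed values."""
    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def predict(self, X):
        return np.tile(self.values, (len(X), 1))

data = np.array([[1.0, 10.0], [SENTINEL, 20.0], [3.0, SENTINEL]])
imputed = impute_with_model(ConstantModel([2.5, 15.0]), data)
print(imputed)
```

    Only the sentinel cells change; in practice `ConstantModel` would be replaced by the trained autoencoder.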

  5. MacroPCA: An All-in-One PCA Method Allowing for Missing Values as Well as...

    • tandf.figshare.com
    pdf
    Updated Jun 2, 2023
    Mia Hubert; Peter J. Rousseeuw; Wannes Van den Bossche (2023). MacroPCA: An All-in-One PCA Method Allowing for Missing Values as Well as Cellwise and Rowwise Outliers [Dataset]. http://doi.org/10.6084/m9.figshare.7624424.v2
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Mia Hubert; Peter J. Rousseeuw; Wannes Van den Bossche
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multivariate data are typically represented by a rectangular matrix (table) in which the rows are the objects (cases) and the columns are the variables (measurements). When there are many variables, one often reduces the dimension by principal component analysis (PCA), which in its basic form is not robust to outliers. Much research has focused on handling rowwise outliers, that is, rows that deviate from the majority of the rows in the data (e.g., they might belong to a different population). In recent years, cellwise outliers have also received attention. These are suspicious cells (entries) that can occur anywhere in the table. Even a relatively small proportion of outlying cells can contaminate over half the rows, which causes rowwise robust methods to break down. In this article, a new PCA method is constructed that combines the strengths of two existing robust methods to be robust against both cellwise and rowwise outliers. At the same time, the algorithm can cope with missing values. To date, it is the only PCA method that can deal with all three problems simultaneously. Its name, MacroPCA, stands for PCA allowing for Missingness And Cellwise & Rowwise Outliers. Several simulations and real datasets illustrate its robustness. New residual maps are introduced, which help to determine which variables are responsible for the outlying behavior. The method is well-suited for online process control.
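    The claim that a small proportion of outlying cells can contaminate over half the rows follows from simple probability. Assuming, for illustration only, that each cell is outlying independently with probability ε, a row with d variables is clean with probability (1 − ε)^d, so the expected share of contaminated rows is 1 − (1 − ε)^d. The independence assumption and the example numbers are hypothetical, not taken from the article.

```python
def contaminated_row_fraction(eps: float, d: int) -> float:
    """Expected share of rows with at least one outlying cell, assuming each
    of the d cells is outlying independently with probability eps."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability")
    return 1.0 - (1.0 - eps) ** d

# With 2% outlying cells and 40 variables, more than half the rows are affected.
print(round(contaminated_row_fraction(0.02, 40), 3))  # 0.554
```

    This is why a rowwise robust method, which can only tolerate a minority of bad rows, breaks down even though 98% of the cells are clean.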

  6. Machine learning models.

    • plos.figshare.com
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Machine learning models. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t002
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cervical cancer is a leading cause of women’s mortality, emphasizing the need for early diagnosis and effective treatment. In line with the imperative of early intervention, the automated identification of cervical cancer has emerged as a promising avenue, leveraging machine learning techniques to enhance both the speed and accuracy of diagnosis. However, an inherent challenge in the development of these automated systems is the presence of missing values in the datasets commonly used for cervical cancer detection. Missing data can significantly impact the performance of machine learning models, potentially leading to inaccurate or unreliable results. This study addresses a critical challenge in automated cervical cancer identification: handling missing data in datasets. The study presents a novel approach that combines three machine learning models into a stacked ensemble voting classifier, complemented by the use of a KNN Imputer to manage missing values. The proposed model achieves remarkable results with an accuracy of 0.9941, precision of 0.98, recall of 0.96, and an F1 score of 0.97. This study examines three distinct scenarios: one involving the deletion of missing values, another utilizing KNN imputation, and a third employing PCA for imputing missing values. This research has significant implications for the medical field, offering medical experts a powerful tool for more accurate cervical cancer therapy and enhancing the overall effectiveness of testing procedures. By addressing missing data challenges and achieving high accuracy, this work represents a valuable contribution to cervical cancer detection, ultimately aiming to reduce the impact of this disease on women’s health and healthcare systems.
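    The KNN imputation step can be sketched without an ML library. This simplified NumPy version fills each missing cell with the mean of that feature over the k nearest rows, where distance is Euclidean over the features observed in both rows, scaled by n_features / n_shared. It is an illustration under those assumptions, not the study's code, and it is not guaranteed to match scikit-learn's `sklearn.impute.KNNImputer` exactly; the toy matrix is made up.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill each NaN with the mean of that feature over the k nearest donor rows."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n_rows, n_features = X.shape
    observed = ~np.isnan(X)
    for i in range(n_rows):
        missing_cols = np.flatnonzero(~observed[i])
        if missing_cols.size == 0:
            continue
        # Distance from row i to every other row, using only coordinates
        # observed in both rows and scaled up by n_features / n_shared.
        dists = np.full(n_rows, np.inf)
        for j in range(n_rows):
            shared = observed[i] & observed[j]
            if j == i or not shared.any():
                continue
            sq = np.sum((X[i, shared] - X[j, shared]) ** 2)
            dists[j] = np.sqrt(sq * n_features / shared.sum())
        for c in missing_cols:
            # Donors: other rows that observe feature c at a finite distance.
            donors = np.flatnonzero(observed[:, c] & np.isfinite(dists))
            if donors.size == 0:
                continue  # no donor available; the cell stays NaN
            nearest = donors[np.argsort(dists[donors], kind="stable")[:k]]
            out[i, c] = X[nearest, c].mean()
    return out

X = np.array([
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 3.0],
    [1.0, 2.5, 5.0],
    [10.0, 20.0, 100.0],
])
print(knn_impute(X, k=2))  # the missing cell becomes mean(3, 5) = 4.0
```

    The distant fourth row is ignored as a donor, which is the point of using neighbours rather than a global column mean (which would give 36.0 here).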

  7. Description of the dataset used in this study.

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Description of the dataset used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t001
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Same description as entry 6 (Turki Aljrees, 2024).

  8. Finding_And_Visualizing_Missing_Data_Python

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Dr. Nagendra (2025). Finding_And_Visualizing_Missing_Data_Python [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/finding-and-visualizing-missing-data-python
    Available download formats: zip (371581 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    • This dataset is designed for learning how to identify missing data in Python.
    • It focuses on techniques to detect null, NaN, and incomplete values.
    • It includes examples of visualizing missing data patterns using Python libraries.
    • Useful for beginners practicing data preprocessing and data cleaning.
    • Helps users understand missing data handling methods for machine learning workflows.
    • Supports practical exploration of datasets before model training.
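    Before plotting, missingness can be summarized in plain pandas: the share of missing values per column, and how often each combination of missing columns (a missingness pattern) occurs. This is a minimal sketch on hypothetical data; the helper names are made up, and a plotting library could then visualize the same boolean `isna()` matrix.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, highest first."""
    return df.isna().mean().mul(100).sort_values(ascending=False)

def missing_patterns(df: pd.DataFrame) -> pd.Series:
    """Rows per missingness pattern; 'X' marks a missing cell, '.' an observed one."""
    patterns = df.isna().apply(
        lambda row: "".join("X" if is_missing else "." for is_missing in row),
        axis=1,
    )
    return patterns.value_counts()

# Hypothetical example data.
df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan],
    "b": [np.nan, np.nan, 6, 7],
    "c": [1, 2, 3, 4],
})
print(missing_summary(df))
print(missing_patterns(df))
```

    Pattern counts reveal whether values tend to go missing together (for example, `a` and `b` in the same row), which a per-column percentage alone cannot show.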

  9. Fake news detection dataset with labels

    • kaggle.com
    zip
    Updated Apr 19, 2023
    Harshit Singh (2023). Fake news detection dataset with labels [Dataset]. https://www.kaggle.com/datasets/harshitsingh1710/fake-news-detection-dataset-with-labels/code
    Available download formats: zip (45514031 bytes)
    Dataset updated
    Apr 19, 2023
    Authors
    Harshit Singh
    Description

    Dataset

    This dataset was created by Harshit Singh

    Contents

  10. Handling of missing values, stratified by whether prediction was the primary...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 22, 2012
    Bouwmeester, Walter; Altman, Douglas G.; Mallett, Susan; Vergouwe, Yvonne; Moons, Karel G. M.; Zuithoff, Nicolaas P. A.; Geerlings, Mirjam I.; Steyerberg, Ewout W. (2012). Handling of missing values, stratified by whether prediction was the primary or secondary study aim. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001143513
    Dataset updated
    May 22, 2012
    Authors
    Bouwmeester, Walter; Altman, Douglas G.; Mallett, Susan; Vergouwe, Yvonne; Moons, Karel G. M.; Zuithoff, Nicolaas P. A.; Geerlings, Mirjam I.; Steyerberg, Ewout W.
    Description

    Numbers are column percentages, with absolute numbers in parentheses.
    (a) Some studies reported more than one item. Hence, percentages do not add up to 100%.
    (b) Cross-sectional studies were excluded for this item (item not applicable).
    (c) More than one method could be applied. Hence, the percentages do not add up to 100%. Items were not applicable for two primary-aim studies that had no missing values. Hence, total n = 69.
    (d) Only participants with completely observed data were analysed.
    (e) For example: in a diagnostic study [73], the investigators assumed that among participants who did not undergo follow-up colonoscopy, the detection rates for any adenoma and for an advanced adenoma ranged from half to twice the rates among participants who did undergo follow-up colonoscopy.

  11. Results of the ML models using PCA imputer.

    • plos.figshare.com
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Results of the ML models using PCA imputer. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t006
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Same description as entry 6 (Turki Aljrees, 2024).
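    The PCA-imputation scenario of this cervical cancer study can be sketched as iterative low-rank reconstruction: start from column means, fit a rank-k SVD of the centred data, overwrite only the missing cells with the reconstruction, and repeat until the filled values stop changing. The study's exact PCA imputer is not described here, so the rank, tolerance, iteration cap, and the toy data are assumptions for illustration.

```python
import numpy as np

def pca_impute(X, rank=1, max_iter=500, tol=1e-8):
    """Iteratively fill NaN cells with a rank-`rank` PCA (SVD) reconstruction."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    if not mask.any():
        return X.copy()
    # Initial guess: each column's observed mean.
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        change = np.max(np.abs(low_rank[mask] - filled[mask]))
        filled[mask] = low_rank[mask]  # observed cells are never overwritten
        if change < tol:
            break
    return filled

# Toy data lying on the line y = 2x, with one value missing.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, np.nan]])
print(pca_impute(X, rank=1))  # the missing value converges towards 10
```

    Convergence of this scheme can be slow (here each step closes only a fraction of the remaining gap), which is why the iteration cap is generous.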

  12. Cleaning Practice with Errors & Missing Values

    • kaggle.com
    Updated Jun 5, 2025
    Zuhair khan (2025). Cleaning Practice with Errors & Missing Values [Dataset]. https://www.kaggle.com/datasets/zuhairkhan13/cleaning-practice-with-errors-and-missing-values
    Croissant: a format for machine-learning datasets (mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Zuhair khan
    License

    CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed specifically for beginners and intermediate learners to practice data cleaning techniques using Python and Pandas.

    It includes 500 rows of simulated employee data with intentional errors such as:

    Missing values in Age and Salary

    Typos in email addresses (@gamil.com)

    Inconsistent city name casing (e.g., lahore, Karachi)

    Extra spaces in department names (e.g., " HR ")

    ✅ Skills You Can Practice:

    Detecting and handling missing data

    String cleaning and formatting

    Removing duplicates

    Validating email formats

    Standardizing categorical data

    You can use this dataset to build your own data cleaning notebook, or use it in interviews, assessments, and tutorials.
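    The cleaning tasks listed above can be combined into one pandas function. This is a minimal sketch: the column names `Email`, `City`, `Department`, `Age`, and `Salary` are assumed from the description, the sample rows are made up, median filling is one possible choice, and the regular expression is a simple format check rather than full email validation.

```python
import numpy as np
import pandas as pd

# Simple format check only; real email validation is more involved.
EMAIL_PATTERN = r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$"

def clean_employees(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize text fields, drop duplicates, fill numeric gaps, flag bad emails."""
    out = df.copy()
    out["Email"] = (
        out["Email"].str.strip().str.lower()
        .str.replace("@gamil.com", "@gmail.com", regex=False)  # known domain typo
    )
    out["City"] = out["City"].str.strip().str.title()   # lahore -> Lahore
    out["Department"] = out["Department"].str.strip()   # " HR " -> HR
    # Deduplicate before computing medians so repeated rows do not skew them.
    out = out.drop_duplicates().reset_index(drop=True)
    for col in ["Age", "Salary"]:
        out[col] = out[col].fillna(out[col].median())
    out["EmailValid"] = out["Email"].str.match(EMAIL_PATTERN)
    return out

raw = pd.DataFrame({
    "Email": ["ali@gamil.com", "sara@gmail.com", "bad-email", "ali@gamil.com"],
    "City": ["lahore", "Karachi", " karachi ", "lahore"],
    "Department": [" HR ", "IT", "HR", " HR "],
    "Age": [30, np.nan, 40, 30],
    "Salary": [50000, 60000, np.nan, 50000],
})
cleaned = clean_employees(raw)
print(cleaned)
```

    The order matters: text is normalized first so that rows differing only in case or spacing are recognized as duplicates.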

  13. Sales Dataset v2 for Marketing Analytics

    • kaggle.com
    zip
    Updated Jun 26, 2022
    Emmanuel DJEGOU (2022). Sales Dataset v2 for Marketing Analytics [Dataset]. https://www.kaggle.com/datasets/emmanueldjegou/sales-dataset-enlarged
    Available download formats: zip (1114 bytes)
    Dataset updated
    Jun 26, 2022
    Authors
    Emmanuel DJEGOU
    Description

    On close inspection, some inconsistencies are noticeably corrupting our data. The columns Product and line should count as a single attribute, so the actual observation should read Camping Equipment. Similarly, the columns Retailer and country suffer from the same issue. In addition, the row values of the attributes order and method do not convey any relevant information. Consequently, some additional work needs to be done in the analysis.
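    Merging two columns that hold pieces of one attribute can be sketched in pandas. The column names `Product` and `line` and the value Camping Equipment follow the description; the helper name, the merged column name `Product line`, and the extra `Revenue` column are assumptions. Retailer and country could be merged the same way.

```python
import pandas as pd

def merge_split_columns(df: pd.DataFrame, left: str, right: str, new: str) -> pd.DataFrame:
    """Join two columns holding pieces of one attribute into a single column."""
    out = df.copy()
    out[new] = out[left].astype(str).str.strip() + " " + out[right].astype(str).str.strip()
    return out.drop(columns=[left, right])

sales = pd.DataFrame({"Product": ["Camping"], "line": ["Equipment"], "Revenue": [100.0]})
sales = merge_split_columns(sales, "Product", "line", "Product line")
print(sales)  # one column "Product line" holding "Camping Equipment"
```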

  14. Full features list obtained including training features, identification,...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 14, 2023
    Anjara, Sabrina G.; Collazo-Lorduy, Ana; Janik, Adrianna; Torrente, Maria; Provencio, Mariano; Costabello, Luca; Mc Kenzie, Kenneth; Dunford-Stenger, Amy (2023). Full features list obtained including training features, identification, label, and features with missing values filtered out before training. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000968542
    Dataset updated
    Sep 14, 2023
    Authors
    Anjara, Sabrina G.; Collazo-Lorduy, Ana; Janik, Adrianna; Torrente, Maria; Provencio, Mariano; Costabello, Luca; Mc Kenzie, Kenneth; Dunford-Stenger, Amy
    Description

    Full features list obtained including training features, identification, label, and features with missing values filtered out before training.
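    Filtering out features with missing values before training, as this entry's title describes, amounts to a column filter. This is a minimal pandas sketch; the `max_missing` threshold parameter and the example columns are hypothetical, and with the default of 0.0 every column containing at least one missing value is removed.

```python
import numpy as np
import pandas as pd

def drop_incomplete_features(df: pd.DataFrame, max_missing: float = 0.0):
    """Drop columns whose share of missing values exceeds max_missing.

    Returns the filtered frame and the list of removed column names.
    """
    share = df.isna().mean()
    removed = share[share > max_missing].index.tolist()
    return df.drop(columns=removed), removed

# Hypothetical feature table.
df = pd.DataFrame({
    "age": [50, 60, 70],
    "smoker": [1, np.nan, 0],
    "stage": [np.nan, np.nan, 2],
})
kept, removed = drop_incomplete_features(df)
print(removed)  # ['smoker', 'stage']
```

    Recording the removed names matters because the same columns must then be dropped from any evaluation or test data.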

  15. Dataset for handling missing values in well-log curves with a gated graph...

    • figshare.com
    zip
    Updated Aug 28, 2021
    Chunbi Jiang (2021). Dataset for handling missing values in well-log curves with a gated graph neural network [Dataset]. http://doi.org/10.6084/m9.figshare.15141021.v1
    Available download formats: zip
    Dataset updated
    Aug 28, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chunbi Jiang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used in the article titled "Handling missing values in well-log curves with a gated graph neural network".

  16. Facebook Campaign Dataset for Marketing Analytics

    • kaggle.com
    zip
    Updated Jun 26, 2022
    Emmanuel DJEGOU (2022). Facebook Campaign Dataset for Marketing Analytics [Dataset]. https://www.kaggle.com/datasets/emmanueldjegou/campaign-dataset
    Available download formats: zip (664 bytes)
    Dataset updated
    Jun 26, 2022
    Authors
    Emmanuel DJEGOU
    Description

    The problem with this dataset is that views and likes, our attributes of interest, are loaded as row values when they should be column names. We have addressed this issue in the related notebook.
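    Turning the row values views and likes into columns is a long-to-wide reshape. A minimal pandas sketch, assuming a hypothetical long layout with `campaign`, `metric`, and `value` columns (the dataset's real column names and values are not given here):

```python
import pandas as pd

long = pd.DataFrame({
    "campaign": ["A", "A", "B", "B"],
    "metric": ["views", "likes", "views", "likes"],
    "value": [1000, 50, 800, 40],
})

# Each distinct metric becomes its own column; one row per campaign.
wide = long.pivot(index="campaign", columns="metric", values="value").reset_index()
wide.columns.name = None  # drop the leftover "metric" label on the column index
print(wide)
```

    `pivot` raises an error if a campaign has two rows for the same metric; `pivot_table` with an aggregation function is the usual alternative in that case.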

  17. Experimental setup for the proposed system.

    • plos.figshare.com
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Experimental setup for the proposed system. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t003
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Same description as entry 6 (Turki Aljrees, 2024).

  18. Missing Person Detection Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Growth Market Reports (2025). Missing Person Detection Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/missing-person-detection-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Missing Person Detection Market Outlook



    As per our latest research, the global Missing Person Detection market size reached USD 2.14 billion in 2024, with a robust year-on-year growth pattern. The market is set to expand at a CAGR of 14.8% during the forecast period, aiming to achieve a value of USD 6.12 billion by 2033. This growth is primarily driven by the rapid adoption of advanced AI-powered identification technologies, increased government initiatives to improve public safety, and the integration of real-time surveillance systems across multiple sectors.
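    The growth figures above can be checked with the standard compound annual growth rate (CAGR) relation, end = start × (1 + r)^n. The sketch assumes nine annual compounding steps from the 2024 base to 2033; the report does not state which base year its CAGR uses, so this is only one possible reading. Under that assumption, the stated endpoints imply roughly 12.4% a year, and 14.8% compounded over nine years would give about USD 7.41 billion, so the published rate presumably rests on a different base year or period.

```python
def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at annual `rate` for `years` years."""
    return start * (1.0 + rate) ** years

def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0

# USD billions; nine compounding steps (2024 -> 2033) is an assumption.
print(f"{implied_cagr(2.14, 6.12, 9):.1%}")  # 12.4%
print(f"{project(2.14, 0.148, 9):.2f}")      # 7.41
```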



    One of the primary growth factors propelling the Missing Person Detection market is the significant advancement in artificial intelligence and machine learning. These technologies have revolutionized the way missing persons are located by enabling real-time analysis of vast amounts of surveillance footage, social media data, and biometric records. Law enforcement agencies and search and rescue teams are increasingly leveraging AI-powered facial recognition and video analytics to accelerate identification processes, minimize false positives, and enhance the accuracy of detection. The integration of these technologies into existing surveillance infrastructure has not only improved operational efficiency but also reduced the time required to locate missing individuals, which is critical in time-sensitive rescue missions. Moreover, the continuous evolution of these technologies is expected to further drive market growth by providing more sophisticated and reliable detection solutions.



    Another significant driver for the Missing Person Detection market is the growing collaboration between government agencies, private organizations, and non-governmental organizations (NGOs). Governments worldwide are investing heavily in upgrading their public safety infrastructure, including the deployment of advanced detection systems at airports, train stations, and public events. Additionally, private organizations specializing in security solutions are partnering with law enforcement to provide cutting-edge technologies and services. NGOs focused on human trafficking and child protection are also adopting these systems to enhance their search and rescue capabilities. This multi-stakeholder approach has fostered a robust ecosystem that supports the development and deployment of innovative detection solutions, thereby accelerating market expansion.



    The increasing prevalence of urbanization and large-scale public gatherings has heightened the demand for effective missing person detection solutions. With urban areas becoming more densely populated, the risk of individuals going missing in crowded environments has escalated, necessitating the deployment of advanced surveillance and detection systems. Major sporting events, concerts, and festivals often require real-time monitoring to ensure public safety and quickly respond to missing person incidents. The integration of GPS and geolocation technologies with AI-driven analytics enables authorities to track individuals' movements and respond swiftly to emergencies. This trend is expected to continue as urbanization intensifies and public safety remains a top priority for governments and event organizers alike.



    The concept of Person of Interest Tracking has become increasingly relevant in the context of missing person detection. This approach involves the use of advanced surveillance and data analysis techniques to monitor individuals who may be connected to missing person cases. By leveraging AI-driven analytics and real-time data processing, authorities can track the movements and activities of persons of interest, providing valuable insights that can aid in the swift resolution of cases. This method not only enhances the efficiency of search operations but also helps in identifying patterns and connections that may not be immediately apparent. As technology continues to evolve, Person of Interest Tracking is expected to play a crucial role in the broader strategy of missing person detection, offering a proactive tool for law enforcement agencies worldwide.



    From a regional perspective, North America currently dominates the Missing Person Detection market due to its early adoption of advanced surveillance technologies, strong government initiatives, and a well-established public safety infrastructure. Europe follows closely, driven by stringent regulations on public safety and

  19. Fake News Detection Dataset

    • kaggle.com
    zip
    Updated Apr 27, 2025
    Mahdi Mashayekhi (2025). Fake News Detection Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimashayekhi/fake-news-detection-dataset
    Available download formats
    zip (11,735,585 bytes)
    Dataset updated
    Apr 27, 2025
    Authors
    Mahdi Mashayekhi
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📚 Fake News Detection Dataset

    Overview

    This dataset is designed for practicing fake news detection using machine learning and natural language processing (NLP) techniques. It includes a rich collection of 20,000 news articles, carefully generated to simulate real-world data scenarios. Each record contains metadata about the article and a label indicating whether the news is real or fake.

    The dataset also intentionally includes around 5% missing values in some fields to simulate the challenges of handling incomplete data in real-life projects.

    Columns Description

    title: A short headline summarizing the article (around 6 words).

    text: The body of the news article (200–300 words on average).

    date: The publication date of the article, randomly selected over the past 3 years.

    source: The media source that published the article (e.g., BBC, CNN, Al Jazeera). May contain missing values (~5%).

    author: The author's full name. Some entries are missing (~5%) to simulate real-world incomplete data.

    category: The general category of the article (e.g., Politics, Health, Sports, Technology).

    label: The target label, real or fake news.
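    Before choosing a strategy for the incomplete fields, it helps to measure how much is actually missing per column. The sketch below uses only the standard-library `csv` module and assumes that the header names match the column names listed above and that a missing value is stored as an empty field; both are assumptions, not documented properties of the file. The sample rows are hypothetical; with the real data you would pass `csv.DictReader(open("fake_news_dataset.csv", newline=""))` instead.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ["title", "text", "date", "source", "author", "category", "label"]

# Hypothetical sample rows (not taken from the dataset).
# Assumption: a missing value is stored as an empty field.
SAMPLE_CSV = """title,text,date,source,author,category,label
Markets rally after rate decision,Body text here,2024-03-01,BBC,Jane Doe,Politics,real
New vaccine trial announced,Body text here,2023-11-12,,John Roe,Health,fake
Local team wins final,Body text here,2022-07-09,CNN,,Sports,real
Chip shortage eases,Body text here,2024-01-20,Al Jazeera,Ann Lee,Technology,fake
"""


def missing_fraction(reader, columns):
    """Return the fraction of rows in which each column is empty or absent."""
    counts = {col: 0 for col in columns}
    total = 0
    for row in reader:
        total += 1
        for col in columns:
            # DictReader yields None for fields missing from a short row
            # and "" for an empty field; treat both as missing.
            value = row.get(col)
            if value is None or value.strip() == "":
                counts[col] += 1
    return {col: (counts[col] / total if total else 0.0) for col in columns}


if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for col, frac in missing_fraction(reader, COLUMNS).items():
        print(f"{col}: {frac:.0%}")
```

    On the real file, the `source` and `author` fractions should come out near the ~5% the description states; a much larger number would suggest missing values are encoded differently (for example as a literal "NaN" string).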

    Why Use This Dataset?

    Fake News Detection Practice: Perfect for binary classification tasks.

    NLP Preprocessing: Allows users to practice text cleaning, tokenization, vectorization, etc.

    Handling Missing Data: Some fields are incomplete to simulate real-world data challenges.

    Feature Engineering: Encourages creating new features from text and metadata.

    Balanced Labels: Realistic distribution of real and fake news for fair model training.
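    The balanced-labels claim is easy to check before training. The following minimal sketch counts each value of the `label` column with `collections.Counter` and reports its share, skipping empty labels; the label list in the usage block is hypothetical, and in practice it would be read from the CSV.

```python
from collections import Counter


def label_distribution(labels):
    """Return each label's share among the non-empty labels."""
    counts = Counter(label for label in labels if label)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical labels; in practice, take them from the "label" column.
    labels = ["real", "fake", "real", "fake", "real"]
    print(label_distribution(labels))
```

    If the two shares are far apart, stratified train/test splits or class weights are common ways to keep evaluation fair.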

    Potential Use Cases

    Building and evaluating text classification models (e.g., Logistic Regression, Random Forests, XGBoost).

    Practicing NLP techniques like TF-IDF, Word2Vec, BERT embeddings.

    Performing exploratory data analysis (EDA) on news data.

    Developing pipelines for dealing with missing values and feature extraction.
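    As an illustration of the TF-IDF use case, here is a dependency-free sketch that weights each term by term frequency times inverse document frequency, using the classic `tf * ln(N / df)` form. The tokenizer (lowercase, keep ASCII letters, digits, and apostrophes) is a simplifying assumption. Library implementations differ in detail; scikit-learn's `TfidfVectorizer`, for example, uses a smoothed `idf = ln((1 + N) / (1 + df)) + 1` by default and normalizes the vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters, digits, and apostrophes (a simple assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * ln(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    docs = ["The cat sat", "the dog sat", "the cat ran"]
    for vec in tf_idf(docs):
        print(vec)
```

    Terms that appear in every document (here "the") get weight 0, which is the point of the IDF factor: words shared by all articles carry no signal for separating real from fake news.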

    A Note on Data Quality

    This dataset has been synthetically generated to closely resemble real news articles. The diversity in titles, text, sources, and categories is intended to help models trained on this dataset generalize to unseen, real-world data. However, because it is synthetic, it should not be used for production models or decision-making without careful validation.

    File Info

    Filename: fake_news_dataset.csv

    Size: 20,000 rows × 7 columns

    Missing Data: ~5% missing values in the source and author columns.
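    One simple way to handle the gaps in `source` and `author` is constant-placeholder imputation combined with a missingness indicator. This is one common option for categorical fields, not a method prescribed by the dataset author. The sketch below assumes rows are dicts (as produced by `csv.DictReader`) and that missing values are empty strings or `None`. The helper name `impute_missing`, the `"Unknown"` placeholder, and the `<column>_missing` feature names are hypothetical choices.

```python
def impute_missing(rows, columns, placeholder="Unknown"):
    """Fill empty values in `columns` and add a <column>_missing flag (0 or 1).

    Hypothetical helper: assumes rows are dicts and that a missing value
    is None or an empty/whitespace-only string.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for col in columns:
            value = new_row.get(col)
            is_missing = value is None or value.strip() == ""
            new_row[f"{col}_missing"] = int(is_missing)
            if is_missing:
                new_row[col] = placeholder
        cleaned.append(new_row)
    return cleaned


if __name__ == "__main__":
    # Hypothetical rows with one missing author and one missing source.
    rows = [
        {"source": "BBC", "author": ""},
        {"source": "", "author": "Jane Doe"},
    ]
    for row in impute_missing(rows, ["source", "author"]):
        print(row)
```

    Keeping the indicator column, rather than only filling the gap, lets a model learn whether missingness itself is informative. In a synthetic dataset where values were removed at random, it may well carry no signal.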

  20. Data from: Imperfect detection alters the outcome of management strategies...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Jun 23, 2021
    Edd Hammill; Christopher Clement (2021). Imperfect detection alters the outcome of management strategies for protected areas [Dataset]. http://doi.org/10.5061/dryad.stqjq2c0d
    Available download formats
    zip
    Dataset updated
    Jun 23, 2021
    Dataset provided by
    Dryad
    Authors
    Edd Hammill; Christopher Clement
    Time period covered
    Jan 20, 2020
    Description

    There are no missing values; the numbers in each column represent the abundance of each species in the entire well.
