100+ datasets found
  1. Data Classification Market Size & Share Analysis - Industry Research Report...

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Updated Jun 20, 2025
    Cite
    Mordor Intelligence (2025). Data Classification Market Size & Share Analysis - Industry Research Report - Growth Trends 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/data-classification-market
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    Global
    Description

    The Data Classification Market Report is Segmented by Component (Software and Services), Classification Method (Content-Based, Context-Based, and More), Organization Size (Large Enterprises and Small and Medium Enterprises (SMEs)), Application (Access Control and IAM, Governance and Compliance, and More), Industry Vertical (BFSI, and More), and Geography. The Market Forecasts are Provided in Terms of Value (USD).

  2. Data Classification Market by Component (Solution, Services), Methodology...

    • verifiedmarketresearch.com
    pdf,excel,csv,ppt
    Updated Aug 12, 2024
    Cite
    Verified Market Research (2024). Data Classification Market by Component (Solution, Services), Methodology (Content-based, Context-based, User-based), Application (Access Control, GRC, Web, Mobile & Email Protection, Centralized Management), End-User Industry (Banking, Financial Services & Insurance, Healthcare & Life Sciences, Government & Defense, Education, Telecom, Media & Entertainment), & Region for 2026-2032 [Dataset]. https://www.verifiedmarketresearch.com/product/data-classification-market/
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Aug 12, 2024
    Dataset authored and provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    Data Classification Market size was valued at USD 1664.66 Million in 2024 and is projected to reach USD 9486.25 Million by 2032, growing at a CAGR of 24.3% during the forecast period 2026-2032.
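    The stated figures follow the standard compound annual growth rate (CAGR) relation, future value = present value × (1 + r)^n. The Python sketch below assumes n counts whole years from the 2024 base value to the 2032 target (n = 8), even though the report labels its forecast period 2026-2032; under that assumption, compounding USD 1664.66 million at 24.3% gives approximately the stated USD 9486.25 million. The function name `project_value` is hypothetical.

```python
def project_value(present: float, rate: float, years: int) -> float:
    """Compound a present value forward at a constant annual growth rate.

    `rate` is a fraction (0.243 for 24.3%); `years` is the number of compounding periods.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return present * (1.0 + rate) ** years


if __name__ == "__main__":
    # 2024 base value (USD million) compounded over 8 years to 2032 (assumed period count)
    print(round(project_value(1664.66, 0.243, 8), 2))
```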

    Global Data Classification Market Drivers

    The Data Classification Market is shaped by several drivers, including:

    Increasing Data Volume: Enterprises produce data in exponentially growing amounts, and this data must be managed and classified to maintain data security, compliance, and effective use.

    Regulatory Compliance: Strict data protection laws such as the GDPR, CCPA, and HIPAA require organizations to categorize their data by sensitivity level. These compliance requirements drive the adoption of data classification solutions, which help ensure adherence to regulatory standards and avoid heavy penalties.

    Data Security Concerns: Organizations are concentrating on strengthening their data security procedures due to the increase in cyber threats and data breaches. Classifying data makes it easier to find sensitive information and implement the right security measures to keep it safe from theft or unwanted access.

    Growing Adoption of Cloud Services: As cloud computing services become more widely used, strong data classification techniques are required to ensure data security and compliance, particularly when data is transferred between different cloud environments and storage locations.

    Increasing Awareness of Data Privacy: Heightened awareness of data privacy issues among consumers and enterprises is driving demand for solutions that better manage and protect sensitive data through classification and encryption.

    Integration with Data Loss Prevention (DLP) Systems: Data classification integrated with DLP systems improves data protection by identifying, monitoring, and preventing the leakage or unlawful transfer of sensitive information.

    Emergence of AI and Machine Learning Technologies: Incorporating these technologies into data classification systems allows data to be identified and classified more automatically and accurately, saving labor and increasing efficiency.

    Demand for Data Governance and Lifecycle Management: Organizations increasingly recognize the importance of effective data governance and lifecycle management for maintaining data quality, integrity, and compliance throughout the data lifecycle. Data classification is a key component of efficient data governance practice.

  3. Best machine learning based classification results (according to average...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Ivo D. Dinov; Ben Heavner; Ming Tang; Gustavo Glusman; Kyle Chard; Mike Darcy; Ravi Madduri; Judy Pa; Cathie Spino; Carl Kesselman; Ian Foster; Eric W. Deutsch; Nathan D. Price; John D. Van Horn; Joseph Ames; Kristi Clark; Leroy Hood; Benjamin M. Hampstead; William Dauer; Arthur W. Toga (2023). Best machine learning based classification results (according to average measures of 5-fold cross-validation). [Dataset]. http://doi.org/10.1371/journal.pone.0157077.t009
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ivo D. Dinov; Ben Heavner; Ming Tang; Gustavo Glusman; Kyle Chard; Mike Darcy; Ravi Madduri; Judy Pa; Cathie Spino; Carl Kesselman; Ian Foster; Eric W. Deutsch; Nathan D. Price; John D. Van Horn; Joseph Ames; Kristi Clark; Leroy Hood; Benjamin M. Hampstead; William Dauer; Arthur W. Toga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Best machine learning based classification results (according to average measures of 5-fold cross-validation).
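    The table reports metrics averaged over 5-fold cross-validation. As a rough illustration of that procedure (not the authors' code), the Python sketch below splits n sample indices into k contiguous folds, scores each held-out fold with a caller-supplied function, and averages the scores. The function names are hypothetical, and real studies usually shuffle or stratify samples before splitting.

```python
from statistics import mean
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n: int, k: int = 5) -> List[Split]:
    """Split indices 0..n-1 into k contiguous folds; each fold is held out exactly once."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    # Spread the remainder over the first n % k folds so sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits: List[Split] = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_validated_mean(
    score_fold: Callable[[List[int], List[int]], float], n: int, k: int = 5
) -> float:
    """Average a per-fold score (e.g. accuracy) over all k train/test splits."""
    return mean(score_fold(train, test) for train, test in k_fold_indices(n, k))
```

In practice `score_fold` would train a classifier on the training indices and return its accuracy (or another measure) on the test indices.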

  4. Gender Distributions.

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    xls
    Updated Sep 28, 2016
    Cite
    Ivo D. Dinov; Ben Heavner; Ming Tang; Gustavo Glusman; Kyle Chard; Mike Darcy; Ravi Madduri; Judy Pa; Cathie Spino; Carl Kesselman; Ian Foster; Eric W. Deutsch; Nathan D. Price; John D. Van Horn; Joseph Ames; Kristi Clark; Leroy Hood; Benjamin M. Hampstead; William Dauer; Arthur W. Toga (2016). Gender Distributions. [Dataset]. http://doi.org/10.1371/journal.pone.0157077.t003
    Available download formats: xls
    Dataset updated
    Sep 28, 2016
    Dataset provided by
    PLOS ONE
    Authors
    Ivo D. Dinov; Ben Heavner; Ming Tang; Gustavo Glusman; Kyle Chard; Mike Darcy; Ravi Madduri; Judy Pa; Cathie Spino; Carl Kesselman; Ian Foster; Eric W. Deutsch; Nathan D. Price; John D. Van Horn; Joseph Ames; Kristi Clark; Leroy Hood; Benjamin M. Hampstead; William Dauer; Arthur W. Toga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gender Distributions.

  5. Data Classification Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 23, 2025
    Cite
    Growth Market Reports (2025). Data Classification Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-classification-market
    Available download formats: csv, pptx, pdf
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Classification Market Outlook



    According to our latest research, the global Data Classification market size reached USD 1.92 billion in 2024, with a robust year-over-year growth rate. The market is projected to expand at a CAGR of 23.4% from 2025 to 2033, positioning it to reach a forecasted value of USD 13.34 billion by 2033. The primary growth driver for this market is the accelerating adoption of advanced data security solutions across industries, as organizations seek to comply with stringent data privacy regulations and mitigate the risks associated with data breaches.
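    Going the other way, the growth rate implied by two endpoint values is r = (end / start)^(1/n) - 1. The Python sketch below assumes n is the number of whole years between the base value and the forecast value, which is a judgment call here because the report gives a 2024 base value but a 2025-2033 forecast window. The helper name `implied_cagr` is hypothetical.

```python
def implied_cagr(start: float, end: float, years: float) -> float:
    """Return the constant annual growth rate that turns `start` into `end` over `years` periods."""
    if start <= 0 or end <= 0:
        raise ValueError("values must be positive")
    if years <= 0:
        raise ValueError("years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # USD billion, 2024 base to 2033 forecast; both period counts are assumptions
    for n in (9, 10):
        print(n, f"{implied_cagr(1.92, 13.34, n):.1%}")
```

Comparing the printed rates with the stated 23.4% shows how sensitive such a check is to the assumed number of compounding periods.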



    The increasing frequency and sophistication of cyber threats have made data classification a critical component of enterprise security strategies. Organizations are prioritizing the deployment of data classification solutions to identify, categorize, and protect sensitive information, ensuring that only authorized personnel have access to critical data assets. This shift is further fueled by the proliferation of cloud computing and digital transformation initiatives, which have led to exponential growth in data volumes and complexity. As a result, the demand for automated and scalable data classification tools is surging, enabling businesses to maintain visibility and control over their data in real time.



    Another significant growth factor is the evolving regulatory landscape, with governments and industry bodies around the world introducing rigorous data protection laws such as GDPR, CCPA, and HIPAA. Compliance with these regulations necessitates robust data classification frameworks to accurately assess and report on the handling of personally identifiable information (PII) and other sensitive data types. Enterprises are increasingly investing in data classification solutions to avoid severe penalties, enhance audit readiness, and demonstrate accountability in their data management practices. This trend is particularly pronounced in highly regulated sectors such as BFSI, healthcare, and government, where the stakes for data protection are exceptionally high.



    The integration of artificial intelligence and machine learning into data classification platforms is also propelling market growth. These technologies enable more accurate and efficient classification by automating the identification of sensitive data patterns, reducing manual intervention, and minimizing the risk of human error. AI-driven solutions can adapt to evolving data environments and emerging threats, offering predictive analytics and real-time insights that empower organizations to make informed security decisions. This technological advancement is expected to further accelerate the adoption of data classification tools across diverse industry verticals.



    Regionally, North America remains the dominant market for data classification, accounting for the largest share in 2024, followed closely by Europe and the Asia Pacific. The United States, in particular, exhibits strong demand due to the presence of major technology companies, a mature cybersecurity ecosystem, and stringent regulatory requirements. Meanwhile, the Asia Pacific region is experiencing the fastest growth, driven by rapid digitalization, increasing cybercrime incidents, and growing awareness of data privacy issues among enterprises. Latin America and the Middle East & Africa are also witnessing steady adoption, albeit at a comparatively nascent stage, as organizations in these regions ramp up their investments in data security infrastructure.





    Component Analysis



    The Data Classification market is segmented by component into Software and Services, each playing a pivotal role in the overall ecosystem. Software solutions dominate the market, accounting for a substantial portion of the total revenue. These solutions are designed to automate the identification, labeling, and categorization of data based on predefined policies and rules. The evolution of software offerings has been marked by the integration of advanced analytics, machine learning, and artificial intelligence

  6. Citation Trends for "Medical data classification scheme based on hybridized...

    • shibatadb.com
    Updated Apr 15, 2017
    Cite
    Yubetsu (2017). Citation Trends for "Medical data classification scheme based on hybridized SMOTE technique (HST) and Rough Set technique (RST)" [Dataset]. https://www.shibatadb.com/article/6SEJEsMa
    Dataset updated
    Apr 15, 2017
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2019 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Medical data classification scheme based on hybridized SMOTE technique (HST) and Rough Set technique (RST)".

  7. Data from: Reviewing ensemble classification methods in breast cancer...

    • data.niaid.nih.gov
    • portalinvestigacion.um.es
    • +1 more
    Updated Jan 29, 2025
    Cite
    Hosni, Mohamed; Abnane, Ibtissam; Idri, Ali; Carrillo de Gea, Juan Manuel; Fernández-Alemán, José Luis (2025). Reviewing ensemble classification methods in breast cancer (DATASET) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14767927
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    University of Murcia
    University Mohammed V of Rabat
    Authors
    Hosni, Mohamed; Abnane, Ibtissam; Idri, Ali; Carrillo de Gea, Juan Manuel; Fernández-Alemán, José Luis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the data for the mapping questions addressed by the selected studies. MQ7 has three sub-questions: D (dataset), V (validation method), and M (metrics).

    The data is provided as a PDF file, MappingQuestions.pdf, which contains the list of responses to the mapping questions.

    Contact information: for further information or inquiries about this dataset, please contact Juan Manuel Carrillo de Gea at jmcdg1@um.es.

  8. Simulated Hierarchical Benchmark Dataset to assess dendro-classification...

    • search.dataone.org
    • doi.pangaea.de
    Updated Feb 14, 2018
    Cite
    Schoening, Timm; Schütt, Andrea; GEOMAR - Helmholtz Centre for Ocean Research Kiel (2018). Simulated Hierarchical Benchmark Dataset to assess dendro-classification methods (hierarchical classification) [Dataset]. http://doi.org/10.1594/PANGAEA.884173
    Dataset updated
    Feb 14, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Schoening, Timm; Schütt, Andrea; GEOMAR - Helmholtz Centre for Ocean Research Kiel
    Description

    A hierarchically ordered distribution of 3D points was created with MATLAB. It contains 120,000 data points in five hierarchical levels, with one to four child nodes per parent. Data values on all three axes range between 0 and 1. The structure can be seen in the attached figure. Each hierarchical level implements different distributions of data points, which allows classifiers to be tested under various conditions. The most common distribution in the dataset is a simple Gaussian-distributed point cloud. Other sampled distributions are a spherical distribution (a sphere in 3D) and a circular (donut) distribution along different axes. XOR distributions are implemented in different patterns, e.g., four batches with crossed classes, or eight batches with two or four classes. The most complex distribution is the springroll, in which the data points are intertwined with one another. To create indistinguishable cases, where a classifier is expected to perform poorly, some data points are randomly intermixed with another class.

    The .csv file contains four columns: label | x-coordinate | y-coordinate | z-coordinate

    The label of each sample encodes all the hierarchical information needed. Each label is composed of five digits, one per hierarchical level. For example:

    Sample '11421':
      Hierarchical level 1: class 1
      Hierarchical level 2: class 1
      Hierarchical level 3: class 4
      Hierarchical level 4: class 2
      Hierarchical level 5: class 1
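    A minimal Python sketch for reading this layout follows. It assumes comma-separated rows in the order label, x, y, z (the delimiter is not stated in the description) and keeps the label as a string so each digit maps to one hierarchical level. Treating the first k digits as the identifier of the node at level k is an assumption based on the parent/child structure described above; the names `Sample`, `parse_row`, `level_classes`, and `level_paths` are hypothetical.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    label: str  # five digits, one class per hierarchical level
    x: float
    y: float
    z: float


def parse_row(line: str, sep: str = ",") -> Sample:
    """Parse one 'label, x, y, z' row (the separator is an assumption)."""
    label, x, y, z = (field.strip() for field in line.split(sep))
    return Sample(label, float(x), float(y), float(z))


def level_classes(label: str, levels: int = 5) -> List[int]:
    """Return the class index at each hierarchical level, e.g. '11421' -> [1, 1, 4, 2, 1]."""
    if len(label) != levels or not label.isdigit():
        raise ValueError(f"expected {levels} digits, got {label!r}")
    return [int(digit) for digit in label]


def level_paths(label: str) -> List[str]:
    """Prefixes of the label; the first k digits are assumed to identify the level-k node."""
    return [label[: k + 1] for k in range(len(label))]
```

Using prefixes rather than single digits keeps, for example, class 4 under parent path '11' distinct from a class 4 under a different parent.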

  9. Pseudo-Label Generation for Multi-Label Text Classification

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Pseudo-Label Generation for Multi-Label Text Classification [Dataset]. https://catalog.data.gov/dataset/pseudo-label-generation-for-multi-label-text-classification
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    With the advent and expansion of social networking, the amount of generated text data has increased sharply. Handling such a huge volume of text data requires new and improved text mining techniques. One of the characteristics of text data that makes text mining difficult is multi-labelity. To build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, it is most prevalent in text data. It also places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how, and under what circumstances, the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification. The high and sparse dimensionality of text data is also considered during classification. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while exploiting the correlation among the multiple labels in the data set. Our text classification technique, called pseudo-LSC (pseudo-Label Based Subspace Clustering), is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
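    The abstract defines pseudo labels as combinations of existing class labels. A minimal Python sketch of one common way to build such combinations, the label powerset transformation, maps each distinct label subset to a single pseudo-label id so that a single-label classifier can be trained on it. This illustrates the idea under that assumption and is not the pseudo-LSC algorithm itself; the names `label_powerset` and `decode` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

PseudoLabelMap = Dict[FrozenSet[str], int]


def label_powerset(label_sets: Iterable[Iterable[str]]) -> Tuple[List[int], PseudoLabelMap]:
    """Map each distinct combination of labels to one pseudo-label id.

    Ids are assigned in order of first appearance, so the encoding is deterministic.
    """
    mapping: PseudoLabelMap = {}
    encoded: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)  # order-insensitive: {"a", "b"} == {"b", "a"}
        if key not in mapping:
            mapping[key] = len(mapping)
        encoded.append(mapping[key])
    return encoded, mapping


def decode(pseudo_label: int, mapping: PseudoLabelMap) -> FrozenSet[str]:
    """Recover the original label combination from a pseudo-label id."""
    for combo, idx in mapping.items():
        if idx == pseudo_label:
            return combo
    raise KeyError(pseudo_label)
```

A known drawback of this transformation is that the number of pseudo labels grows with the number of distinct label combinations, and a combination never seen during training cannot be predicted.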

  10. Data Classification Software Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). Data Classification Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-classification-software-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Classification Software Market Outlook



    According to our latest research, the global Data Classification Software market size reached USD 2.47 billion in 2024, reflecting robust adoption across industries. The market is projected to expand at a CAGR of 20.6% from 2025 to 2033, reaching a substantial USD 16.12 billion by 2033. This remarkable growth is primarily driven by the increasing prioritization of data security, stringent regulatory requirements, and the surge in digital transformation initiatives across diverse sectors.




    One of the most significant growth factors propelling the Data Classification Software market is the escalating frequency and sophistication of cyber threats. As organizations generate and store massive volumes of sensitive data, the risk of data breaches and unauthorized access has intensified. Data classification solutions enable enterprises to categorize information based on sensitivity and compliance requirements, thereby implementing more effective security protocols. This capability is especially crucial in industries such as BFSI, healthcare, and government, where the protection of confidential data is paramount. The growing awareness about the financial and reputational repercussions of data breaches has compelled organizations to invest in advanced data classification tools, further fueling market expansion.




    Another key driver is the evolving regulatory landscape. Governments and regulatory bodies worldwide are introducing stringent data protection regulations, such as GDPR in Europe, CCPA in California, and similar frameworks in Asia Pacific. These regulations mandate robust data governance practices, including the accurate classification and management of sensitive information. Non-compliance can result in hefty fines and legal consequences, which has heightened the urgency for businesses to deploy comprehensive data classification software. The need to demonstrate compliance and maintain audit readiness is pushing organizations to adopt solutions that streamline data discovery, classification, and policy enforcement processes.




    The rapid digitalization of business operations and the proliferation of cloud computing are also contributing significantly to the growth of the Data Classification Software market. As enterprises migrate data to cloud environments and embrace hybrid work models, the complexity of data management increases. Data classification software provides the necessary visibility and control over data assets, regardless of where they reside. This empowers organizations to enforce consistent data protection policies across on-premises and cloud infrastructures. Additionally, the integration of artificial intelligence and machine learning technologies into data classification solutions is enhancing their accuracy and scalability, making them indispensable tools in the modern data security arsenal.




    From a regional perspective, North America continues to dominate the Data Classification Software market, accounting for the largest share in 2024, driven by the presence of major technology vendors, early adoption of advanced security solutions, and stringent regulatory frameworks. Europe follows closely, with strong demand fueled by GDPR compliance requirements. The Asia Pacific region is witnessing the fastest growth, propelled by rapid digitalization, increasing cyber threats, and evolving regulatory standards. Latin America and the Middle East & Africa are also showing steady adoption, albeit at a slower pace, as organizations in these regions gradually recognize the importance of robust data governance and security measures.





    Component Analysis



    The Data Classification Software market is segmented by component into software and services, each playing a pivotal role in the overall value proposition. The software segment encompasses standalone data classification platforms and integrated solutions that offer automated data discovery, classification, and policy enforcement. These platforms are increasingly leveraging artificial intell

  11. Outline of the core study designs and data characteristics.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Ivo D. Dinov; Ben Heavner; Ming Tang; Gustavo Glusman; Kyle Chard; Mike Darcy; Ravi Madduri; Judy Pa; Cathie Spino; Carl Kesselman; Ian Foster; Eric W. Deutsch; Nathan D. Price; John D. Van Horn; Joseph Ames; Kristi Clark; Leroy Hood; Benjamin M. Hampstead; William Dauer; Arthur W. Toga (2023). Outline of the core study designs and data characteristics. [Dataset]. http://doi.org/10.1371/journal.pone.0157077.t002
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ivo D. Dinov; Ben Heavner; Ming Tang; Gustavo Glusman; Kyle Chard; Mike Darcy; Ravi Madduri; Judy Pa; Cathie Spino; Carl Kesselman; Ian Foster; Eric W. Deutsch; Nathan D. Price; John D. Van Horn; Joseph Ames; Kristi Clark; Leroy Hood; Benjamin M. Hampstead; William Dauer; Arthur W. Toga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Outline of the core study designs and data characteristics.

  12. Data from: color classification

    • kaggle.com
    zip
    Updated Apr 20, 2018
    Cite
    Aydin Ayanzadeh (2018). color classification [Dataset]. https://www.kaggle.com/ayanzadeh93/color-classification
    Available download formats: zip (169,343,980 bytes)
    Dataset updated
    Apr 20, 2018
    Authors
    Aydin Ayanzadeh
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Introduction

    Color classification is an important application used in many areas; for example, systems that perform daily-life analysis can benefit from this classification process. Many classification algorithms can be used for this task; among the most popular are neural networks, decision trees, k-nearest neighbors, Bayesian networks, and support vector machines (SVMs). In this work, an SVM is trained to obtain a classifier model. The SVM is a supervised learning method and, like other supervised methods, can be applied to both regression and classification problems. It is typically trained to separate and classify labeled samples from different classes. Training aims to find an optimal hyperplane that separates the data into different classes; this hyperplane is placed as far from the data as possible (maximizing the margin) to reduce classification errors.

    [Figure: SVM classifier with an optimal hyperplane]
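    The paragraph above describes an SVM as finding a hyperplane placed as far from the training data as possible. Below is a minimal, self-contained Python sketch of a linear SVM trained with Pegasos-style sub-gradient descent on the regularized hinge loss. It is not the dataset author's implementation (which is not described) and works on toy 2-D points rather than image features; the names `train_linear_svm` and `predict`, the regularization strength `lam`, and the epoch count are illustrative assumptions.

```python
from typing import List, Sequence


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 200,
) -> List[float]:
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. A constant 1.0 feature is appended to every sample so the
    last weight acts as a bias term (this sketch regularizes the bias too, a common
    simplification).
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        # Cyclic pass for reproducibility; Pegasos normally samples points at random.
        for xi, yi in zip(X, y):
            t += 1
            x = list(xi) + [1.0]
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * _dot(w, x)  # evaluated with the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if margin < 1.0:
                # Hinge loss is active: the point is misclassified or inside the margin.
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], point: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane the point falls."""
    return 1 if _dot(w, list(point) + [1.0]) >= 0.0 else -1


if __name__ == "__main__":
    X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, p) for p in [(5, 5), (-5, -5)]])
```

For real image data, features would first have to be extracted (for example, color statistics), and a kernel SVM or an existing library implementation would be the usual choice.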

    Dataset

    The dataset contains about 80 training images covering all color classes and 90 images for the test set. The colors prepared for this application are yellow, black, white, green, red, orange, blue, and violet. Basic colors were chosen for classification, and a dataset containing images of these colors was created. The dataset also includes masks for all images, created by binarizing each image: in the collected images, pixels belonging to the class color were painted white and all remaining pixels black.
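    The description does not say how pixels were judged to belong to the class color. The Python sketch below assumes a simple rule: a pixel is painted white (255) if its Euclidean RGB distance to a target color is within a tolerance, and black (0) otherwise. The function `binarize_mask`, the nested-list image representation, and the default tolerance of 60 are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
WHITE, BLACK = 255, 0


def binarize_mask(
    image: Sequence[Sequence[RGB]], target: RGB, tolerance: float = 60.0
) -> List[List[int]]:
    """Return a mask with WHITE where a pixel is close to `target`, BLACK elsewhere.

    Closeness is the Euclidean distance in RGB space (an assumed criterion).
    """
    tol_sq = tolerance * tolerance  # compare squared distances to avoid a square root
    mask: List[List[int]] = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            dist_sq = (r - target[0]) ** 2 + (g - target[1]) ** 2 + (b - target[2]) ** 2
            mask_row.append(WHITE if dist_sq <= tol_sq else BLACK)
        mask.append(mask_row)
    return mask
```

A distance in a perceptual color space such as HSV or CIELAB would often separate colors more robustly than raw RGB distance.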

  13. The Data of Short Text Classification

    • ieee-dataport.org
    Updated Oct 29, 2025
    Cite
    Zan Qiu (2025). The Data of Short Text Classification [Dataset]. https://ieee-dataport.org/documents/data-short-text-classification
    Dataset updated
    Oct 29, 2025
    Authors
    Zan Qiu
    Description

    In the experiments conducted in this paper

  14. Pseudo-Label Generation for Multi-Label Text Classification - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Pseudo-Label Generation for Multi-Label Text Classification - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/pseudo-label-generation-for-multi-label-text-classification
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    With the advent and expansion of social networking, the amount of generated text data has increased sharply. Handling such a huge volume of text data requires new and improved text mining techniques. One of the characteristics of text data that makes text mining difficult is multi-labelity. To build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, it is most prevalent in text data. It also places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how, and under what circumstances, the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification. The high and sparse dimensionality of text data is also considered during classification. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while exploiting the correlation among the multiple labels in the data set. Our text classification technique, called pseudo-LSC (pseudo-Label Based Subspace Clustering), is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.

  15. Data from: MULTI-TEMPORAL REMOTE SENSING IMAGE CLASSIFICATION - A MULTI-VIEW...

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). MULTI-TEMPORAL REMOTE SENSING IMAGE CLASSIFICATION - A MULTI-VIEW APPROACH [Dataset]. https://catalog.data.gov/dataset/multi-temporal-remote-sensing-image-classification-a-multi-view-approach
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    MULTI-TEMPORAL REMOTE SENSING IMAGE CLASSIFICATION - A MULTI-VIEW APPROACH

    Varun Chandola and Ranga Raju Vatsavai

    Abstract. Multispectral remote sensing images have been widely used for automated land use and land cover classification tasks. Thematic classification is often done using a single-date image; however, in many instances a single-date image is not informative enough to distinguish between different land cover types. In this paper we show how one can use multiple images, collected at different times of the year (for example, during the crop growing season), to learn a better classifier. We propose two approaches, an ensemble-of-classifiers approach and a co-training-based approach, and show how both of these methods outperform the straightforward stacked-vector approach often used in multi-temporal image classification. Additionally, the co-training-based method addresses the challenge of limited labeled training data in supervised classification, as this classification scheme uses a large number of unlabeled samples (which come at no cost) in conjunction with a small set of labeled training data.
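    The abstract contrasts a stacked-vector approach with an ensemble of per-date classifiers. A minimal Python sketch of both ideas follows, assuming each pixel is described by one feature vector per acquisition date and using a nearest-centroid classifier as a stand-in base learner. The authors' actual classifiers and the co-training variant are not reproduced here, and all names (`stack_views`, `NearestCentroid`, `fit_ensemble`, `predict_ensemble`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Vector = List[float]


def stack_views(views: Sequence[Sequence[float]]) -> Vector:
    """Stacked-vector approach: concatenate the features from every acquisition date."""
    stacked: Vector = []
    for view in views:
        stacked.extend(view)
    return stacked


class NearestCentroid:
    """Stand-in base classifier: assign the label of the closest class mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> "NearestCentroid":
        sums: Dict[Hashable, Vector] = {}
        counts: Counter = Counter()
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] += 1
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x: Sequence[float]) -> Hashable:
        def sq_dist(label: Hashable) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))

        return min(self.centroids, key=sq_dist)


def fit_ensemble(
    per_date_X: Sequence[Sequence[Sequence[float]]], y: Sequence[Hashable]
) -> List[NearestCentroid]:
    """Ensemble approach: train one classifier per acquisition date on the same labels."""
    return [NearestCentroid().fit(X, y) for X in per_date_X]


def predict_ensemble(models: Sequence[NearestCentroid], views: Sequence[Sequence[float]]) -> Hashable:
    """Majority vote over per-date predictions; ties go to the label that was voted first."""
    votes = Counter(model.predict(view) for model, view in zip(models, views))
    return votes.most_common(1)[0][0]
```

With only two dates a tie is common, so in practice an odd number of dates or a confidence-weighted vote would be preferable.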

  16. Data from: Comparison of classification algorithms.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 18, 2024
    Pálková, Martina; Apeltauer, Tomáš; Uhlík, Ondřej (2024). Comparison of classification algorithms. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001388071
    Dataset updated
    Jan 18, 2024
    Authors
    Pálková, Martina; Apeltauer, Tomáš; Uhlík, Ondřej
    Description

    Machine learning methods and agent-based models enable the optimization of the operation of high-capacity facilities. In this paper, we propose a method for automatically extracting and cleaning pedestrian traffic detector data for subsequent calibration of the ingress pedestrian model. The data were obtained from waiting-room traffic at a vaccination center. Walking speed distributions, the number of stops, the distribution of waiting times, and the locations of waiting points were extracted. Of the nine machine learning algorithms compared, the random forest model achieved the highest accuracy in classifying valid data versus noise. The proposed microscopic calibration allows more accurate testing of capacity assessments, procedural changes, and geometric modifications in parts of the facility adjacent to the calibrated parts. The results show that the proposed method achieves state-of-the-art performance on a violent-flows dataset. The proposed method has the potential to significantly improve the accuracy and efficiency of input model predictions and to optimize the operation of high-capacity facilities.
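    The extraction step can be illustrated on a single pedestrian track. The abstract does not describe the detector data format or how a stop is defined, so this sketch assumes each track is a time-ordered list of (t, x, y) samples in seconds and metres, treats any step slower than a hypothetical `stop_speed` threshold as stopping, and uses the duration and mean position of each stop as its waiting time and waiting point. The random forest that separates valid data from noise is not shown.

```python
from math import hypot

def trajectory_features(track, stop_speed=0.2):
    """track: list of (t_seconds, x_m, y_m) samples sorted by time."""
    speeds, stops = [], []
    current = None  # samples of the stop currently in progress
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        v = hypot(x1 - x0, y1 - y0) / (t1 - t0)  # step speed in m/s
        speeds.append(v)
        if v < stop_speed:
            if current is None:
                current = [(t0, x0, y0)]
            current.append((t1, x1, y1))
        elif current is not None:
            stops.append(current)  # the pedestrian started walking again
            current = None
    if current is not None:
        stops.append(current)
    waits = [s[-1][0] - s[0][0] for s in stops]
    points = [(sum(p[1] for p in s) / len(s), sum(p[2] for p in s) / len(s)) for s in stops]
    return {"speeds": speeds, "n_stops": len(stops),
            "waiting_times": waits, "waiting_points": points}

# Hypothetical track: walk, pause for about two seconds, walk again
track = [(0, 0.0, 0.0), (1, 1.2, 0.0), (2, 2.4, 0.0), (3, 2.45, 0.0),
         (4, 2.5, 0.0), (5, 3.7, 0.0), (6, 4.9, 0.0)]
features = trajectory_features(track)
print(features["n_stops"], features["waiting_times"])  # 1 [2]
```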

  17. Plant Growth Data Classification

    • kaggle.com
    zip
    Updated Jul 10, 2024
    gortorozyannnn (2024). Plant Growth Data Classification [Dataset]. https://www.kaggle.com/datasets/gorororororo23/plant-growth-data-classification
    Available download formats: zip (4,561 bytes)
    Dataset updated
    Jul 10, 2024
    Authors
    gortorozyannnn
    Description
    Content

    For this "Plant Growth Data Classification" dataset, the prediction task typically involves classifying the growth milestone of plants based on the provided environmental and management factors. Specifically, the aim is to predict the growth stage or milestone a plant reaches from variables such as soil type, sunlight hours, water frequency, fertilizer type, temperature, and humidity. Such predictions can help explain how different conditions influence plant growth and can be valuable for optimizing agricultural practices or greenhouse management.

    Column descriptions:
    • Soil_Type: The type or composition of soil in which the plants are grown.
    • Sunlight_Hours: The duration or intensity of sunlight exposure received by the plants.
    • Water_Frequency: How often the plants are watered, indicating the watering schedule.
    • Fertilizer_Type: The type of fertilizer used for nourishing the plants.
    • Temperature: The ambient temperature conditions under which the plants are grown.
    • Humidity: The level of moisture or humidity in the environment surrounding the plants.
    • Growth_Milestone: Descriptions or markers indicating stages or significant events in the growth process of the plants.
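    As a simple baseline for this prediction task, the sketch below reads rows with the column names listed above and predicts Growth_Milestone with k-nearest neighbours. The sample rows, the category values, and the 0/1 encoding of Growth_Milestone are hypothetical (the actual file contents are not shown here), and the mixed distance (range-scaled numeric differences plus a 0/1 mismatch per categorical column) is one reasonable choice rather than a prescribed method.

```python
import csv
import io
from collections import Counter

NUMERIC = ["Sunlight_Hours", "Temperature", "Humidity"]
CATEGORICAL = ["Soil_Type", "Water_Frequency", "Fertilizer_Type"]
TARGET = "Growth_Milestone"

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def column_ranges(rows):
    # Range of each numeric column, used to put the columns on a comparable scale.
    ranges = {}
    for c in NUMERIC:
        vals = [float(r[c]) for r in rows]
        ranges[c] = (max(vals) - min(vals)) or 1.0
    return ranges

def distance(a, b, ranges):
    # Range-scaled numeric differences plus a 0/1 penalty per categorical mismatch.
    d = sum(abs(float(a[c]) - float(b[c])) / ranges[c] for c in NUMERIC)
    return d + sum(a[c] != b[c] for c in CATEGORICAL)

def knn_predict(train, query, k=3):
    ranges = column_ranges(train)
    nearest = sorted(train, key=lambda r: distance(r, query, ranges))[:k]
    return Counter(r[TARGET] for r in nearest).most_common(1)[0][0]

# Hypothetical rows; the real file's values and milestone encoding may differ.
SAMPLE = """Soil_Type,Sunlight_Hours,Water_Frequency,Fertilizer_Type,Temperature,Humidity,Growth_Milestone
loam,8,daily,organic,25,60,1
loam,7,daily,organic,24,65,1
clay,4,weekly,none,18,40,0
sandy,5,weekly,none,30,35,0
loam,9,bi-weekly,chemical,26,55,1
"""

rows = load_rows(SAMPLE)
query = {"Soil_Type": "loam", "Sunlight_Hours": "8.5", "Water_Frequency": "daily",
         "Fertilizer_Type": "organic", "Temperature": "25", "Humidity": "58"}
print(knn_predict(rows, query))  # 1
```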

  18. i15 Crop Mapping 2022 Provisional

    • cnra-gis-open-data-staging-cnra.hub.arcgis.com
    Updated Mar 1, 2024
    Carlos.Lewis@water.ca.gov_DWR (2024). i15 Crop Mapping 2022 Provisional [Dataset]. https://cnra-gis-open-data-staging-cnra.hub.arcgis.com/items/af03260dc1304199aca6fb9109a8f900
    Dataset updated
    Mar 1, 2024
    Dataset authored and provided by
    Carlos.Lewis@water.ca.gov_DWR
    Area covered
    Description

    2022 STATEWIDE CROP MAPPING - PROVISIONAL

    Land use data is critically important to the work of the Department of Water Resources (DWR) and other California agencies. Understanding the impacts of land use, crop location, acreage, and management practices on environmental attributes and resource management is an integral step in the ability of Groundwater Sustainability Agencies (GSAs) to produce Groundwater Sustainability Plans (GSPs) and implement projects to attain sustainability. Land IQ was contracted by DWR to develop a comprehensive and accurate spatial land use database for the 2022 water year (WY 2022). The primary objective of this effort was to produce a spatial land use database with accuracies exceeding 95% using remote sensing, statistical, and temporal analysis methods. This project is an extension of the 2014, 2016, 2018, 2019, 2020, and 2021 land use mapping, which classified over 14 million acres of land into irrigated agriculture and urban area. Unlike the 2014 and 2016 datasets, the WY 2018, 2019, 2020, 2021, and 2022 datasets include multi-cropping and incorporate DWR ground-truth data from Siskiyou, Modoc, Lassen, and Shasta counties. Land IQ integrated crop production knowledge with detailed ground-truth information and multiple satellite and aerial image resources to conduct remote sensing land use analysis at the field scale. Individual fields (boundaries of homogeneous crop types representing cropped area, rather than legal parcel boundaries) were classified using a crop category legend and a more specific crop type legend. A supervised classification method using a random forest approach was used to classify delineated fields, and it was carried out county by county where training samples were available. Random forest approaches are currently among the highest-performing methods for data classification and regression. To determine the frequency and seasonality of multiple-cropped fields, peak growth dates were determined for annual crops.
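    For readers unfamiliar with the method, the sketch below is a compact random forest: each tree is grown on a bootstrap sample of the training fields, a random subset of features is considered at every split, and the forest predicts by majority vote. It is a generic illustration, not Land IQ's pipeline; the per-field features (peak NDVI, day of year of the peak, seasonal NDVI amplitude) and all values are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    best = None  # (weighted_gini, feature_index, threshold)
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

def build_tree(X, y, depth, max_depth, n_feats, rng):
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    feats = rng.sample(range(len(X[0])), n_feats)  # random feature subset per split
    split = best_split(X, y, feats)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, f, thr = split
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_feats, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_feats, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest

def predict_forest(forest, x):
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Hypothetical per-field features: (peak NDVI, day of year of peak, NDVI amplitude)
X = [
    (0.84, 198, 0.60), (0.86, 202, 0.62), (0.85, 205, 0.58),  # rice
    (0.55, 170, 0.30), (0.52, 165, 0.28), (0.57, 172, 0.33),  # vineyards
    (0.15, 100, 0.05), (0.12, 95, 0.04), (0.18, 110, 0.07),   # idle
]
y = ["rice"] * 3 + ["vineyards"] * 3 + ["idle"] * 3
forest = fit_forest(X, y)
print(predict_forest(forest, (0.83, 199, 0.59)))  # rice
print(predict_forest(forest, (0.10, 90, 0.03)))   # idle
```

    In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used; the pure-Python version only makes the bootstrap and random-feature steps visible.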
    Fields were attributed with DWR crop categories, which included citrus/subtropical, deciduous fruits and nuts, field crops, grain and hay, idle, pasture, rice, truck crops, urban, vineyards, and young perennials. These categories represent aggregated groups of specific crop types in the Land IQ dataset. Accuracy was calculated for the crop mapping using both the DWR and Land IQ crop legends. The overall statewide accuracy of the crop mapping was 98.1% at the DWR Class level and 96.7% at the DWR Subclass level. Accuracy and error results varied among crop types. In particular, some less extensive crops with very few validation samples may have a skewed accuracy result, depending on the number and nature of their validation sample points. DWR revisions to crops and conditions from the Land IQ classification were encoded using standard DWR land use codes added to the feature attributes, and each modified classification is indicated by the value 'r' in the 'DWR_REVISE' data field. Polygons drawn by DWR that are not included in the Land IQ dataset receive the code 'n' (new). A boundary change (i.e., DWR changed the boundary that Land IQ delivered, for example by splitting it) is indicated by 'b'. Each polygon classification is consistent with DWR attribute standards; however, some of DWR's traditional attribute definitions are modified and extended to accommodate unavoidable constraints within remote-sensing classifications, or to make the data more specific for DWR's water balance computation needs. The original Land IQ classifications reported for each polygon are preserved for comparison and are also expressed as DWR standard attributes. Comments, problems, improvements, updates, or suggestions about local conditions or revisions in the final data set should be forwarded to the appropriate Regional Office Senior Land Use Supervisor.
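    The gap between the Class-level and Subclass-level figures follows from aggregation: every validation point that is correct at the subclass level is also correct at the class level, so class-level overall accuracy can never be lower. The sketch below computes both from validation pairs, plus per-subclass accuracy, which shows how a subclass with a single validation point can only score 0% or 100%. The subclass codes, the prefix-based class mapping, and the validation pairs are hypothetical and do not follow DWR's actual legend.

```python
from collections import defaultdict

def overall_accuracy(pairs):
    """pairs: (reference_label, mapped_label) for each validation point."""
    return sum(ref == pred for ref, pred in pairs) / len(pairs)

def to_class(subclass):
    # Hypothetical rule: the class is the letter prefix of the subclass code ("P3" -> "P").
    return subclass.rstrip("0123456789")

def per_label_accuracy(pairs):
    hits, totals = defaultdict(int), defaultdict(int)
    for ref, pred in pairs:
        totals[ref] += 1
        hits[ref] += ref == pred
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical validation points at the subclass level
sub_pairs = [("P3", "P3"), ("P3", "P6"), ("P6", "P6"), ("V1", "V1"),
             ("V1", "V1"), ("R1", "R1"), ("R1", "R1"), ("T12", "G1")]
class_pairs = [(to_class(r), to_class(p)) for r, p in sub_pairs]

print(overall_accuracy(sub_pairs))            # 0.75
print(overall_accuracy(class_pairs))          # 0.875
print(per_label_accuracy(sub_pairs)["T12"])   # 0.0 (a single validation point)
```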
    Revisions were made if:
    - DWR corrected the original crop classification based on local knowledge and analysis;
    - PARTIALLY IRRIGATED CROPS: crops irrigated for only part of their normal irrigation season were given the special condition 'X';
    - in certain areas, DWR changed the irrigation status to irrigated or non-irrigated; among those areas, the special condition may have been changed to 'Partially Irrigated' based on image analysis and local knowledge;
    - young versus mature stages of perennial orchards and vineyards were identified (DWR added 'Young' to the Special Condition attributes);
    - DWR determined that a field originally classified 'Idle' was actually cropped one or more times during the year;
    - the percent of cropped area was changed from the original acres reported by Land IQ (values indicated in the DWR 'Percent' column);
    - DWR determined that the field boundary should have been split to better reflect separate crops within the same polygon (identified by a 'b' in the DWR_REVISED column);
    - 'Mixed' was added to the MULTIUSE column, meaning there was no boundary change but the percent of the field was changed where more than one crop was found;
    - DWR identified a distinct early or late crop on the field before the main season crop ('Double' was added to the MULTIUSE column); if the 1st and 2nd sequential crops occupied different portions of the total field acreage, the area percentages were indicated for each crop.

    This dataset includes multicropped fields. If a field was determined to have more than one crop during the course of the water year, the crops are ordered sequentially, beginning with Class 1. All single-cropped fields are placed in Class 2, so every polygon has a crop in the Class 2 and CropType2 columns. If a permanent crop was removed during the water year, the Class 2 crop is the permanent crop, followed by 'X' (Unclassified fallow) in the Class 3 column.
    In the case of intercropping, the main crop is placed in the Class 2 column, with the partial crop in the Class 3 column. The column 'MAIN_CROP' was added in 2019 and has been continued through the 2022 dataset. This column indicates which field Land IQ identified as the main season crop for the water year, representing the crop grown during the dominant growing season for each county. The column 'MAIN_CROP_DATE', continued in the 2022 dataset, indicates the NDVI peak date for this main season crop. Asterisks (* or **) in the attribute table indicate that no data have been collected for that specific attribute.

    This provisional metadata does not contain the full metadata per the California Department of Water Resources (DWR) Spatial Data Standards. DWR reviewed and revised the data in some cases. The associated data are considered DWR enterprise GIS data, which meet all appropriate requirements of the DWR Spatial Data Standards, specifically the DWR Spatial Data Standard version 3.5, dated March 22, 2023. This dataset was not produced by DWR. Data were originally developed and supplied by Land IQ, LLC, under contract to the California Department of Water Resources. Comments, problems, improvements, updates, or suggestions should be forwarded to LandUse@water.ca.gov.

    Prior to the WY 2021 final mapping release, pasture areas that were mechanically harvested during a water year were classified as P6-Miscellaneous Grasses. Starting with the WY 2021 final mapping release, these harvested pasture areas are classified as P3-Mixed Pasture.
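    The MAIN_CROP_DATE attribute records an NDVI peak date. NDVI is the standard index (NIR − Red) / (NIR + Red); the sketch below picks the observation with the highest NDVI for one field. The reflectance values and dates are hypothetical (chosen inside water year 2022, which runs from October 1, 2021 to September 30, 2022), and a production workflow would typically smooth the time series and screen out clouds first, which is not shown.

```python
from datetime import date

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def peak_ndvi_date(observations):
    """observations: (acquisition_date, nir, red) tuples for one field."""
    best = max(observations, key=lambda obs: ndvi(obs[1], obs[2]))
    return best[0], ndvi(best[1], best[2])

# Hypothetical cloud-free acquisitions for one field within water year 2022
obs = [
    (date(2021, 11, 15), 0.25, 0.15),
    (date(2022, 3, 1), 0.35, 0.10),
    (date(2022, 7, 15), 0.50, 0.05),
    (date(2022, 9, 1), 0.30, 0.12),
]
peak_date, peak_value = peak_ndvi_date(obs)
print(peak_date, round(peak_value, 3))  # 2022-07-15 0.818
```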

  19. Data from: Classification of crop types in central California from 2005 -...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Oct 1, 2025
    U.S. Geological Survey (2025). Classification of crop types in central California from 2005 - 2020 [Dataset]. https://catalog.data.gov/dataset/classification-of-crop-types-in-central-california-from-2005-2020
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Central California, California
    Description

    This dataset contains support materials for the publication "Crop type classification, trends, and patterns of central California agricultural fields from 2005 – 2020". The data release comprises two child datasets. The first, 'Labeled_CropType_Points', is a shapefile of randomly selected point locations at which crop types were verified using high-resolution imagery for each examined year across the study period (2005 - 2020). The second, 'Central_CA_Classified_Croplands', is also a shapefile, but contains polygons of 9 classified crop types derived from a random forest machine learning classifier for central California for each examined year across the study period (2005 - 2020).

  20. Data Classification and Labeling for Gov Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 7, 2025
    Growth Market Reports (2025). Data Classification and Labeling for Gov Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-classification-and-labeling-for-gov-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Oct 7, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Classification and Labeling for Government Market Outlook

    According to our latest research, the global Data Classification and Labeling for Government market size reached USD 1.72 billion in 2024, and is expected to grow at a robust CAGR of 18.4% during the forecast period, reaching approximately USD 8.13 billion by 2033. This significant growth is primarily driven by the increasing need for robust data security frameworks and compliance requirements across government agencies worldwide. The surge in cyber threats, the proliferation of sensitive digital records, and tightening regulatory mandates are compelling governments to invest heavily in advanced data classification and labeling solutions.
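    The growth figures can be checked with the standard compound-growth formula, end = start × (1 + CAGR)^years. Compounding the 2024 value of USD 1.72 billion at 18.4% for the nine years to 2033 gives roughly USD 7.86 billion, slightly below the quoted USD 8.13 billion, which implies a rate nearer 18.8%; the report may use a different base year or rounding, so this sketch only shows the arithmetic.

```python
def project(value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + cagr) ** years

def implied_cagr(start, end, years):
    """Constant annual growth rate that turns start into end over the given years."""
    return (end / start) ** (1 / years) - 1

years = 2033 - 2024
proj = project(1.72, 0.184, years)
rate = implied_cagr(1.72, 8.13, years)
print(f"USD {proj:.2f} billion")  # USD 7.86 billion
print(f"{rate:.1%}")              # 18.8%
```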

    One of the primary growth factors fueling the Data Classification and Labeling for Government market is the escalating sophistication of cyber-attacks targeting public sector data repositories. Government agencies, which often handle highly sensitive citizen data, national security information, and confidential policy documents, are increasingly prioritizing the implementation of data classification and labeling tools to proactively identify, categorize, and secure critical information assets. The rapid digital transformation in the public sector, combined with a heightened focus on data privacy and sovereignty, is further accelerating the adoption of these solutions. Additionally, the rise of remote work and cloud adoption within government entities has exposed new vulnerabilities, necessitating innovative approaches to data governance and risk management.

    Another significant driver is the evolving regulatory landscape, which mandates stringent compliance with data protection laws such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and various national cybersecurity frameworks. Government organizations are under increasing pressure to demonstrate accountability, transparency, and due diligence in handling sensitive data. Data classification and labeling technologies enable these agencies to automate compliance workflows, streamline audit processes, and ensure the proper handling of classified information. The growing emphasis on digital trust and the need to safeguard national interests are pushing governments to adopt advanced data governance strategies, thereby propelling market growth.

    The integration of artificial intelligence (AI) and machine learning (ML) into data classification and labeling platforms is also a pivotal growth catalyst. Modern solutions leverage AI-driven algorithms to enhance the accuracy and efficiency of data categorization, automate repetitive tasks, and provide real-time insights into data usage patterns. This technological advancement is enabling government agencies to manage exponentially growing data volumes more effectively, minimize human error, and reduce operational costs. Furthermore, the increasing collaboration between public sector organizations and technology vendors is fostering innovation in data security infrastructure, creating a fertile environment for the expansion of the Data Classification and Labeling for Government market.

    From a regional perspective, North America currently dominates the market, accounting for the largest share in 2024, owing to substantial investments in cybersecurity, a mature regulatory environment, and the presence of leading technology providers. Europe follows closely, driven by strict data protection regulations and a strong focus on digital sovereignty. The Asia Pacific region is witnessing the fastest growth, attributed to rapid digitalization initiatives, increasing government IT spending, and rising awareness around data privacy. Latin America and the Middle East & Africa are also emerging as promising markets, supported by ongoing digital government projects and the need to address evolving cyber threats. These regional dynamics are expected to shape the competitive landscape and growth trajectory of the global market through 2033.

    Component Analysis
