2 datasets found

Data from: Pseudo-Label Generation for Multi-Label Text Classification
data.nasa.gov
datasets.ai
+2more
application/rdfxml +5
Updated Jun 26, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). Pseudo-Label Generation for Multi-Label Text Classification [Dataset]. https://data.nasa.gov/dataset/Pseudo-Label-Generation-for-Multi-Label-Text-Class/m7vn-45yf
Explore at:
csv, application/rssxml, tsv, xml, json, application/rdfxmlAvailable download formats
Dataset updated
Jun 26, 2018
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult, is multi-labelity. In order to build a robust and effective text classification method which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data as it can be found in non-text (e.g., numeric) data as well. However, in text data, it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class-labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us in performing better text classification and under what kind of circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although, here we are proposing and evaluating a text classification technique, our main focus is on the handling of the multi-labelity of text data while utilizing the correlation among multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real world multi-label data sets provide us insight into how the multi-labelity is handled in our classification process and shows the effectiveness of our approach.
g
Pseudo-Label Generation for Multi-Label Text Classification | gimi9.com
gimi9.com
Updated Nov 1, 2011
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2011). Pseudo-Label Generation for Multi-Label Text Classification | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_pseudo-label-generation-for-multi-label-text-classification
Explore at:
Dataset updated
Nov 1, 2011
Description
With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult, is multi-labelity. In order to build a robust and effective text classification method which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data as it can be found in non-text (e.g., numeric) data as well. However, in text data, it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class-labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us in performing better text classification and under what kind of circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although, here we are proposing and evaluating a text classification technique, our main focus is on the handling of the multi-labelity of text data while utilizing the correlation among multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real world multi-label data sets provide us insight into how the multi-labelity is handled in our classification process and shows the effectiveness of our approach.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2018). Pseudo-Label Generation for Multi-Label Text Classification [Dataset]. https://data.nasa.gov/dataset/Pseudo-Label-Generation-for-Multi-Label-Text-Class/m7vn-45yf

Data from: Pseudo-Label Generation for Multi-Label Text Classification

Explore at:

csv, application/rssxml, tsv, xml, json, application/rdfxmlAvailable download formats

Dataset updated

Jun 26, 2018

License

U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically

Description

With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult, is multi-labelity. In order to build a robust and effective text classification method which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data as it can be found in non-text (e.g., numeric) data as well. However, in text data, it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class-labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us in performing better text classification and under what kind of circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although, here we are proposing and evaluating a text classification technique, our main focus is on the handling of the multi-labelity of text data while utilizing the correlation among multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real world multi-label data sets provide us insight into how the multi-labelity is handled in our classification process and shows the effectiveness of our approach.

Clear search

Close search

Google apps

Main menu

Data from: Pseudo-Label Generation for Multi-Label Text Classification

Pseudo-Label Generation for Multi-Label Text Classification | gimi9.com

Data from: Pseudo-Label Generation for Multi-Label Text Classification