2 datasets found

SIAM 2007 Text Mining Competition dataset
data.staging.idas-ds1.appdat.jsc.nasa.gov
data.nasa.gov
+2 more
Updated Feb 19, 2025
Cite
nasa.gov (2025). SIAM 2007 Text Mining Competition dataset [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/siam-2007-text-mining-competition-dataset
Dataset updated
Feb 19, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Subject Area: Text Mining
Description: This is the dataset used for the SIAM 2007 Text Mining competition. The competition focused on developing text mining algorithms for document classification. The documents are aviation safety reports, each describing one or more problems that occurred during a flight. The goal was to label each document with the types of problems it describes. The data is a subset of the publicly available Aviation Safety Reporting System (ASRS) dataset.
How Data Was Acquired: The data for this competition came from human-generated reports on incidents that occurred during a flight.
Sample Rates, Parameter Description, and Format: There is one document per incident. The datasets are in raw text format. All documents for each set are contained in a single file, and each row in this file corresponds to a single document. Each line begins with the document number, and a tilde separates the document number from the text itself.
Anomalies/Faults: This is a document category classification problem.
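The row format described above (document number, then a tilde, then the report text) can be read with a few lines of Python. This is a minimal sketch, not code shipped with the dataset: the function name `parse_documents` is hypothetical, the two sample rows are invented for illustration and are not taken from the data, and skipping blank lines and splitting on the first tilde only are assumptions about how the files behave.

```python
from typing import Iterable, Iterator, Tuple


def parse_documents(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (document_number, text) pairs from rows of the form 'number~text'."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines carry no document
        # Split on the first tilde only, in case the report text contains '~'.
        doc_id, sep, text = line.partition("~")
        if not sep:
            raise ValueError(f"no tilde separator in row: {line[:40]!r}")
        # The document number is kept as a string; its exact form is not specified.
        yield doc_id.strip(), text.strip()


# Invented sample rows that only mimic the described layout.
sample = [
    "1~PILOT REPORTED A DEVIATION FROM ASSIGNED ALTITUDE.\n",
    "2~CREW NOTED A HYDRAULIC FAULT ~ DURING TAXI.\n",
]
docs = list(parse_documents(sample))
```

To read a real file, pass an open file handle instead of the sample list, for example `parse_documents(open(path, encoding="utf-8"))`, where `path` and the encoding are assumptions to check against the downloaded files.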
SIAM 2007 Text Mining Competition dataset | gimi9.com
gimi9.com
Cite
SIAM 2007 Text Mining Competition dataset | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_siam-2007-text-mining-competition-dataset
25 scholarly articles cite the SIAM 2007 Text Mining Competition dataset, according to Google Scholar.
