49 datasets found
  1. Educational Attainment in North Carolina Public Schools: Use of statistical...

    • data.mendeley.com
    Updated Nov 14, 2018
    Cite
    Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
    Explore at:
    Dataset updated
    Nov 14, 2018
    Authors
    Scott Herford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    North Carolina
    Description

    The purpose of data mining analysis is to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset; the data usually has to be pre-processed first, which normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. In our project, using clustering prior to classification did not improve performance much; a likely reason is that the features we selected for clustering were not well suited to it. Given the nature of the data, classification tasks provide more information to work with in terms of improving knowledge and overall performance metrics.

    From the dimensionality-reduction perspective: clustering differs from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with minimal loss of information. Using clusters to reduce the data dimension can lose a great deal of information, because clustering techniques are based on a metric of "distance", and at high dimensions Euclidean distance loses nearly all of its meaning. "Reducing" dimensionality by mapping data points to cluster numbers is therefore not always a good idea, since almost all the information may be lost.

    From the feature-creation perspective: clustering analysis creates labels based on patterns in the data, which introduces uncertainty. When clustering precedes classification, the choice of the number of clusters strongly affects clustering performance and, in turn, classification performance. If the subset of features we cluster on is well suited to it, clustering may improve overall classification performance; for example, if the features we run k-means on are numerical and low-dimensional, overall classification performance may be better.

    We deliberately did not fix the clustering outputs with a random_state, in order to see whether they were stable. Our assumption was that if the results varied greatly from run to run (which they did), the data probably does not cluster well with the selected methods at all. In effect, our results were not much better than random when clustering was applied during preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data, in the same format, from which the models were created. This feedback loop can be used to measure the models' real-world effectiveness and to revise them from time to time as things change.
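The cluster-as-feature experiment and the run-to-run stability check described above can be sketched as follows. This is a minimal illustration on synthetic data with scikit-learn models, not the actual NC schools dataset or pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the schools data (NOT the actual NC dataset).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Baseline: classify on the raw features.
base = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()

# Cluster-as-feature: append the k-means label as one extra column.
labels = KMeans(n_clusters=5, n_init=10).fit_predict(X)
X_aug = np.column_stack([X, labels])
aug = cross_val_score(RandomForestClassifier(random_state=0), X_aug, y, cv=5).mean()

# Stability check: with no fixed random_state, compare two independent runs.
a = KMeans(n_clusters=5, n_init=10).fit_predict(X)
b = KMeans(n_clusters=5, n_init=10).fit_predict(X)
print(f"baseline={base:.3f}  with cluster feature={aug:.3f}  "
      f"run-to-run ARI={adjusted_rand_score(a, b):.3f}")
```

An adjusted Rand index near 1 between runs indicates a stable clustering; values near 0 suggest the data does not cluster reliably with the chosen method, which is the failure mode the description reports.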

  2. Confusion matrix.

    • figshare.com
    xls
    Updated Jul 7, 2023
    + more versions
    Cite
    Shaoxia Mou; Heming Zhang (2023). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0288140.t002
    Explore at:
    xls (available download formats)
    Dataset updated
    Jul 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Shaoxia Mou; Heming Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Due to the inherent characteristics of cumulative sequences of unbalanced data, mining results for this kind of data are often dominated by the majority categories, degrading mining performance. To address this problem, the performance of data cumulative-sequence mining is optimized: an algorithm for mining cumulative sequences of unbalanced data based on probability matrix decomposition is studied. The natural nearest neighbors of the minority samples in the unbalanced cumulative sequence are determined, and the minority samples are clustered according to the natural-nearest-neighbor relationship. Within each cluster, new samples are generated from the core points of dense regions and the non-core points of sparse regions, and these new samples are added to the original sequence to balance it. The probability matrix decomposition method is used to generate two Gaussian-distributed random matrices for the balanced cumulative sequence, and a linear combination of low-dimensional eigenvectors is used to explain the preference of specific users for the data sequence. At the same time, from a global perspective, the AdaBoost idea is used to adaptively adjust the sample weights and optimize the probability matrix decomposition algorithm, optimizing global error as well as single-sample error more efficiently. Experimental results show that the algorithm effectively generates new samples, reduces the imbalance of the data cumulative sequence, and obtains more accurate mining results. The minimum RMSE is obtained when the decomposition dimension is 5. The proposed algorithm shows good classification performance on the balanced cumulative sequence, with the best average ranking on the F-value, G-mean, and AUC metrics.
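The sample-generation step described above (synthesizing new minority samples from neighbors and adding them to rebalance the data) resembles SMOTE-style interpolation. A minimal sketch of that idea on random data, not the paper's natural-nearest-neighbor algorithm:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def oversample_minority(X_min, n_new, k=5, seed=0):
    """Generate synthetic minority samples by interpolating each sample
    toward one of its k nearest neighbors (SMOTE-style sketch)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)          # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))       # pick a random minority sample
        j = idx[i, rng.integers(1, k + 1)] # pick one of its true neighbors
        lam = rng.random()                 # interpolate between the two
        new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new)

# Toy minority class: 20 samples with 4 features.
X_min = np.random.default_rng(1).normal(size=(20, 4))
synth = oversample_minority(X_min, n_new=30)
print(synth.shape)  # (30, 4)
```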

  3. Data from: Characterizing and classifying neuroendocrine neoplasms through...

    • datacatalog.mskcc.org
    • data.niaid.nih.gov
    • +2more
    Updated Sep 19, 2023
    Cite
    Nanayakkara, Jina; Yang, Xiaojing; Tyryshkin, Kathrin; Wong, Justin J.M.; Vanderbeck, Kaitlin; Ginter, Paula S.; Scognamiglio, Theresa; Chen, Yao-Tseng; Panarelli, Nicole; Cheung, Nai-Kong; Dijk, Frederike; Ben-Dov, Iddo Z.; Kim, Michelle Kang; Singh, Simron; Morozov, Pavel; Max, Klaas E. A.; Tuschl, Thomas; Renwick, Neil (2023). Characterizing and classifying neuroendocrine neoplasms through microRNA sequencing and data mining [Dataset]. http://doi.org/10.5061/dryad.fn2z34tqj
    Explore at:
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    MSK Library
    Authors
    Nanayakkara, Jina; Yang, Xiaojing; Tyryshkin, Kathrin; Wong, Justin J.M.; Vanderbeck, Kaitlin; Ginter, Paula S.; Scognamiglio, Theresa; Chen, Yao-Tseng; Panarelli, Nicole; Cheung, Nai-Kong; Dijk, Frederike; Ben-Dov, Iddo Z.; Kim, Michelle Kang; Singh, Simron; Morozov, Pavel; Max, Klaas E. A.; Tuschl, Thomas; Renwick, Neil
    Description

    From Dryad entry:

    "Abstract
    Neuroendocrine neoplasms (NENs) are clinically diverse and incompletely characterized cancers that are challenging to classify. MicroRNAs (miRNAs) are small regulatory RNAs that can be used to classify cancers. Recently, a morphology-based classification framework for evaluating NENs from different anatomic sites was proposed by experts, with the requirement of improved molecular data integration. Here, we compiled 378 miRNA expression profiles to examine NEN classification through comprehensive miRNA profiling and data mining. Following data preprocessing, our final study cohort included 221 NEN and 114 non-NEN samples, representing 15 NEN pathological types and five site-matched non-NEN control groups. Unsupervised hierarchical clustering of miRNA expression profiles clearly separated NENs from non-NENs. Comparative analyses showed that miR-375 and miR-7 expression is substantially higher in NEN cases than non-NEN controls. Correlation analyses showed that NENs from diverse anatomic sites have convergent miRNA expression programs, likely reflecting morphologic and functional similarities. Using machine learning approaches, we identified 17 miRNAs to discriminate 15 NEN pathological types and subsequently constructed a multi-layer classifier, correctly identifying 217 (98%) of 221 samples and overturning one histologic diagnosis. Through our research, we have identified common and type-specific miRNA tissue markers and constructed an accurate miRNA-based classifier, advancing our understanding of NEN diversity.

    Methods
    Sequencing-based miRNA expression profiles from 378 clinical samples, comprising 239 neuroendocrine neoplasm (NEN) cases and 139 site-matched non-NEN controls, were used in this study. Expression profiles were either compiled from published studies (n=149) or generated through small RNA sequencing (n=229). Prior to sequencing, total RNA was isolated from formalin-fixed paraffin-embedded (FFPE) tissue blocks or fresh-frozen (FF) tissue samples. Small RNA cDNA libraries were sequenced on HiSeq 2500 Illumina platforms using an established small RNA sequencing (Hafner et al., 2012 Methods) and sequence annotation pipeline (Brown et al., 2013 Front Genet) to generate miRNA expression profiles. Scaling our existing approach to miRNA-based NEN classification (Panarelli et al., 2019 Endocr Relat Cancer; Ren et al., 2017 Oncotarget), we constructed and cross-validated a multi-layer classifier for discriminating NEN pathological types based on selected miRNAs.

    Usage notes
    Diagnostic histopathology and small RNA cDNA library preparation information for all samples are presented in Table S1 of the associated manuscript."
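Unsupervised hierarchical clustering of expression profiles, as used above to separate NENs from non-NENs, can be sketched on toy data. The matrices below are random stand-ins, not the study's miRNA profiles:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy expression matrix: 30 "NEN-like" and 20 "control-like" samples
# over 50 hypothetical miRNAs (NOT the study's data).
rng = np.random.default_rng(0)
nen = rng.normal(5.0, 1.0, size=(30, 50))
nen[:, :10] += 5.0                       # markers elevated in the NEN-like group
non = rng.normal(5.0, 1.0, size=(20, 50))
X = np.vstack([nen, non])

# Ward-linkage hierarchical clustering, cut into two groups.
Z = linkage(X, method="ward")
groups = fcluster(Z, t=2, criterion="maxclust")
print(groups)
```

With clearly elevated marker features, the two-group cut recovers the sample groups; on real profiles the clustering would follow preprocessing and normalization steps not shown here.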

  4. Zenodo Open Metadata snapshot - Training dataset for records classifier...

    • zenodo.org
    application/gzip, bin
    Updated Dec 14, 2022
    + more versions
    Cite
    Alex Ioannidis; Alex Ioannidis (2022). Zenodo Open Metadata snapshot - Training dataset for records classifier building [Dataset]. http://doi.org/10.5281/zenodo.1255786
    Explore at:
    bin, application/gzip (available download formats)
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Alex Ioannidis; Alex Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the metadata of Zenodo's published open access records, including records that were marked as spam by Zenodo staff and deleted.

    The dataset is a gzip-compressed JSON-lines file, where each line is a JSON object representing a Zenodo record.

    Each object contains the terms:
    part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

    which correspond to the fields of the same name in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.

    In addition, some terms have been altered:

    The term files contains a list of dictionaries containing filetype, size, and filename only.
    The term license contains a short Zenodo ID of the license (e.g "cc-by").
    The term spam contains a boolean value, determining whether a given record was marked as a spam record by Zenodo staff.

    Top-level terms whose values were missing from the metadata may contain a null value.

    A smaller uncompressed random sample of 200 JSON lines is also included to allow for testing and getting familiar with the format without having to download the entire dataset.
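Reading the snapshot requires only the standard library. A sketch using a tiny in-memory sample in the same format (for the real file, pass its path to gzip.open instead of the BytesIO wrapper):

```python
import gzip
import io
import json

def load_records(fh):
    """Split a Zenodo JSON-lines stream into (non-spam records, spam count)."""
    records, spam = [], 0
    for line in fh:
        rec = json.loads(line)
        if rec.get("spam"):     # boolean flag added by Zenodo staff
            spam += 1
        else:
            records.append(rec)
    return records, spam

# Tiny invented sample in the snapshot's line format.
sample = (b'{"recid": 1, "title": "ok", "spam": false}\n'
          b'{"recid": 2, "title": "junk", "spam": true}\n')
with gzip.open(io.BytesIO(gzip.compress(sample)), "rt", encoding="utf-8") as fh:
    recs, spam = load_records(fh)
print(len(recs), spam)  # 1 1
```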

  5. SIAM 2007 Text Mining Competition dataset

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • +2more
    application/rdfxml +5
    Updated Jun 26, 2018
    + more versions
    Cite
    (2018). SIAM 2007 Text Mining Competition dataset [Dataset]. https://data.nasa.gov/dataset/SIAM-2007-Text-Mining-Competition-dataset/skkr-s98t
    Explore at:
    csv, application/rssxml, json, tsv, application/rdfxml, xml (available download formats)
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Subject Area: Text Mining

    Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining algorithms for document classification. The documents in question were aviation safety reports that documented one or more problems that occurred during certain flights. The goal was to label the documents with respect to the types of problems that were described. This is a subset of the Aviation Safety Reporting System (ASRS) dataset, which is publicly available.

    How Data Was Acquired: The data for this competition came from human-generated reports on incidents that occurred during a flight.

    Sample Rates, Parameter Description, and Format: There is one document per incident. The datasets are in raw text format; all documents for each set are contained in a single file, with each row corresponding to a single document. The first characters on each line are the document number, and a tilde separates the document number from the text itself.

    Anomalies/Faults: This is a document category classification problem.
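The "document number, tilde, text" layout described above can be parsed with a simple split on the first tilde. The sample report texts below are invented for illustration:

```python
def parse_siam_docs(lines):
    """Split 'docnum~text' lines into (doc_number, text) pairs,
    per the file format described above."""
    docs = []
    for line in lines:
        num, _, text = line.rstrip("\n").partition("~")
        docs.append((num.strip(), text))
    return docs

# Invented example rows in the competition file's format.
sample = ["1~Aircraft deviated from assigned altitude.\n",
          "2~Smoke observed in cabin; returned to gate.\n"]
print(parse_siam_docs(sample)[0])
```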

  6. Malaria disease and grading system dataset from public hospitals reflecting...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Nov 10, 2023
    Cite
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie (2023). Malaria disease and grading system dataset from public hospitals reflecting complicated and uncomplicated conditions [Dataset]. http://doi.org/10.5061/dryad.4xgxd25gn
    Explore at:
    zip (available download formats)
    Dataset updated
    Nov 10, 2023
    Dataset provided by
    Nasarawa State University
    Authors
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Malaria is the leading cause of death in the African region. Data mining can help extract valuable knowledge from available data in the healthcare sector, making it possible to train models that predict patient health faster than clinical trials. Various machine learning algorithms, such as K-Nearest Neighbors, Bayesian methods, Logistic Regression, Support Vector Machines, and Multinomial Naïve Bayes (MNB), have been applied to malaria datasets in public hospitals, but there are still limitations in modeling with the Multinomial Naive Bayes algorithm. This study applies the MNB model to explore the relationship between 15 relevant attributes of public hospital data. The goal is to examine how dependency between attributes affects classifier performance. MNB creates a transparent and reliable graphical representation of relations between attributes, with the ability to predict new situations. The MNB model achieved 97% accuracy, compared with the GNB and RF classifiers, which each achieved 100% accuracy.

    Methods: Prior to data collection, the researcher was guided by all ethical training certifications on data collection and the right to confidentiality and privacy, as required by the Institutional Review Board (IRB). Data were collected from the manual archives of hospitals purposively selected using a stratified sampling technique, transformed to electronic form, and stored in a MySQL database called malaria. Each patient file was extracted and reviewed for signs and symptoms of malaria, then checked against laboratory-confirmed diagnosis results. The data were divided into two tables: data1, containing data for phase 1 of the classification, and data2, containing data for phase 2.

    Data Source Collection: The malaria incidence dataset was obtained from public hospitals from 2017 to 2021. These are the data used for modeling and analysis, taking into account the geographical location and socio-economic factors available for patients inhabiting those areas. Multinomial Naive Bayes is the model used to analyze the collected data for malaria disease prediction and grading.

    Data Preprocessing: Preprocessing was done to remove noise and outliers. Transformation: The data were transformed from analog to electronic records.

    Data Partitioning: The collected data were divided into two portions, one extracted as a training set and the other used for testing. The training portion taken from one database table is called training set 1, and the portion taken from another table is called training set 2. The dataset was split 70% for training and 30% for testing. Using MNB classification algorithms implemented in Python, the models were trained on the training sample; the resulting models were then tested on the remaining 30% and compared with other machine learning models using standard metrics.

    Classification and prediction: Based on the nature of the variables in the dataset, this study uses Multinomial Naïve Bayes classification in two phases. The framework operates as follows: i. Data collection and preprocessing are done. ii. Preprocessed data are stored in training set 1 and training set 2, which are used during classification. iii. The test dataset is stored in a test database. iv. Part of the test dataset is classified with classifier 1 and the remainder with classifier 2, as follows:

    Classifier phase 1: classifies records into positive or negative classes. A patient with malaria is classified as positive (P), while a patient without malaria is classified as negative (N).
    Classifier phase 2: classifies only records labeled positive by classifier 1, further separating them into complicated and uncomplicated class labels. The classifier also captures data on environmental factors, genetics, gender and age, and cultural and socio-economic variables. The system is designed so that the core parameters, as determining factors, supply their values.
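The two-phase scheme (classifier 1: positive vs. negative; classifier 2: complicated vs. uncomplicated among the positives) can be sketched with scikit-learn's MultinomialNB. The count-style features and labels below are synthetic placeholders, not the study's 15 hospital attributes:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
# Synthetic non-negative count features standing in for the 15 attributes.
X = rng.integers(0, 10, size=(300, 15))
has_malaria = rng.integers(0, 2, size=300)   # phase-1 label: positive/negative
severity = rng.integers(0, 2, size=300)      # phase-2 label: complicated or not

# Phase 1: positive vs. negative.
clf1 = MultinomialNB().fit(X, has_malaria)
pred1 = clf1.predict(X)

# Phase 2: trained on true positives, applied only to predicted positives.
pos = pred1 == 1
clf2 = MultinomialNB().fit(X[has_malaria == 1], severity[has_malaria == 1])
pred2 = clf2.predict(X[pos])
print(int(pos.sum()), len(pred2))
```

On random labels the accuracies are meaningless; the sketch only shows the cascade structure, where phase 2 sees exactly the records phase 1 flags as positive.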

  7. Results for Random Forest classification models using different feature sets...

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Janna Axenbeck; Patrick Breithaupt (2023). Results for Random Forest classification models using different feature sets and target variables. [Dataset]. http://doi.org/10.1371/journal.pone.0249583.t005
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Janna Axenbeck; Patrick Breithaupt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Evaluation metrics are presented for the test sample.

  8. Application of image processing and machine learning techniques to...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    Daly, Kendra (2025). Application of image processing and machine learning techniques to distinguish suspected oil droplets from plankton and other particles for the SIPPER imaging system [Dataset]. http://doi.org/10.7266/N74X55RS
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    Daly, Kendra
    Description

    Image classification features and examples of statistical results for the data mining approach using a one-versus-one strategy to implement a SVM (support vector machine) multi-class classifier. Data published in: Fefilatyev, S., K. Kramer, L. Hall, D. Goldgof, R. Kasturi, A. Remsen, K. Daly. 2011. Detection of Anomalous Particles from the Deepwater Horizon Oil Spill Using the SIPPER3 Underwater Imaging Platform. Proceedings of International Conference on Data Mining Workshops, p. 741-748. Awarded Data Mining Practice Prize at the IEEE International Conference on Data Mining (ICDM), Vancouver, Canada, December 11-14, 2011. DOI 10.1109/ICDMW.2011.65.

  9. Healthcare datasets.

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Talayeh Razzaghi; Oleg Roderick; Ilya Safro; Nicholas Marko (2023). Healthcare datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0155119.t005
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Talayeh Razzaghi; Oleg Roderick; Ilya Safro; Nicholas Marko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The set “Example 1” has 10,000 observations in each class. In set “Example 2”, the majority and minority classes contain 50,400 and 33,600 observations, respectively. For details about the data see [8].

  10. Lisbon, Portugal, hotel’s customer dataset with three years of personal,...

    • data.mendeley.com
    Updated Nov 18, 2020
    Cite
    Nuno Antonio (2020). Lisbon, Portugal, hotel’s customer dataset with three years of personal, behavioral, demographic, and geographic information [Dataset]. http://doi.org/10.17632/j83f5fsh6c.1
    Explore at:
    Dataset updated
    Nov 18, 2020
    Authors
    Nuno Antonio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Portugal, Lisbon
    Description

    Hotel customer dataset with 31 variables describing a total of 83,590 instances (customers). It comprises three full years of customer behavioral data. In addition to personal and behavioral information, the dataset also contains demographic and geographical information. This dataset helps address the lack of real-world business data available for educational and research purposes. It can be used in data mining, machine learning, and other analytical problems in the scope of data science. Due to its unit of analysis, it is especially suitable for building customer segmentation models, including clustering and RFM (Recency, Frequency, and Monetary value) models, but it can also be used in classification and regression problems.
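A minimal RFM-style segmentation of the kind the description suggests. The column names and values below are hypothetical stand-ins; the dataset's actual variable names may differ:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical columns standing in for the dataset's RFM inputs.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "DaysSinceLastStay": rng.integers(1, 1000, 500),  # Recency
    "BookingsCount":     rng.integers(1, 20, 500),    # Frequency
    "LodgingRevenue":    rng.uniform(50, 5000, 500),  # Monetary
})

# Standardize the three RFM axes, then cluster customers into segments.
rfm = StandardScaler().fit_transform(df)
df["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(rfm)
print(df.groupby("segment")["LodgingRevenue"].mean().round(1))
```

Segment profiles (mean recency, frequency, and revenue per cluster) are then typically inspected to name the segments, e.g. frequent high spenders vs. lapsed one-time guests.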

  11. EthanolLevel UCR Archive Dataset

    • data.niaid.nih.gov
    Updated May 15, 2024
    Cite
    University of Southampton (2024). EthanolLevel UCR Archive Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11190984
    Explore at:
    Dataset updated
    May 15, 2024
    Dataset provided by
    University of California: http://universityofcalifornia.edu/
    University of Southampton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the UCR Archive maintained by University of Southampton researchers. Please cite the relevant release, or the latest full archive release, if you use the datasets. See http://www.timeseriesclassification.com/.

    This dataset is part of a project with the Scotch Whisky Research Institute into detecting forged spirits in a non-intrusive manner. One way of detecting forgery without sampling the spirit is to inspect the ethanol level by spectrograph. The dataset covers 20 different bottle types and four levels of alcohol: 35%, 38%, 40% and 45%. Each series is a spectrograph of 1,751 observations. This dataset is an example of when it is wrong to merge and resample, because the train/test splits are constructed so that the same bottle type is never in both the train and test sets. There are 4 classes:

    - Class 1: E35
    - Class 2: E38
    - Class 3: E40
    - Class 4: E45

    For more information about this dataset, see [1,2].

    [1] Lines, Jason, Sarah Taylor, and Anthony Bagnall. "Hive-cote: The hierarchical vote collective of transformation-based ensembles for time series classification." Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 2016.

    [2] J. Large, E. K. Kemsley, N. Wellner, I. Goodall, and A. Bagnall, "Detecting forged alcohol non-invasively through vibrational spectroscopy and machine learning," in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2018.

    Donator: A. Bagnall
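The caveat above about merging and resampling can be respected by splitting on bottle type with a group-aware splitter. A sketch on synthetic data of the same shape (sklearn's GroupShuffleSplit guarantees no group appears on both sides):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-ins: 80 spectrograph series of length 1751,
# 4 alcohol-level classes, 20 bottle types (NOT the real archive files).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 1751))
y = rng.integers(1, 5, 80)        # classes 1-4 (E35/E38/E40/E45)
bottle = rng.integers(0, 20, 80)  # bottle type of each series

gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train, test = next(gss.split(X, y, groups=bottle))
# No bottle type appears in both sides of the split.
print(set(bottle[train]) & set(bottle[test]))  # set()
```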

  12. Sensitivity, specificity and G-mean of financial risk problem with five risk...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Talayeh Razzaghi; Oleg Roderick; Ilya Safro; Nicholas Marko (2023). Sensitivity, specificity and G-mean of financial risk problem with five risk classes (Example 1) using ML(W)SVM and REM imputation methods. [Dataset]. http://doi.org/10.1371/journal.pone.0155119.t007
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Talayeh Razzaghi; Oleg Roderick; Ilya Safro; Nicholas Marko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sensitivity, specificity and G-mean of financial risk problem with five risk classes (Example 1) using ML(W)SVM and REM imputation methods.

  13. SAT Questions and Answers for LLM 🏛️

    • kaggle.com
    Updated Oct 16, 2023
    + more versions
    Cite
    Training Data (2023). SAT Questions and Answers for LLM 🏛️ [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/sat-history-questions-and-answers/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 16, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    SAT History Questions and Answers 🏛️ - Text Classification Dataset

    This dataset contains a collection of questions and answers for the SAT Subject Test in World History and US History. Each question is accompanied by its answer options and the correct response.

    The dataset includes questions from various topics, time periods, and regions on both World History and US History.

    💴 For Commercial Usage: leave a request on TrainingData to discuss your requirements, learn about the price, and buy the dataset


    Content

    For each question, we extracted:

    - id: number of the question,
    - subject: SAT subject (World History or US History),
    - prompt: text of the question,
    - A: answer A,
    - B: answer B,
    - C: answer C,
    - D: answer D,
    - E: answer E,
    - answer: letter of the correct answer to the question
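For LLM use, the extracted fields can be assembled into a single prompt string. A sketch with an invented example question (the field layout follows the list above):

```python
def format_sat_question(q):
    """Assemble one record's extracted fields into a single prompt string."""
    options = "\n".join(f"{k}. {q[k]}" for k in "ABCDE" if q.get(k))
    return f"[{q['subject']}] {q['prompt']}\n{options}\nAnswer: {q['answer']}"

# Invented example record in the dataset's field layout.
q = {"id": 1, "subject": "World History",
     "prompt": "Which empire built Machu Picchu?",
     "A": "Aztec", "B": "Inca", "C": "Maya", "D": "Olmec", "E": "Toltec",
     "answer": "B"}
print(format_sat_question(q))
```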

    💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

    TrainingData provides high-quality data annotation tailored to your needs

    keywords: answer questions, sat, gpa, university, school, exam, college, web scraping, parsing, online database, text dataset, sentiment analysis, llm dataset, language modeling, large language models, text classification, text mining dataset, natural language texts, nlp, nlp open-source dataset, text data, machine learning

  14. Character classification data for license plates

    • figshare.com
    txt
    Updated Mar 13, 2016
    Cite
    Rohit Rawat; M.T. Manry; Fernando Martinez (2016). Character classification data for license plates [Dataset]. http://doi.org/10.6084/m9.figshare.3113449.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Mar 13, 2016
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Rohit Rawat; M.T. Manry; Fernando Martinez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Licence Plate Character Classification Data

    Authors: Rohit Rawat, Dr. M. T. Manry, Fernando Martinez
    Image Processing and Neural Networks Lab, The University of Texas at Arlington
    http://www.uta.edu/faculty/manry/

    This dataset has 49 numerical features extracted from character images taken from license plate images. The dataset has 12,757 examples extracted from plate images, split into training and testing sets. The data has 36 output classes: the letters 'A' to 'Z' excluding 'O' and 'Q', the numbers '0' to '9', and two state-map characters. Data is tab-separated, one line per example, with the correct class (between 1 and 36) at the end of the line.

    This data should be cited as: Rawat, Rohit; Manry, M.T.; Martinez, Fernando (2016): Character classification data for license plates. figshare. https://dx.doi.org/10.6084/m9.figshare.3113449.v1
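A minimal parser for the tab-separated format described above, 49 feature values followed by the class label. The feature values in the sample row are invented:

```python
def parse_plate_line(line):
    """Split one tab-separated example into (features, class_label);
    the class (1-36) is the last field, per the description above."""
    fields = line.rstrip("\n").split("\t")
    return [float(v) for v in fields[:-1]], int(fields[-1])

# Invented row: 49 feature values followed by the class label.
row = "\t".join(["0.5"] * 49 + ["12"])
feats, label = parse_plate_line(row)
print(len(feats), label)  # 49 12
```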

  15. ag_news_subset

    • tensorflow.org
    Updated Dec 6, 2022
    + more versions
    Cite
    (2022). ag_news_subset [Dataset]. http://identifiers.org/arxiv:1509.01626
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.

    The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

    The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and the total number of testing samples is 7,600.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('ag_news_subset', split='train')
    for ex in ds.take(4):
      print(ex)
    

    See the guide for more information on tensorflow_datasets.

  16. Rock mass quality and structural geology observations in northwest Prince...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 31, 2024
    Cite
    U.S. Geological Survey (2024). Rock mass quality and structural geology observations in northwest Prince William Sound, Alaska from the summer of 2021 [Dataset]. https://catalog.data.gov/dataset/rock-mass-quality-and-structural-geology-observations-in-northwest-prince-william-sound-al
    Explore at:
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    United States Geological Survey, http://www.usgs.gov/
    Area covered
    Prince William Sound, Alaska
    Description

    Multiple subaerial landslides adjacent to Prince William Sound, Alaska (for example, Dai and others, 2020; Higman and others, 2023; Schaefer and others, 2024) pose a threat to the public because of their potential to generate ocean waves (Dai and others, 2020; Barnhart and others, 2021; Barnhart and others, 2022) that could impact towns and marine activities. One bedrock landslide on the west side of Barry Arm fjord drew international attention in 2020 because of its large size (~500 M m3) and tsunamigenic potential (Dai and others, 2020). As part of the U.S. Geological Survey response to the detection of the potentially tsunamigenic landslide at Barry Arm, as well as a broader effort to evaluate bedrock landslide and tsunamigenic potential throughout Prince William Sound (for example, Schaefer and others, 2024), we assessed rock mass quality and collected structural geology data in a large part of northwest Prince William Sound (including Barry Arm) in June and July, 2021. The quality (strength) of a rock mass depends on the properties of intact rock and the characteristics of discontinuities (for example, bedding, fractures, cleavage) that cut the rock. Rock mass quality can be estimated in the field using a variety of classification schemes. In the summer of 2021, most of our fieldwork was boat-based and was therefore conducted at sites along the coastline. A small number of sites in and near Barry Arm were accessed by helicopter, and sites near the town of Whittier were accessed by driving and hiking. At each field site, we made our measurements at rock outcrops, which were typically found at the base of cliffs, along ridge lines, in flat areas in coastal zones, and in areas recently scoured and plucked by glaciers. In two dimensions, outcrops ranged in size from about 30 m2 to 100 m2. We visited a total of 73 sites in the field. 
    Most sites were in metamorphosed Cretaceous flysch, but a few were in Tertiary granitic rocks (Nelson and others, 1985; Winkler, 1992; Wilson and others, 2015). Of the 73 sites, we collected rock mass quality data and structural data at 54 sites, and only strike and dip of bedding in flysch at 19 sites. At each of the 54 sites, we collected data that we later used to classify rock mass quality according to four commonly used classification schemes: Rock Mass Quality (Q, for example, Barton and others, 1974, Coe and others, 2005); Rock Mass Rating (RMR, for example, Bieniawski, 1989); Slope Mass Rating (SMR, for example, Romana, 1995, Moore and others, 2009); and Geologic Strength Index (GSI, for example, Marinos and Hoek, 2000, Marinos and others, 2005). We also determined Rock Quality Designation (RQD, for example, Deere and Deere, 1989, Palmström, 1982) and estimated intact rock strength using a Proceq Rock Schmidt Type N hammer (see RatingsReadMe.pdf for details). Schmidt hammer rebound values were converted to Uniaxial Compressive Strength (UCS) using equations developed for the same rock types that we observed in the field, but at different locations. For flysch, rebound values from the Type N Schmidt hammer were converted to UCS by first converting Type N rebound values to Type L rebound values, then using these Type L values in the equation shown in Table 3 and Figure 3 of Morales and others (2004). For granitic rocks, UCS values were calculated using Type N rebound values in equation 2 of Katz and others (2000). Additionally, we collected strikes and dips of any observed bedding, fractures, and cleavage. All four rock mass quality classification schemes use data from characteristics of discontinuities present in the rock.
Discontinuity data that we collected in the field included: total number of discontinuities, roughness of the surface of the discontinuities, number of sets of discontinuities, type of filling or alteration on the surface of discontinuities, aperture or “openness” of discontinuities, and the amount of water present. A file of a blank field data collection sheet (FieldDataCollectionSheet) is included in this data release. Numerical ratings for each of these factors are assigned based on the correlation of field measurements and observations with descriptive rankings. The rankings used for Q, RMR, SMR, and GSI classification schemes are shown in Table 1, Table 2, Table 3, and Figures 1 and 2. Additional details regarding descriptive rankings and numerical ratings not shown in the tables and figures are provided in the RatingsReadMe.pdf. All field measurements, numerical ranking values, and calculated Q, RMR, SMR, GSI, and RQD values are included in the RMQMeasurements_Ratings_Values2021 file (.csv and .xlsx). Site names beginning with “JAC”, followed by numbers, are locations where both rock mass quality and structural data were collected. Site names beginning with “JACSD”, “srl”, and “fault” are locations where only the strike and dip of bedding was measured. Question marks in the data files indicate a lack of certainty in field observations. Abbreviations of rating parameters (for example, R4e, Jw, etc.) for the RMR, SMR, and Q classification systems used in column headings are defined in more detail in Tables 1 and 2. All structural measurements are provided in the StructuralData2021 file (.csv and .xlsx). The planar and toppling calculations used for determining SMR values are included in the SMRCalculationsWorksheet2021 file (.csv and .xlsx). Final Q, RMR, SMR, GSI, and RQD values for each site are presented in a separate file (FinalRockStength_QualityValues2021, .csv and .xlsx). All rock mass quality values are positively correlated with rock quality. 
    That is, as Q, RMR, SMR, GSI, and RQD values increase, rock quality increases. Additional information in this release includes photos, field sketches, and geographic data. Photos from each site are included in a separate folder (2021PhotosbySiteName), organized by the individual site names and the names of the photographers. Field sketches for eight sites are in a SketchesinFieldNotesbySiteName zipped folder. A Google Earth 2021SiteLocations.kml file showing site locations, site names, and geographic coordinates is also included. Samples of rock were collected at some of the 2021 sites in the summer of 2022. These sample names are noted in a column in the RMQMeasurements_Rating_Values2021 file. Physical samples are held by Lauren N. Schaefer with the U.S. Geological Survey, Geologic Hazards Science Center in Golden, Colorado. Disclaimer: Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
    References
    Barton, N., Lien, R., and Lunde, J., 1974, Engineering classification of rock masses for the design of tunnel support: Rock Mechanics, v. 6, p. 189-236. https://doi.org/10.1007/BF01239496
    Barnhart, K.R., Jones, R.P., George, D.L., Coe, J.A., and Staley, D.M., 2021, Preliminary assessment of the wave generating potential from landslides at Barry Arm, Prince William Sound, Alaska: U.S. Geological Survey Open-File Report 2021–1071, 28 p., https://doi.org/10.3133/ofr20211071
    Barnhart, K.R., Collins, A.L., Avdievitch, N.N., Jones, R.P., George, D.L., Coe, J.A., and Staley, D.M., 2022, Simulated inundation extent and depth in Harriman Fjord and Barry Arm, western Prince William Sound, Alaska resulting from the hypothetical rapid motion of landslides into Barry Arm Fjord, Prince William Sound, Alaska: U.S. Geological Survey data release, https://doi.org/10.5066/P9QGWH9Z
    Bieniawski, Z.T., 1989, Engineering rock mass classifications: a complete manual for engineers and geologists in mining, civil, and petroleum engineering: John Wiley & Sons, New York, 251 p.
    Coe, J.A., Harp, E.L., Tarr, A.C., and Michael, J.A., 2005, Rock-fall hazard assessment of Little Mill campground, American Fork Canyon, Uinta National Forest, Utah: U.S. Geological Survey Open File Report 2005-1229, 48 p., two 1:3000-scale plates. http://pubs.usgs.gov/of/2005/1229/
    Dai, C., Higman, B., Lynett, P.J., Jacquemart, M., Howat, I.M., Liljedahl, A.K., Dufresne, A., Freymueller, J.T., Geertsema, M., Ward Jones, M., and Haeussler, P.J., 2020, Detection and assessment of a large and potentially tsunamigenic periglacial landslide in Barry Arm, Alaska: Geophysical Research Letters, v. 47 (22), e2020GL089800. https://doi.org/10.1029/2020GL089800
    Deere, D.U., and Deere, D.W., 1989, Rock Quality Designation (RQD) after twenty years: Contract Report GL-89-1, U.S. Army Engineer Waterways Experiment Station, Vicksburg, Miss., 25 p.
    Higman, B., Lahusen, S.R., Belair, G.M., Staley, D.M., and Jacquemart, M., 2023, Inventory of Large Slope Instabilities, Prince William Sound, Alaska: U.S. Geological Survey data release, https://doi.org/10.5066/P9XGMHHP
    Katz, O., Reches, Z., and Roegiers, J.-C., 2000, Evaluation of mechanical rock properties using a Schmidt hammer: International Journal of Rock Mechanics and Mining Sciences, v. 37, p. 723-728. https://doi.org/10.1016/S1365-1609(00)00004-6
    Marinos, P., and Hoek, E., 2000, GSI: a geologically friendly tool for rock mass strength estimation, in Proceedings of GeoEng2000, international conference on geotechnical and geological engineering, Melbourne: Technomic Publishers, Lancaster, p. 1422–1446.
    Marinos, V., Marinos, P., and Hoek, E., 2005, The geological strength index: applications and limitations: Bulletin of Engineering Geology and the Environment, v. 64, p. 55-65. https://doi.org/10.1007/s10064-004-0270-5
    Moore, J.R., Sanders, J.W., Dietrich, W.E., and Glaser, S.D., 2009, Influence of rock mass strength on the erosion rate of alpine cliffs: Earth Surface Processes and Landforms, v. 34, p. 1339-1352. https://doi.org/10.1002/esp.1821
    Morales, T., Uribe-Etxebarria, G., Uriarte, J.A., and Fernández de Valderrama, I., 2004, Geomechanical characterisation of rock masses in Alpine regions: the Basque Arc (Basque-Cantabrian basin, Northern Spain): Engineering Geology, v. 71, p. 343–362.
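Of the four schemes, Q has a simple closed form that combines several of the discontinuity ratings named in the description. As a sketch, using the standard Barton and others (1974) formula; the input values below are illustrative only and are not taken from this data release:

```python
def barton_q(rqd, jn, jr, ja, jw, srf):
    """Rock Mass Quality Q (Barton and others, 1974):
    Q = (RQD / Jn) * (Jr / Ja) * (Jw / SRF),
    i.e. a block-size term, an inter-block shear strength term,
    and an active-stress term multiplied together."""
    return (rqd / jn) * (jr / ja) * (jw / srf)

# Illustrative inputs only (not values from this dataset):
q = barton_q(rqd=90, jn=9, jr=1.5, ja=1.0, jw=1.0, srf=1.0)
```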

  17. ChemTables Sample: dataset for table classification in chemical patents

    • data.mendeley.com
    Updated Nov 4, 2020
    Cite
    Zenan Zhai (2020). ChemTables Sample: dataset for table classification in chemical patents [Dataset]. http://doi.org/10.17632/g7tjh7tbrj.1
    Explore at:
    Dataset updated
    Nov 4, 2020
    Authors
    Zenan Zhai
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both the number and the size of the tables can be very large in patent documents. In addition, various types of information can be presented in tables in patents, including spectroscopic and physical data, or pharmacological use and effects of chemicals. Categorisation of tables based on the nature of their content can help users find tables containing key information, improving the accessibility of patent information that is highly relevant for new inventions. To enable research on methods for automatic table categorisation, we developed a new dataset, called ChemTables, which consists of 7,886 chemical patent tables with labels of their content type. This sample is 10% of the created ChemTables dataset. We also provide a stratified 60:20:20 train/dev/test split here, which can be used as a standard split for evaluating methods on the table categorisation task on this dataset.
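A 60:20:20 stratified split like the one described above can be sketched in pure Python; the table IDs and class names below are synthetic stand-ins, not the real ChemTables labels:

```python
import random

def stratified_split(items, labels, fracs=(0.6, 0.2, 0.2), seed=0):
    """Split items into train/dev/test while preserving per-class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for item, lab in zip(items, labels):
        by_class.setdefault(lab, []).append(item)
    splits = ([], [], [])
    for lab, group in by_class.items():
        rng.shuffle(group)
        n_train = round(fracs[0] * len(group))
        n_dev = round(fracs[1] * len(group))
        splits[0].extend(group[:n_train])
        splits[1].extend(group[n_train:n_train + n_dev])
        splits[2].extend(group[n_train + n_dev:])
    return splits

tables = list(range(100))                      # synthetic table IDs
labels = ["SPECTRAL"] * 50 + ["PHARM"] * 50    # hypothetical class names
train, dev, test = stratified_split(tables, labels)
```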

  18. Classification of Swift and XMM-Newton sources - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 23, 2023
    Cite
    (2023). Classification of Swift and XMM-Newton sources - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/4bc3af0b-902a-5251-9c17-b9ffd6721a01
    Explore at:
    Dataset updated
    Oct 23, 2023
    Description

    With the advent of very large X-ray surveys, an automated classification of X-ray sources becomes increasingly valuable. This work proposes a revisited naive Bayes classification of the X-ray sources in the Swift-XRT and XMM-Newton catalogs into four classes - AGN, stars, X-ray binaries (XRBs), and cataclysmic variables (CVs) - based on their spatial, spectral, and timing properties and their multiwavelength counterparts. An outlier measure is used to identify objects of other natures. The classifier is optimized to maximize the classification performance of a chosen class (here XRBs), and it is adapted to data mining purposes. We augmented the X-ray catalogs with multiwavelength data, source class, and variability properties. We then built a reference sample of about 25000 X-ray sources of known nature. From this sample, the distribution of each property was carefully estimated and taken as a reference to assign probabilities of belonging to each class. The classification was then performed on the whole catalog, combining the information from each property. Using the algorithm on the Swift reference sample, we retrieved 99%, 98%, 92%, and 34% of AGN, stars, XRBs, and CVs, respectively, and the false positive rates are 3%, 1%, 9%, and 15%. Similar results are obtained on XMM sources. When applied to a carefully selected test sample, representing 55% of the X-ray catalog, the classification gives consistent results in terms of distributions of source properties. A substantial fraction of sources not belonging to any class is efficiently retrieved using the outlier measure, as are AGN and stars with properties deviating from the bulk of their class. Our algorithm was then compared to a random forest method; the two show similar performance, but the algorithm presented in this paper provides better insight into the grounds of each classification.
This robust classification method can be tailored to include additional or different source classes and can be applied to other X-ray catalogs. The transparency of the classification compared to other methods makes it a useful tool in the search for homogeneous populations or rare source types, including multi-messenger events. Such a tool will be increasingly valuable with the development of surveys of unprecedented size, such as LSST, SKA, and Athena, and the search for counterparts of multi-messenger events.
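The core naive Bayes idea (independent per-property likelihoods combined with class priors) can be sketched in a few lines. The classes, properties, priors, and Gaussian parameters below are toy values for illustration, not the distributions estimated in the study:

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a 1D Gaussian, used as a per-property likelihood."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Per-class (mean, std) for two toy properties (e.g. a hardness ratio
# and a variability measure). Values are illustrative only.
params = {
    "AGN":  [(0.5, 0.2), (0.1, 0.05)],
    "star": [(-0.5, 0.2), (0.05, 0.02)],
}
priors = {"AGN": 0.7, "star": 0.3}

def classify(x):
    scores = {}
    for cls, dists in params.items():
        score = math.log(priors[cls])
        for xi, (mu, sigma) in zip(x, dists):
            score += gaussian_logpdf(xi, mu, sigma)  # naive independence assumption
        scores[cls] = score
    return max(scores, key=scores.get)
```

The real classifier also uses an outlier measure to flag sources that fit none of the classes; that step is omitted here.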

  19. Data from: New Variable Selection Method Using Interval Segmentation Purity...

    • figshare.com
    • acs.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Li-Juan Tang; Wen Du; Hai-Yan Fu; Jian-Hui Jiang; Hai-Long Wu; Guo-Li Shen; Ru-Qin Yu (2023). New Variable Selection Method Using Interval Segmentation Purity with Application to Blockwise Kernel Transform Support Vector Machine Classification of High-Dimensional Microarray Data [Dataset]. http://doi.org/10.1021/ci900032q.s001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Li-Juan Tang; Wen Du; Hai-Yan Fu; Jian-Hui Jiang; Hai-Long Wu; Guo-Li Shen; Ru-Qin Yu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    One problem with discriminant analysis of microarray data is the representation of each sample by a large number of genes that are possibly irrelevant, insignificant, or redundant. Methods of variable selection are, therefore, of great significance in microarray data analysis. A new method for key gene selection has been proposed on the basis of interval segmentation purity, which is defined as the purity of samples belonging to a certain class in intervals segmented by a mode search algorithm. This method identifies the key variables most discriminative for each class, which offers the possibility of unraveling the biological implications of the selected genes. A salient advantage of the new strategy over existing methods is its capability of selecting genes that, though they may exhibit a multimodal distribution, are the most discriminative for the classes of interest, considering that the expression levels of some genes may reflect systematic differences among within-class samples derived from different pathogenic mechanisms. On the basis of the key genes selected for individual classes, a support vector machine with block-wise kernel transform is developed for the classification of the different classes. The combination of the proposed gene mining approach with the support vector machine is demonstrated in cancer classification using two public data sets. The results reveal that significant genes have been identified for each class, and the classification model shows satisfactory performance in training and prediction for both data sets.
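A much-simplified stand-in for this kind of per-class variable selection is sketched below: genes are ranked by a t-like separation score between two classes rather than by the paper's interval segmentation purity, and the expression data is synthetic:

```python
import statistics

def separation_score(values_a, values_b):
    """Rank genes by between-class mean separation relative to spread.
    A simplified substitute for interval segmentation purity."""
    ma, mb = statistics.mean(values_a), statistics.mean(values_b)
    sa, sb = statistics.stdev(values_a), statistics.stdev(values_b)
    return abs(ma - mb) / (sa + sb + 1e-9)

# expression[gene] = (class A sample values, class B sample values); synthetic.
expression = {
    "geneA": ([1.0, 1.1, 0.9], [3.0, 3.2, 2.8]),  # well separated between classes
    "geneB": ([1.0, 2.0, 3.0], [1.1, 2.1, 2.9]),  # heavily overlapping
}
ranked = sorted(expression,
                key=lambda g: separation_score(*expression[g]),
                reverse=True)
```

The top-ranked genes would then feed a downstream classifier, as the key genes do for the block-wise kernel SVM in the paper.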

  20. Riverine Sand Mining/Scofield Island Restoration (BA-40): 2018 habitat...

    • data.usgs.gov
    • catalog.data.gov
    Updated Nov 19, 2021
    Cite
    Holly Beck; Hana Thurman; Nicholas Enwright; Jason Dugas; Wyatt Cheney (2021). Riverine Sand Mining/Scofield Island Restoration (BA-40): 2018 habitat classification, detailed habitat classes [Dataset]. http://doi.org/10.5066/P97NSPBM
    Explore at:
    Dataset updated
    Nov 19, 2021
    Dataset provided by
    United States Geological Survey, http://www.usgs.gov/
    Authors
    Holly Beck; Hana Thurman; Nicholas Enwright; Jason Dugas; Wyatt Cheney
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Dec 22, 2018
    Description

    The Barrier Island Comprehensive Monitoring (BICM) program was developed by Louisiana’s Coastal Protection and Restoration Authority (CPRA) and is implemented as a component of the System Wide Assessment and Monitoring Program (SWAMP). The program uses both historical data and contemporary data collections to assess and monitor changes in the areal and subaqueous extent of islands, habitat types, sediment texture and geotechnical properties, environmental processes, and vegetation composition. Examples of BICM datasets include still and video aerial photography for documenting shoreline changes, shoreline positions, habitat mapping, land change analyses, light detection and ranging (lidar) surveys for topographic elevations, single-beam and swath bathymetry, and sediment grab samples. For more information about the BICM program, see Kindinger and others (2013). The U.S. Geological Survey, Wetland and Aquatic Research Center provides support to the BICM program through the develop ...

Cite
Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1

Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets.

Explore at:
Dataset updated
Nov 14, 2018
Authors
Scott Herford
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
North Carolina
Description

The purpose of data mining analysis is always to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset. Before doing any work on the data, the data has to be pre-processed, and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. In our project, using clustering prior to classification did not improve performance much. One possible reason is that the features we selected for clustering are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: clustering differs from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters to reduce the data dimension can lose a lot of information, since clustering techniques are based on a metric of 'distance', and at high dimensions Euclidean distance loses pretty much all meaning. Therefore, 'reducing' dimensionality by mapping data points to cluster numbers is not always a good idea, since you may lose almost all the information. From the creating-new-features perspective: clustering analysis creates labels based on patterns in the data, which brings uncertainty into the data. When using clustering prior to classification, the choice of the number of clusters strongly affects the performance of the clustering, which in turn affects the performance of the classification. If the subset of features we apply clustering to is well suited for it, it might increase the overall classification performance.
For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not fix the clustering outputs with a random_state, in an effort to see whether they were stable. Our assumption was that if the results vary highly from run to run, which they definitely did, the data may simply not cluster well with the selected methods at all. In practice, the results we saw after applying clustering in preprocessing were not much better than random. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the models' real-world effectiveness and to continue to revise the models from time to time as things change.
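The k-means idea mentioned above can be made concrete with a minimal sketch (pure Python, synthetic one-dimensional data): cluster a numeric feature and attach the cluster id as a new feature. The actual project used different features and tooling, so treat this only as an illustration of the idea:

```python
import random

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Minimal 1D k-means: returns a cluster id per input value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest center
            clusters[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # recompute centers; keep the old center if a cluster went empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]

scores = [0.1, 0.2, 0.15, 0.9, 0.95, 0.85]   # synthetic numeric feature
cluster_ids = kmeans_1d(scores)
# append the cluster id to each row as a new categorical feature
rows = [(s, c) for s, c in zip(scores, cluster_ids)]
```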
