30 datasets found
  1. Z

    Tracking focal adhesions with TrackMate and Weka - tutorial dataset 2

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Minh-Son-Phan (2024). Tracking focal adhesions with TrackMate and Weka - tutorial dataset 2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5978939
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Jean-Yves Tinevez
    Minh-Son-Phan
    Guillaume Jacquemet
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains data used to illustrate the utility of Weka detector in TrackMate.

    • classifier.model: trained Weka classifier.
    • image data: human dermal microvascular blood endothelial cells expressing GFP-paxillin

    More detail on using these files can be found here: https://imagej.net/plugins/trackmate/trackmate-weka.

  2. Data from: COVID-19 and media dataset: Mining textual data according periods...

    • dataverse.cirad.fr
    application/x-gzip +1
    Updated Dec 21, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mathieu Roche; Mathieu Roche (2020). COVID-19 and media dataset: Mining textual data according periods and countries (UK, Spain, France) [Dataset]. http://doi.org/10.18167/DVN1/ZUA8MF
    Explore at:
    application/x-gzip(511157), application/x-gzip(97349), text/x-perl-script(4982), application/x-gzip(93110), application/x-gzip(23765310), application/x-gzip(107669)Available download formats
    Dataset updated
    Dec 21, 2020
    Authors
    Mathieu Roche; Mathieu Roche
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Spain, United Kingdom, France
    Dataset funded by
    ANR (#DigitAg)
    Horizon 2020 - European Commission - (MOOD project)
    Description

    These datasets contain a set of news articles in English, French and Spanish extracted from Medisys (i‧e. advanced search) according the following criteria: (1) Keywords (at least): COVID-19, ncov2019, cov2019, coronavirus; (2) Keywords (all words): masque (French), mask (English), máscara (Spanish) (3) Periods: March 2020, May 2020, July 2020; (4) Countries: UK (English), Spain (Spanish), France (French). A corpus by country has been manually collected (copy/paste) from Medisys. For each country, 100 snippets by period (the 1st, 10th, 15th, 20th for each month) are built. The datasets are composed of: (1) A corpus preprocessed for the BioTex tool - https://gitlab.irstea.fr/jacques.fize/biotex_python (.txt) [~ 900 texts]; (2) The same corpus preprocessed for the Weka tool - https://www.cs.waikato.ac.nz/ml/weka/ (.arff); (3) Terms extracted with BioTex according spatio-temporal criteria (*.csv) [~ 9000 terms]. Other corpora can be collected with this same method. The code in Perl in order to preprocess textual data for terminology extraction (with BioTex) and classification (with Weka) tasks is available. A new version of this dataset (December 2020) includes additional data: - Python preprocessing and BioTex code [Execution_BioTex‧tgz]. - Terms extracted with different ranking measures (i‧e. C-Value, F-TFIDF-C_M) and methods (i‧e. extraction of words and multi-word terms) with the online version of BioTex [Terminology_with_BioTex_online_dec2020.tgz],

  3. Tracking focal adhesions with TrackMate and Weka - tutorial dataset 1

    • zenodo.org
    • data.niaid.nih.gov
    bin, png
    Updated Jul 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jean-Yves Tinevez; Jean-Yves Tinevez; Minh-Son-Phan; Minh-Son-Phan; Guillaume Jacquemet; Guillaume Jacquemet (2024). Tracking focal adhesions with TrackMate and Weka - tutorial dataset 1 [Dataset]. http://doi.org/10.5281/zenodo.5226842
    Explore at:
    bin, pngAvailable download formats
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Jean-Yves Tinevez; Jean-Yves Tinevez; Minh-Son-Phan; Minh-Son-Phan; Guillaume Jacquemet; Guillaume Jacquemet
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains data used to illustrate the utility of Weka detector in TrackMate.

    - classifier.model: trained Weka classifier.
    - MDA231 paxillin DMSO 1 min.czi - MDA231 paxillin DMSO 1 min.czi #01_t1_t40_crop.tif: example image.

    More detail on using these files can be found here: https://imagej.net/plugins/trackmate/trackmate-weka.

  4. I

    Words_Selected_by_Information_Gain

    • databank.illinois.edu
    Updated Jan 2, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xiaoru Dong; Jingyi Xie; Linh Hoang (2019). Words_Selected_by_Information_Gain [Dataset]. http://doi.org/10.13012/B2IDB-9837167_V1
    Explore at:
    Dataset updated
    Jan 2, 2019
    Authors
    Xiaoru Dong; Jingyi Xie; Linh Hoang
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Institutes of Health (NIH)
    Description

    File Name: WordsSelectedByInformationGain.csv Data Preparation: Xiaoru Dong, Linh Hoang Date of Preparation: 2018-12-12 Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks. Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider. Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews. Description: the file contains a list of 1655 informative words selected by applying information gain feature selection strategy. Information gain is one of the methods commonly used for feature selection, which tells us how many bits of information the presence of the word are helpful for us to predict the classes, and can be computed in a specific formula [Jurafsky D, Martin JH. Speech and language processing. London: Pearson; 2014 Dec 30].We ran Information Gain feature selection on Weka -- a machine learning tool. Notes: In order to reproduce the data in this file, please get the code of the project published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run the code following the instruction provided.

  5. Z

    Supporting datasets PubFig05 for: "Heterogeneous Ensemble Combination Search...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Haque, Mohammad Nazmul (2020). Supporting datasets PubFig05 for: "Heterogeneous Ensemble Combination Search using Genetic Algorithm for Class Imbalanced Data Classification" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_33539
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Noman, Nasimul
    Haque, Mohammad Nazmul
    Berratta, Regina
    Moscato, Pablo
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Faces Dataset: PubFig05

    This is a subset of the ''PubFig83'' dataset [1] which provides 100 images each of 5 most difficult celebrities to recognise (referred as class in the classification problem). For each celebrity persons, we took 100 images and separated them into training and testing sets of 90 and 10 images, respectively:

    Person: Jenifer Lopez; Katherine Heigl; Scarlett Johansson; Mariah Carey; Jessica Alba

    Feature Extraction

    To extract features from images, we have applied the HT-L3-model as described in [2] and obtained 25600 features.

    Feature Selection

    Details about feature selection followed in brief as follows:

    Entropy Filtering: First we apply an implementation of Fayyad and Irani's [3] entropy base heuristic to discretise the dataset and discarded features using the minimum description length (MDL) principle and only 4878 passed this entropy based filtering method.

    Class-Distribution Balancing: Next, we have converted the dataset to binary-class problem by separating into 5 binary-class datasets using one-vs-all setup. Hence, these datasets became imbalanced at a ratio of 1:4. Then we converted them into balanced binary-class datasets using random sub-sampled method. Further processing of the dataset has been described in the paper.

    (alpha,beta)-k Feature selection: To get a good feature set for training the classifier, we select the features using the approach based on the (alpha,beta)-k feature selection [4] problem. It selects a minimum subset of features that maximise both within class similarity and dissimilarity in different classes. We applied the entropy filtering and (alpha,beta)-k feature subset selection methods in three ways and obtained different numbers of features (in the Table below) after consolidating them into binary class dataset.

    UAB: We applied (alpha,beta)-k feature set method on each of the balanced binary-class datasets and we took the union of selected features for each binary-class datasets. Finally, we applied the (alpha,beta)-k feature set selection method on each of the binary-class datasets and get a set of features.

    IAB: We applied (alpha,beta)-k feature set method on each of the balanced binary-class datasets and we took the intersection of selected features for each binary-class datasets. Finally, we applied the (alpha,beta)-k feature set selection method on each of the binary-class datasets and get a set of features.

    UEAB: We applied (alpha,beta)-k feature set method on each of the balanced binary-class datasets. Then, we applied the entropy filtering and (alpha,beta)-k feature set selection method on each of the balanced binary-class datasets. Finally, we took the union of selected features for each balanced binary-class datasets and get a set of features.

    All of these datasets are inside the compressed folder. It also contains the document describing the process detail.

    References

    [1] Pinto, N., Stone, Z., Zickler, T., & Cox, D. (2011). Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on (pp. 35–42).

    [2] Cox, D., & Pinto, N. (2011). Beyond simple features: A large-scale feature search approach to unconstrained face recognition. In Automatic Face Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on (pp. 8–15).

    [3] Fayyad, U. M., & Irani, K. B. (1993). Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In International Joint Conference on Artificial Intelligence (pp. 1022–1029).

    [4] Berretta, R., Mendes, A., & Moscato, P. (2005). Integer programming models and algorithms for molecular classification of cancer from microarray data. In Proceedings of the Twenty-eighth Australasian conference on Computer Science - Volume 38 (pp. 361–370). 1082201: Australian Computer Society, Inc.

  6. Z

    Data from: Machine Learning Models and New Computational Tool for the...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jun 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Martinez-Rios (2022). Machine Learning Models and New Computational Tool for the Discovery of Insect Repellents that Interfere with Olfaction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6677764
    Explore at:
    Dataset updated
    Jun 22, 2022
    Dataset provided by
    Garcia-Jacas
    Marrero-Ponce
    Pulgar-Sánchez
    Martinez-Rios
    Hernández-Lambraño
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SI1_Supporting Information file (docx) brings together detailed information on the outstanding models obtained for each dataset analyzed in this study such as statistical and training parameters and outliers. There can be found the responses in spikes/s of the mosquito Culex quinquefasciatus to the 50 IRs. Besides, there is presented a full table of the up-to-date studies related to QSAR and insect repellency.

    SI2_EXP1_50IRs from Liu et al (2013) SDF file presents the structures of each of the 50 IRs analyzed.

    SI3_EXP2_Datasets gathers the four datasets as SDF files from Oliferenko et al. (2013), Gaudin et al. (2008), Omolo et al. (2004), and Paluch et al. (2009) used for the repellency modeling in EXP2.

    SI4_EXP3_Prospective analysis provides Malaria Box Library (400 compounds) as an SDF file, which were analyzed in our virtual screening to prospect potential virtual hits.

    SI5_QuBiLS-MIDAS MDs lists contain three TXT lists of 3D molecular descriptors used in QuBiLS-MIDAS to describe the molecules used in the present study.

    SI6_EXP1_Sensillar Modeling comprises two subfolders: Classification and Regression models for each of the six sensilla. Models built to predict the physiological interaction experimentally obtained from Liu et al. (2013). All of the models are implemented in the software SiLiS-PAPACS. Every single folder compiles a DOCX file with the detailed description of the model, an XLSX file with the output obtained from the training in Weka 3.9.4, an ARFF, and CSV files with the MDs for each molecule, and the SDF of the study dataset.

    SI7_EXP2_Repellency Modeling encompasses the four datasets in the study: Oliferenko et al. (2013), Gaudin et al. (2008), Omolo et al. (2004), and Paluch et al. (2009). Inside the subfolders, there are three models per type of MDs (duplex, triple, generic, and mix) selected that best predict each dataset. As well as the SI6 folder, each model includes six files: DOCX, XLSX, ARFF, CSV, and an SDF.

    SI8_Virtual Hits includes the cluster analysis results and physico-chemical properties of new IR virtual leads.

  7. Classifier result.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). Classifier result. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classifier result.

  8. Experiments results.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). Experiments results. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t004
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Experiments results.

  9. f

    LDA attributes.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). LDA attributes. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LDA attributes.

  10. f

    Best parameter values.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). Best parameter values. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Best parameter values.

  11. f

    Accuracy, sensitivity, specificity and F-score.

    • figshare.com
    xls
    Updated Jun 5, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen (2023). Accuracy, sensitivity, specificity and F-score. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007760.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This parameters were used for Naïve Bayes(NB), Multilayer Perceptron(MLP), Random Forest(RF), Gradient Boosting(GB), Support Vector Machine(SVM) and K-Nearest Neighbors(KNN) algorithms evaluation when applied on the imbalanced sequences. The color trend of F-score from blue to red indicates performance from the best to the poorest. Accuracy, sensitivity, specificity, and F-score are represented in the table as Acc, Sen, Spec, and F-sco, respectively.

  12. Dataset: The effects of class balance on the training energy consumption of...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Mar 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Maria Gutierrez; Maria Gutierrez; Coral Calero; Coral Calero; Félix García; Félix García; Mª Ángeles Moraga; Mª Ángeles Moraga (2024). Dataset: The effects of class balance on the training energy consumption of logistic regression models [Dataset]. http://doi.org/10.5281/zenodo.10823624
    Explore at:
    csvAvailable download formats
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Maria Gutierrez; Maria Gutierrez; Coral Calero; Coral Calero; Félix García; Félix García; Mª Ángeles Moraga; Mª Ángeles Moraga
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2024
    Description

    Two synthetic datasets for binary classification, generated with the Random Radial Basis Function generator from WEKA. They are the same shape and size (104.952 instances, 185 attributes), but the "balanced" dataset has 52,13% of its instances belonging to class c0, while the "unbalanced" one only has 4,04% of its instances belonging to class c0. Therefore, this set of datasets is primarily meant to study how class balance influences the behaviour of a machine learning model.

  13. f

    The performance of different machine learning techniques-based stage...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava (2023). The performance of different machine learning techniques-based stage classification models developed using 21 methylation CpG sites selected by WEKA (LS-CPG-WEKA). [Dataset]. http://doi.org/10.1371/journal.pone.0221476.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The performance of different machine learning techniques-based stage classification models developed using 21 methylation CpG sites selected by WEKA (LS-CPG-WEKA).

  14. Data from: Stable psychological traits predict perceived stress related to...

    • zenodo.org
    • researchdata.cab.unipd.it
    bin, pdf
    Updated Jul 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Merylin Monaro; Merylin Monaro; Luca Flesia; Valentina Fietta; Barbara Segatto; Elena Colicino; Luca Flesia; Valentina Fietta; Barbara Segatto; Elena Colicino (2024). Stable psychological traits predict perceived stress related to the COVID-19 outbreak [Dataset]. http://doi.org/10.5281/zenodo.3753552
    Explore at:
    pdf, binAvailable download formats
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Merylin Monaro; Merylin Monaro; Luca Flesia; Valentina Fietta; Barbara Segatto; Elena Colicino; Luca Flesia; Valentina Fietta; Barbara Segatto; Elena Colicino
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the raw dataset associated to the scientific article "Stable psychological traits predict psychological perceived stress to COVID-19 outbreak”, by L. Flesia, V. Fietta, B. Segatto, M. Monaro. Data are contained in the excel file and organized as follows:

    - the entire dataset used by the authors to perform statistical analysis

    - the training set used by the authors to train and validate ML models

    - the test set used by the authors to test the ML models

    The "Legend" file contains the description of each variable in the excel file.

    The step by step instructions to replicate the results of ML classification models, which are reported in the paper, including two .arff files containing the training and test set od data that can be directly run in WEKA software 3.9.

    The "COVID-19 QUESTIONNAIRE" file contains the English version of the questions administered to participants.

  15. f

    The performance of stage classification models developed using 30 RNA...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava (2023). The performance of stage classification models developed using 30 RNA transcripts selected using WEKA from 103 RNA transcripts (LS-RNA-WEKA). [Dataset]. http://doi.org/10.1371/journal.pone.0221476.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The performance of stage classification models developed using 30 RNA transcripts selected using WEKA from 103 RNA transcripts (LS-RNA-WEKA).

  16. E-Commerce Product Reviews - Dataset for ML

    • kaggle.com
    zip
    Updated Dec 16, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Furkan Gözükara (2021). E-Commerce Product Reviews - Dataset for ML [Dataset]. https://www.kaggle.com/furkangozukara/turkish-product-reviews
    Explore at:
    zip(580369522 bytes)Available download formats
    Dataset updated
    Dec 16, 2021
    Authors
    Furkan Gözükara
    Description

    -> If you use Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset please cite: https://dergipark.org.tr/en/pub/cukurovaummfd/issue/28708/310341

    @research article { cukurovaummfd310341, journal = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi}, issn = {1019-1011}, eissn = {2564-7520}, address = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi Yayın Kurulu Başkanlığı 01330 ADANA}, publisher = {Cukurova University}, year = {2016}, volume = {31}, pages = {464 - 482}, doi = {10.21605/cukurovaummfd.310341}, title = {Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme}, key = {cite}, author = {Gözükara, Furkan and Özel, Selma Ayşe} }

    https://doi.org/10.21605/cukurovaummfd.310341

    -> Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset is composed as below: ->-> Top 50 E-commerce sites in Turkey are crawled and their comments are extracted. Then randomly 2000 comments selected and manually labelled by a field expert. ->-> After manual labeling the selected comments is done, 600 negative and 600 positive comments are left. ->-> This dataset contains these comments.

    -> English_Movie_Reviews_by_Pang_and_Lee_2004 ->-> Pang, B., Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts, In Proceedings of the 42nd annual meeting on Association for Computational Linguistics (p. 271). ->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | polarity dataset v2.0 - review_polarity.tar.gz

    -> English_Movie_Reviews_Sentences_by_Pang_and_Lee_2005 ->-> Pang, B., Lee, L., 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115-124), Association for Computational Linguistics ->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | sentence polarity dataset v1.0 - rt-polaritydata.tar.gz

    -> English_Product_Reviews_by_Blitzer_et_al_2007 ->-> Article of the dataset: Blitzer, J., Dredze, M., Pereira, F., 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification, In ACL (Vol. 7, pp. 440-447). ->-> Source: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ | processed_acl.tar.gz

    -> Turkish_Movie_Reviews_by_Demirtas_and_Pechenizkiy_2013 ->-> Demirtas, E., Pechenizkiy, M., 2013. Cross-lingual polarity detection with machine translation, In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining (p. 9). ACM. ->-> http://www.win.tue.nl/~mpechen/projects/smm/#Datasets Turkish_Movie_Sentiment.zip

    -> The dataset files are provided as used in the article. -> Weka files are generated with Raw Frequency of terms rather than used Weighting Schemes

    -> The folder Cross_Validation contains 10-fold cross-validation each fold files. -> Inside Cross_Validation folder, each turn of the cross-validation is named as test_X where X is the turn number -> Inside test_X folder * Test_Set_Negative_RAW: Contains raw negative class Test data of that cross-validation turn * Test_Set_Negative_Processed: Contains pre-processed negative class Test data of that cross-validation turn * Test_Set_Positive_RAW: Contains raw positive class Test data of that cross-validation turn * Test_Set_Positive_Processed: Contains pre-processed positive class Test data of that cross-validation turn * Train_Set_Negative_RAW: Contains raw negative class Train data of that cross-validation turn * Train_Set_Negative_Processed: Contains pre-processed negative class Train data of that cross-validation turn * Train_Set_Positive_RAW: Contains raw positive class Train data of that cross-validation turn * Train_Set_Positive_Processed: Contains pre-processed positive class Train data of that cross-validation turn * Train_Set_For_Weka: Contains processed Train set formatted for Weka * Test_Set_For_Weka: Contains processed Test set formatted for Weka

    -> The folder Entire_Dataset contains files for Entire Dataset * Negative_Processed: Contains all negative comments processed data * Positive_Processed: Contains all positive comments processed data * Negative_RAW: Contains all negative comments RAW data * Positive_RAW: Contains all positive comments RAW data * Entire_Dataset_WEKA: Contains all documents processed data in WEKA format

  17. f

    Data from: Machine learning approaches in MALDI-MSI: clinical applications

    • tandf.figshare.com
    pptx
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri (2023). Machine learning approaches in MALDI-MSI: clinical applications [Dataset]. http://doi.org/10.6084/m9.figshare.3458753.v1
    Explore at:
    pptxAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging in visualizing the spatial distribution and the relative abundance of biomolecules directly on-tissue, the yielded data is complex and high dimensional. Therefore, analysis and interpretation of this huge amount of information is mathematically, statistically and computationally challenging. Areas covered: This article reviews some of the challenges in data elaboration with particular emphasis on machine learning techniques employed in clinical applications, and can be useful in general as an entry point for those who want to study the computational aspects. Several characteristics of data processing are described, enlightening advantages and disadvantages. Different approaches for data elaboration focused on clinical applications are also provided. Practical tutorial based upon Orange Canvas and Weka software is included, helping familiarization with the data processing. Expert commentary: Recently, MALDI-MSI has gained considerable attention and has been employed for research and diagnostic purposes, with successful results. Data dimensionality constitutes an important issue and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods are characterized by collecting independent observations into a single table. However, the incorporation of relational information can improve the discriminatory capability of the data.

  18. Clustered k-mers from S1 Fig used for validation of their biological...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen (2023). Clustered k-mers from S1 Fig used for validation of their biological function and reported riboswitch motifs. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007760.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nucleotide location designated refers to match with their position reported in reference.

  19. f

    Data from: A Proposed Churn Prediction Model

    • figshare.com
    pdf
    Updated Feb 24, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr (2019). A Proposed Churn Prediction Model [Dataset]. http://doi.org/10.6084/m9.figshare.7763183.v2
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Feb 24, 2019
    Dataset provided by
    figshare
    Authors
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Churn prediction aims to detect customers intended to leave a service provider. Retaining one customer costs an organization from 5 to 10 times than gaining a new one. Predictive models can provide correct identification of possible churners in the near future in order to provide a retention solution. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps which are; identify problem domain, data selection, investigate data set, classification, clustering and knowledge usage. A data set with 23 attributes and 5000 instances is used. 4000 instances used for training the model and 1000 instances used as a testing set. The predicted churners are clustered into 3 categories in case of using in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine and Neural Network throughout an open source software name WEKA.

  20. Obesity DataSet UCI ML

    • kaggle.com
    Updated Feb 23, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tathagat Banerjee (2022). Obesity DataSet UCI ML [Dataset]. https://www.kaggle.com/datasets/tathagatbanerjee/obesity-dataset-uci-ml
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Tathagat Banerjee
    Description

    Estimation of obesity levels based on eating habits and physical condition Data Set Download: Data Folder, Data Set Description

    Abstract: This dataset include data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition.

    Data Set Characteristics:

    Multivariate

    Number of Instances:

    2111

    Area:

    Life

    Attribute Characteristics:

    Integer

    Number of Attributes:

    17

    Date Donated

    2019-08-27

    Associated Tasks:

    Classification, Regression, Clustering

    Missing Values?

    N/A

    Number of Web Hits:

    70843

    Source:

    Fabio Mendoza Palechor, Email: fmendoza1 '@' cuc.edu.co, Celphone: +573182929611 Alexis de la Hoz Manotas, Email: akdelahoz '@' gmail.com, Celphone: +573017756983

    Data Set Information:

    This dataset include data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition. The data contains 17 attributes and 2111 records, the records are labeled with the class variable NObesity (Obesity Level), that allows classification of the data using the values of Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter, 23% of the data was collected directly from users through a web platform.

    Attribute Information:

    Read the article ([Web Link]) to see the description of the attributes.

    Relevant Papers:

    [1]Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344. [2]De-La-Hoz-Correa, E., Mendoza Palechor, F., De-La-Hoz-Manotas, A., Morales Ortega, R., & Sánchez Hernández, A. B. (2019). Obesity level estimation software based on decision trees.

    Citation Request:

    [1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Minh-Son-Phan (2024). Tracking focal adhesions with TrackMate and Weka - tutorial dataset 2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5978939

Tracking focal adhesions with TrackMate and Weka - tutorial dataset 2

Explore at:
Dataset updated
Jul 17, 2024
Dataset provided by
Jean-Yves Tinevez
Minh-Son-Phan
Guillaume Jacquemet
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This folder contains data used to illustrate the utility of Weka detector in TrackMate.

  • classifier.model: trained Weka classifier.
  • image data: human dermal microvascular blood endothelial cells expressing GFP-paxillin

More detail on using these files can be found here: https://imagej.net/plugins/trackmate/trackmate-weka.

Search
Clear search
Close search
Google apps
Main menu