30 datasets found

Tracking focal adhesions with TrackMate and Weka - tutorial dataset 2
data.niaid.nih.gov
zenodo.org
Updated Jul 17, 2024
Cite
Minh-Son-Phan (2024). Tracking focal adhesions with TrackMate and Weka - tutorial dataset 2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5978939
Dataset updated
Jul 17, 2024
Dataset provided by
Jean-Yves Tinevez
Minh-Son-Phan
Guillaume Jacquemet
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This folder contains data used to illustrate the utility of the Weka detector in TrackMate.

classifier.model: trained Weka classifier.

image data: human dermal microvascular blood endothelial cells expressing GFP-paxillin

More detail on using these files can be found here: https://imagej.net/plugins/trackmate/trackmate-weka.
Data from: COVID-19 and media dataset: Mining textual data according periods...
dataverse.cirad.fr
application/x-gzip +1
Updated Dec 21, 2020
Cite
Mathieu Roche (2020). COVID-19 and media dataset: Mining textual data according periods and countries (UK, Spain, France) [Dataset]. http://doi.org/10.18167/DVN1/ZUA8MF
Available download formats: application/x-gzip (511157), application/x-gzip (97349), text/x-perl-script (4982), application/x-gzip (93110), application/x-gzip (23765310), application/x-gzip (107669)
Unique identifier
https://doi.org/10.18167/DVN1/ZUA8MF
Dataset updated
Dec 21, 2020
Dataset provided by
Centre de coopération internationale en recherche agronomique pour le développement https://www.cirad.fr/
Authors
Mathieu Roche
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Spain, United Kingdom, France
Dataset funded by
ANR (#DigitAg)
Horizon 2020 - European Commission - (MOOD project)
Description
These datasets contain a set of news articles in English, French and Spanish extracted from Medisys (i.e. advanced search) according to the following criteria: (1) Keywords (at least): COVID-19, ncov2019, cov2019, coronavirus; (2) Keywords (all words): masque (French), mask (English), máscara (Spanish); (3) Periods: March 2020, May 2020, July 2020; (4) Countries: UK (English), Spain (Spanish), France (French).

A corpus by country has been manually collected (copy/paste) from Medisys. For each country, 100 snippets by period (the 1st, 10th, 15th, 20th for each month) are built.

The datasets are composed of: (1) A corpus preprocessed for the BioTex tool - https://gitlab.irstea.fr/jacques.fize/biotex_python (.txt) [~ 900 texts]; (2) The same corpus preprocessed for the Weka tool - https://www.cs.waikato.ac.nz/ml/weka/ (.arff); (3) Terms extracted with BioTex according to spatio-temporal criteria (*.csv) [~ 9000 terms].

Other corpora can be collected with this same method. The Perl code used to preprocess textual data for terminology extraction (with BioTex) and classification (with Weka) tasks is available.

A new version of this dataset (December 2020) includes additional data:
- Python preprocessing and BioTex code [Execution_BioTex.tgz].
- Terms extracted with different ranking measures (i.e. C-Value, F-TFIDF-C_M) and methods (i.e. extraction of words and multi-word terms) with the online version of BioTex [Terminology_with_BioTex_online_dec2020.tgz]
Tracking focal adhesions with TrackMate and Weka - tutorial dataset 1
zenodo.org
data.niaid.nih.gov
bin, png
Updated Jul 18, 2024
Cite
Jean-Yves Tinevez; Minh-Son-Phan; Guillaume Jacquemet (2024). Tracking focal adhesions with TrackMate and Weka - tutorial dataset 1 [Dataset]. http://doi.org/10.5281/zenodo.5226842
Available download formats: bin, png
Unique identifier
https://doi.org/10.5281/zenodo.5226842
Dataset updated
Jul 18, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Jean-Yves Tinevez; Minh-Son-Phan; Guillaume Jacquemet
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This folder contains data used to illustrate the utility of the Weka detector in TrackMate.

- classifier.model: trained Weka classifier.
- MDA231 paxillin DMSO 1 min.czi - MDA231 paxillin DMSO 1 min.czi #01_t1_t40_crop.tif: example image.

More detail on using these files can be found here: https://imagej.net/plugins/trackmate/trackmate-weka.
Words_Selected_by_Information_Gain
databank.illinois.edu
Updated Jan 2, 2019
+ more versions
Cite
Xiaoru Dong; Jingyi Xie; Linh Hoang (2019). Words_Selected_by_Information_Gain [Dataset]. http://doi.org/10.13012/B2IDB-9837167_V1
Unique identifier
https://doi.org/10.13012/B2IDB-9837167_V1
Dataset updated
Jan 2, 2019
Authors
Xiaoru Dong; Jingyi Xie; Linh Hoang
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset funded by
U.S. National Institutes of Health (NIH)
Description
File Name: WordsSelectedByInformationGain.csv
Data Preparation: Xiaoru Dong, Linh Hoang
Date of Preparation: 2018-12-12
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains a list of 1655 informative words selected by applying the information gain feature selection strategy. Information gain is one of the methods commonly used for feature selection; it tells us how many bits of information the presence of a word contributes to predicting the classes, and it can be computed with a specific formula [Jurafsky D, Martin JH. Speech and language processing. London: Pearson; 2014 Dec 30]. We ran information gain feature selection in Weka, a machine learning tool.
Notes: To reproduce the data in this file, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
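As a rough illustration of the information gain measure described above, the following Python sketch computes it for a binary "word present / word absent" feature. It is not the authors' Weka pipeline: the function names and the toy labels are assumptions made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(word_present, labels):
    """Reduction in class entropy from knowing whether a word is present."""
    total = len(labels)
    with_word = [y for x, y in zip(word_present, labels) if x]
    without_word = [y for x, y in zip(word_present, labels) if not x]
    conditional = (
        (len(with_word) / total) * entropy(with_word)
        + (len(without_word) / total) * entropy(without_word)
    )
    return entropy(labels) - conditional


# Toy data (hypothetical): four documents, two classes.
labels = ["include", "include", "exclude", "exclude"]
perfect_word = [True, True, False, False]   # presence fully determines the class
useless_word = [True, False, True, False]   # presence says nothing about the class

print(information_gain(perfect_word, labels))
print(information_gain(useless_word, labels))
```

A word that splits the classes perfectly gains the full one bit of class entropy, while a word that is equally common in both classes gains nothing; ranking words by this score and keeping the top ones is the general idea behind such a selection.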
Supporting datasets PubFig05 for: "Heterogeneous Ensemble Combination Search...
data.niaid.nih.gov
explore.openaire.eu
+1 more
Updated Jan 24, 2020
Cite
Haque, Mohammad Nazmul (2020). Supporting datasets PubFig05 for: "Heterogeneous Ensemble Combination Search using Genetic Algorithm for Class Imbalanced Data Classification" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_33539
Dataset updated
Jan 24, 2020
Dataset provided by
Noman, Nasimul
Haque, Mohammad Nazmul
Berratta, Regina
Moscato, Pablo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Faces Dataset: PubFig05

This is a subset of the PubFig83 dataset [1] that provides 100 images for each of the 5 celebrities most difficult to recognise (each celebrity is referred to as a class in the classification problem). For each celebrity, we took 100 images and separated them into training and testing sets of 90 and 10 images, respectively:

Person: Jennifer Lopez; Katherine Heigl; Scarlett Johansson; Mariah Carey; Jessica Alba

Feature Extraction

To extract features from images, we have applied the HT-L3-model as described in [2] and obtained 25600 features.

Feature Selection

The feature selection steps are, in brief, as follows:

Entropy Filtering: First, we applied an implementation of Fayyad and Irani's [3] entropy-based heuristic to discretise the dataset and discarded features using the minimum description length (MDL) principle; only 4878 features passed this entropy-based filtering method.

Class-Distribution Balancing: Next, we converted the dataset to a binary-class problem by separating it into 5 binary-class datasets using a one-vs-all setup. Hence, these datasets became imbalanced at a ratio of 1:4. Then we converted them into balanced binary-class datasets using random sub-sampling. Further processing of the dataset is described in the paper.

(alpha,beta)-k Feature selection: To get a good feature set for training the classifier, we selected the features using the approach based on the (alpha,beta)-k feature selection [4] problem. It selects a minimum subset of features that maximise both within-class similarity and dissimilarity between different classes. We applied the entropy filtering and (alpha,beta)-k feature subset selection methods in three ways and obtained different numbers of features (in the Table below) after consolidating them into binary-class datasets.

UAB: We applied the (alpha,beta)-k feature set method on each of the balanced binary-class datasets and took the union of the features selected for each binary-class dataset. Finally, we applied the (alpha,beta)-k feature set selection method on each of the binary-class datasets to get a set of features.

IAB: We applied the (alpha,beta)-k feature set method on each of the balanced binary-class datasets and took the intersection of the features selected for each binary-class dataset. Finally, we applied the (alpha,beta)-k feature set selection method on each of the binary-class datasets to get a set of features.

UEAB: We applied the (alpha,beta)-k feature set method on each of the balanced binary-class datasets. Then, we applied the entropy filtering and (alpha,beta)-k feature set selection method on each of the balanced binary-class datasets. Finally, we took the union of the features selected for each balanced binary-class dataset to get a set of features.

All of these datasets are inside the compressed folder, which also contains a document describing the process in detail.
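The class-distribution balancing step described above (one-vs-all conversion followed by random sub-sampling) can be sketched in Python as follows. This is only an illustration under assumptions: the helper names, the placeholder class labels and the fixed seed are hypothetical and are not taken from the authors' code.

```python
import random


def one_vs_all(labels, positive_class):
    """Relabel a multi-class problem as binary: 1 for the chosen class, 0 otherwise."""
    return [1 if y == positive_class else 0 for y in labels]


def random_undersample(rows, binary_labels, seed=0):
    """Randomly drop majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [binary_labels[i] for i in kept]


# Toy stand-in for the training data: 5 classes with 90 rows each.
classes = ["A", "B", "C", "D", "E"]  # placeholder class names
labels = [c for c in classes for _ in range(90)]
rows = list(range(len(labels)))      # row identifiers instead of feature vectors

binary = one_vs_all(labels, "A")
print(sum(binary), len(binary) - sum(binary))        # positives vs negatives before balancing

balanced_rows, balanced_labels = random_undersample(rows, binary)
print(sum(balanced_labels), len(balanced_labels) - sum(balanced_labels))
```

With 5 equally sized classes, each one-vs-all dataset has one positive row for every four negative rows, which is the 1:4 imbalance mentioned in the description; sub-sampling the negatives brings it back to 1:1.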

References

[1] Pinto, N., Stone, Z., Zickler, T., & Cox, D. (2011). Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on (pp. 35–42).

[2] Cox, D., & Pinto, N. (2011). Beyond simple features: A large-scale feature search approach to unconstrained face recognition. In Automatic Face Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on (pp. 8–15).

[3] Fayyad, U. M., & Irani, K. B. (1993). Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In International Joint Conference on Artificial Intelligence (pp. 1022–1029).

[4] Berretta, R., Mendes, A., & Moscato, P. (2005). Integer programming models and algorithms for molecular classification of cancer from microarray data. In Proceedings of the Twenty-eighth Australasian conference on Computer Science - Volume 38 (pp. 361–370). 1082201: Australian Computer Society, Inc.
Data from: Machine Learning Models and New Computational Tool for the...
data.niaid.nih.gov
explore.openaire.eu
+1 more
Updated Jun 22, 2022
Cite
Martinez-Rios (2022). Machine Learning Models and New Computational Tool for the Discovery of Insect Repellents that Interfere with Olfaction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6677764
Dataset updated
Jun 22, 2022
Dataset provided by
Garcia-Jacas
Marrero-Ponce
Pulgar-Sánchez
Martinez-Rios
Hernández-Lambraño
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The SI1_Supporting Information file (docx) brings together detailed information on the outstanding models obtained for each dataset analyzed in this study, such as statistical and training parameters and outliers. It also reports the responses, in spikes/s, of the mosquito Culex quinquefasciatus to the 50 IRs, and presents a full table of the up-to-date studies related to QSAR and insect repellency.

SI2_EXP1_50IRs from Liu et al (2013) SDF file presents the structures of each of the 50 IRs analyzed.

SI3_EXP2_Datasets gathers the four datasets as SDF files from Oliferenko et al. (2013), Gaudin et al. (2008), Omolo et al. (2004), and Paluch et al. (2009) used for the repellency modeling in EXP2.

SI4_EXP3_Prospective analysis provides the Malaria Box Library (400 compounds) as an SDF file; these compounds were analyzed in our virtual screening to prospect potential virtual hits.

SI5_QuBiLS-MIDAS MDs lists contains three TXT lists of the 3D molecular descriptors used in QuBiLS-MIDAS to describe the molecules used in the present study.

SI6_EXP1_Sensillar Modeling comprises two subfolders: Classification and Regression models for each of the six sensilla. The models were built to predict the physiological interaction experimentally obtained by Liu et al. (2013). All of the models are implemented in the software SiLiS-PAPACS. Every folder contains a DOCX file with the detailed description of the model, an XLSX file with the output obtained from the training in Weka 3.9.4, ARFF and CSV files with the MDs for each molecule, and the SDF of the study dataset.

SI7_EXP2_Repellency Modeling encompasses the four datasets in the study: Oliferenko et al. (2013), Gaudin et al. (2008), Omolo et al. (2004), and Paluch et al. (2009). Inside the subfolders, there are the three models per type of MDs (duplex, triple, generic, and mix) that best predict each dataset. As in the SI6 folder, each model includes six files: DOCX, XLSX, ARFF, CSV, and an SDF.

SI8_Virtual Hits includes the cluster analysis results and physico-chemical properties of new IR virtual leads.
Classifier result.
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). Classifier result. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241701.t002
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS http://plos.org/
Authors
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Classifier result.
Experiments results.
plos.figshare.com
xls
Updated Jun 14, 2023
+ more versions
Cite
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). Experiments results. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241701.t004
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS http://plos.org/
Authors
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Experiments results.
LDA attributes.
plos.figshare.com
xls
Updated May 31, 2023
Cite
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). LDA attributes. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241701.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LDA attributes.
Best parameter values.
plos.figshare.com
xls
Updated Jun 1, 2023
+ more versions
Cite
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo (2023). Best parameter values. [Dataset]. http://doi.org/10.1371/journal.pone.0241701.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241701.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
P. Celard; A. Seara Vieira; E. L. Iglesias; L. Borrajo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Best parameter values.
Accuracy, sensitivity, specificity and F-score.
figshare.com
xls
Updated Jun 5, 2023
Cite
Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen (2023). Accuracy, sensitivity, specificity and F-score. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007760.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007760.t001
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS Computational Biology
Authors
Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These parameters were used for the evaluation of the Naïve Bayes (NB), Multilayer Perceptron (MLP), Random Forest (RF), Gradient Boosting (GB), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) algorithms when applied to the imbalanced sequences. The color trend of the F-score from blue to red indicates performance from the best to the poorest. Accuracy, sensitivity, specificity, and F-score are represented in the table as Acc, Sen, Spec, and F-sco, respectively.
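For readers unfamiliar with the four measures named above, the sketch below derives them from the counts of a binary confusion matrix. It uses the standard textbook definitions and assumes the F-score is the balanced F1 variant, which the description does not state; the function name and the example counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero; a production version would guard
    against empty classes.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"Acc": accuracy, "Sen": sensitivity, "Spec": specificity, "F-sco": f_score}


# Hypothetical imbalanced example: 10 positives, 90 negatives.
metrics = classification_metrics(tp=8, fp=10, tn=80, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as this example, accuracy stays high even though many predicted positives are wrong, which is why sensitivity, specificity and the F-score are usually reported alongside it.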
Dataset: The effects of class balance on the training energy consumption of...
zenodo.org
data.niaid.nih.gov
csv
Updated Mar 18, 2024
Cite
Maria Gutierrez; Coral Calero; Félix García; Mª Ángeles Moraga (2024). Dataset: The effects of class balance on the training energy consumption of logistic regression models [Dataset]. http://doi.org/10.5281/zenodo.10823624
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10823624
Dataset updated
Mar 18, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Maria Gutierrez; Coral Calero; Félix García; Mª Ángeles Moraga
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2024
Description
Two synthetic datasets for binary classification, generated with the Random Radial Basis Function generator from WEKA. They are the same shape and size (104,952 instances, 185 attributes), but the "balanced" dataset has 52.13% of its instances belonging to class c0, while the "unbalanced" one has only 4.04% of its instances belonging to class c0. Therefore, this set of datasets is primarily meant to study how class balance influences the behaviour of a machine learning model.
The performance of different machine learning techniques-based stage...
plos.figshare.com
xls
Updated Jun 5, 2023
Cite
Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava (2023). The performance of different machine learning techniques-based stage classification models developed using 21 methylation CpG sites selected by WEKA (LS-CPG-WEKA). [Dataset]. http://doi.org/10.1371/journal.pone.0221476.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0221476.t001
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The performance of different machine learning techniques-based stage classification models developed using 21 methylation CpG sites selected by WEKA (LS-CPG-WEKA).
Data from: Stable psychological traits predict perceived stress related to...
zenodo.org
researchdata.cab.unipd.it
bin, pdf
Updated Jul 22, 2024
Cite
Merylin Monaro; Luca Flesia; Valentina Fietta; Barbara Segatto; Elena Colicino (2024). Stable psychological traits predict perceived stress related to the COVID-19 outbreak [Dataset]. http://doi.org/10.5281/zenodo.3753552
Available download formats: pdf, bin
Unique identifier
https://doi.org/10.5281/zenodo.3753552
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Merylin Monaro; Luca Flesia; Valentina Fietta; Barbara Segatto; Elena Colicino
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the raw dataset associated with the scientific article "Stable psychological traits predict psychological perceived stress to COVID-19 outbreak", by L. Flesia, V. Fietta, B. Segatto, M. Monaro. Data are contained in the Excel file and organized as follows:

- the entire dataset used by the authors to perform statistical analysis

- the training set used by the authors to train and validate ML models

- the test set used by the authors to test the ML models

The "Legend" file contains the description of each variable in the Excel file.

Step-by-step instructions to replicate the results of the ML classification models reported in the paper are also provided, including two .arff files containing the training and test sets of data, which can be run directly in WEKA software 3.9.

The "COVID-19 QUESTIONNAIRE" file contains the English version of the questions administered to participants.
The performance of stage classification models developed using 30 RNA...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava (2023). The performance of stage classification models developed using 30 RNA transcripts selected using WEKA from 103 RNA transcripts (LS-RNA-WEKA). [Dataset]. http://doi.org/10.1371/journal.pone.0221476.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0221476.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Harpreet Kaur; Sherry Bhalla; Gajendra P. S. Raghava
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The performance of stage classification models developed using 30 RNA transcripts selected using WEKA from 103 RNA transcripts (LS-RNA-WEKA).
E-Commerce Product Reviews - Dataset for ML
kaggle.com
zip
Updated Dec 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Furkan Gözükara (2021). E-Commerce Product Reviews - Dataset for ML [Dataset]. https://www.kaggle.com/furkangozukara/turkish-product-reviews
Available download formats: zip (580369522 bytes)
Dataset updated
Dec 16, 2021
Authors
Furkan Gözükara
Description
-> If you use Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset please cite: https://dergipark.org.tr/en/pub/cukurovaummfd/issue/28708/310341

@research article { cukurovaummfd310341,
  journal = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi},
  issn = {1019-1011},
  eissn = {2564-7520},
  address = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi Yayın Kurulu Başkanlığı 01330 ADANA},
  publisher = {Cukurova University},
  year = {2016},
  volume = {31},
  pages = {464 - 482},
  doi = {10.21605/cukurovaummfd.310341},
  title = {Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme},
  key = {cite},
  author = {Gözükara, Furkan and Özel, Selma Ayşe}
}

https://doi.org/10.21605/cukurovaummfd.310341

-> Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset is composed as below:
->-> The top 50 e-commerce sites in Turkey were crawled and their comments extracted. Then 2000 comments were randomly selected and manually labelled by a field expert.
->-> After manual labelling of the selected comments, 600 negative and 600 positive comments were left.
->-> This dataset contains these comments.

-> English_Movie_Reviews_by_Pang_and_Lee_2004
->-> Pang, B., Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts, In Proceedings of the 42nd annual meeting on Association for Computational Linguistics (p. 271).
->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | polarity dataset v2.0 - review_polarity.tar.gz

-> English_Movie_Reviews_Sentences_by_Pang_and_Lee_2005
->-> Pang, B., Lee, L., 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115-124), Association for Computational Linguistics
->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | sentence polarity dataset v1.0 - rt-polaritydata.tar.gz

-> English_Product_Reviews_by_Blitzer_et_al_2007
->-> Article of the dataset: Blitzer, J., Dredze, M., Pereira, F., 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification, In ACL (Vol. 7, pp. 440-447).
->-> Source: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ | processed_acl.tar.gz

-> Turkish_Movie_Reviews_by_Demirtas_and_Pechenizkiy_2013
->-> Demirtas, E., Pechenizkiy, M., 2013. Cross-lingual polarity detection with machine translation, In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining (p. 9). ACM.
->-> http://www.win.tue.nl/~mpechen/projects/smm/#Datasets Turkish_Movie_Sentiment.zip

-> The dataset files are provided as used in the article.
-> Weka files are generated with the raw frequency of terms rather than with the weighting schemes used.

-> The folder Cross_Validation contains the files for each fold of the 10-fold cross-validation.
-> Inside the Cross_Validation folder, each turn of the cross-validation is named test_X, where X is the turn number.
-> Inside each test_X folder:
* Test_Set_Negative_RAW: Contains raw negative class Test data of that cross-validation turn
* Test_Set_Negative_Processed: Contains pre-processed negative class Test data of that cross-validation turn
* Test_Set_Positive_RAW: Contains raw positive class Test data of that cross-validation turn
* Test_Set_Positive_Processed: Contains pre-processed positive class Test data of that cross-validation turn
* Train_Set_Negative_RAW: Contains raw negative class Train data of that cross-validation turn
* Train_Set_Negative_Processed: Contains pre-processed negative class Train data of that cross-validation turn
* Train_Set_Positive_RAW: Contains raw positive class Train data of that cross-validation turn
* Train_Set_Positive_Processed: Contains pre-processed positive class Train data of that cross-validation turn
* Train_Set_For_Weka: Contains processed Train set formatted for Weka
* Test_Set_For_Weka: Contains processed Test set formatted for Weka

-> The folder Entire_Dataset contains files for the entire dataset:
* Negative_Processed: Contains all negative comments processed data
* Positive_Processed: Contains all positive comments processed data
* Negative_RAW: Contains all negative comments RAW data
* Positive_RAW: Contains all positive comments RAW data
* Entire_Dataset_WEKA: Contains all documents processed data in WEKA format
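To make the 10-fold cross-validation layout above more concrete, here is a minimal Python sketch of how 600 negative and 600 positive comments could be split into ten train/test turns. It is an assumption-laden illustration, not the script that produced the files: the stratified assignment, the seed, the function name and the numbering of turns from 1 are all hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign item indices to k folds so each fold keeps the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Stand-in for the labelled comments: 600 negative and 600 positive.
labels = ["negative"] * 600 + ["positive"] * 600
folds = stratified_folds(labels)

for turn, test_indices in enumerate(folds, start=1):
    test_set = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in test_set]
    print(f"test_{turn}: {len(train_indices)} train, {len(test_indices)} test")
```

Each turn holds out one fold as the test set and trains on the other nine, so every comment is used for testing exactly once across the ten turns.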
Data from: Machine learning approaches in MALDI-MSI: clinical applications
tandf.figshare.com
pptx
Updated Jun 1, 2023
Cite
Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri (2023). Machine learning approaches in MALDI-MSI: clinical applications [Dataset]. http://doi.org/10.6084/m9.figshare.3458753.v1
Available download formats: pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.3458753.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis
Authors
Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging in visualizing the spatial distribution and the relative abundance of biomolecules directly on-tissue, the yielded data is complex and high dimensional. Therefore, analysis and interpretation of this huge amount of information is mathematically, statistically and computationally challenging.

Areas covered: This article reviews some of the challenges in data elaboration, with particular emphasis on machine learning techniques employed in clinical applications, and can be useful in general as an entry point for those who want to study the computational aspects. Several characteristics of data processing are described, highlighting advantages and disadvantages. Different approaches for data elaboration focused on clinical applications are also provided. A practical tutorial based on Orange Canvas and Weka software is included to help readers become familiar with the data processing.

Expert commentary: Recently, MALDI-MSI has gained considerable attention and has been employed for research and diagnostic purposes, with successful results. Data dimensionality constitutes an important issue, and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods are characterized by collecting independent observations into a single table. However, the incorporation of relational information can improve the discriminatory capability of the data.
Clustered k-mers from S1 Fig used for validation of their biological...
plos.figshare.com
xls
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen (2023). Clustered k-mers from S1 Fig used for validation of their biological function and reported riboswitch motifs. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007760.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007760.t003
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Solomon Shiferaw Beyene; Tianyi Ling; Blagoj Ristevski; Ming Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The designated nucleotide locations correspond to their positions reported in the reference.
Data from: A Proposed Churn Prediction Model
figshare.com
pdf
Updated Feb 24, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr (2019). A Proposed Churn Prediction Model [Dataset]. http://doi.org/10.6084/m9.figshare.7763183.v2
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.7763183.v2
Dataset updated
Feb 24, 2019
Dataset provided by
figshare
Authors
Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Churn prediction aims to detect customers who intend to leave a service provider. Gaining a new customer costs an organization 5 to 10 times more than retaining an existing one. Predictive models can identify likely churners in the near future so that a retention solution can be offered. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps: identifying the problem domain, data selection, investigating the data set, classification, clustering, and knowledge usage. A data set with 23 attributes and 5000 instances is used: 4000 instances for training the model and 1000 instances as a testing set. The predicted churners are clustered into 3 categories for use in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine, and Neural Network, applied through the open-source software WEKA.
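The 4000/1000 train/test partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are random placeholders with 23 numeric attributes (not the actual churn data), the helper name split_train_test is hypothetical, and the description does not say whether the original split was random, so the shuffle here is an assumption.

```python
import random


def split_train_test(rows, n_train, seed=0):
    """Shuffle a copy of rows and return (train, test). Hypothetical helper."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder data: 5000 instances with 23 attributes each.
# The values are random and stand in for the real data set.
rng = random.Random(42)
instances = [[rng.random() for _ in range(23)] for _ in range(5000)]

# Assumed random partition into 4000 training and 1000 testing instances.
train, test = split_train_test(instances, n_train=4000)
print(len(train), len(test))  # prints: 4000 1000
```

A fixed seed keeps the partition reproducible, which matters when several classifiers are compared on the same testing set.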
Obesity DataSet UCI ML
kaggle.com
Updated Feb 23, 2022
Cite
Tathagat Banerjee (2022). Obesity DataSet UCI ML [Dataset]. https://www.kaggle.com/datasets/tathagatbanerjee/obesity-dataset-uci-ml
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 23, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Tathagat Banerjee
Description
Estimation of obesity levels based on eating habits and physical condition Data Set

Abstract: This dataset includes data for the estimation of obesity levels in individuals from Mexico, Peru and Colombia, based on their eating habits and physical condition.

Data Set Characteristics: Multivariate
Number of Instances: 2111
Area: Life
Attribute Characteristics: Integer
Number of Attributes: 17
Date Donated: 2019-08-27
Associated Tasks: Classification, Regression, Clustering
Missing Values?: N/A
Number of Web Hits: 70843

Source:

Fabio Mendoza Palechor, Email: fmendoza1 '@' cuc.edu.co, Cellphone: +573182929611
Alexis de la Hoz Manotas, Email: akdelahoz '@' gmail.com, Cellphone: +573017756983

Data Set Information:

This dataset includes data for the estimation of obesity levels in individuals from Mexico, Peru and Colombia, based on their eating habits and physical condition. The data contains 17 attributes and 2111 records. The records are labeled with the class variable NObesity (Obesity Level), which allows classification of the data using the values Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter; the remaining 23% was collected directly from users through a web platform.
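
As a rough illustration of the idea behind SMOTE (creating a synthetic record by interpolating between a minority-class record and one of its nearest neighbours), a minimal numeric-only sketch follows. It is not Weka's SMOTE filter and does not reproduce the authors' settings: the function name smote_like_samples, the toy points, and the choice of k are all assumptions made for the example, and nominal attributes are not handled.

```python
import math
import random


def smote_like_samples(minority, k=5, n_new=1, seed=0):
    """Return n_new synthetic points interpolated between minority points.

    Hypothetical, numeric-only sketch of the SMOTE idea; not Weka's filter.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest neighbours of base among the other minority points.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the segment from base towards the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square (placeholder data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_like_samples(minority, k=2, n_new=10)
print(len(synthetic))  # prints: 10
```

Because each synthetic point lies on a segment between two existing minority points, the generated records stay inside the region already covered by that class.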

Attribute Information:

Read the article ([Web Link]) to see the description of the attributes.

Relevant Papers:

[1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.
[2] De-La-Hoz-Correa, E., Mendoza Palechor, F., De-La-Hoz-Manotas, A., Morales Ortega, R., & Sánchez Hernández, A. B. (2019). Obesity level estimation software based on decision trees.

Citation Request:

[1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.



