This dataset was created by Seol
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Lisette
Released under CC0: Public Domain
This dataset was created by Kenneth Chidiebele
This dataset was created by skyhwchoi
This dataset was created by Christoforos Christoforou
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Started out as a pumpkin detector to test training YOLOv5. Now suffering from extensive feature creep and probably ending up as a cat/dog/spider/pumpkin/random-objects detector. Or as a disaster.
The dataset does not fit the recommendations at https://docs.ultralytics.com/tutorials/training-tips-best-results/ well. There are no background images, and the labeling is often only partial. Especially in the human and pumpkin categories, where there are often lots of objects in one photo, people apparently (and understandably) got bored and did not label everything. And of course the images from the cat category don't have the humans in them labeled, since they come from a cat-identification model which ignored humans. It will take a lot of time to fix that.
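One way to start locating the gaps described above is to audit the label files. The following is a minimal sketch, assuming the usual YOLO layout of one `.txt` label file per image with one `class x_center y_center width height` line per box. The function name `audit_yolo_dataset`, the directory layout, the file names, and the class ids are all hypothetical and not taken from this dataset.

```python
import tempfile
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def audit_yolo_dataset(image_dir, label_dir):
    """Count boxes per class id and list images with no (or an empty) label file."""
    boxes_per_class = Counter()
    unlabeled = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label_file = Path(label_dir) / (image.stem + ".txt")
        lines = label_file.read_text().splitlines() if label_file.exists() else []
        lines = [line for line in lines if line.strip()]
        if not lines:
            # Either a genuine background image or one nobody labeled yet.
            unlabeled.append(image.name)
        for line in lines:
            class_id = int(line.split()[0])  # first field is the class id
            boxes_per_class[class_id] += 1
    return boxes_per_class, unlabeled


# Tiny demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp, "images")
    labels = Path(tmp, "labels")
    images.mkdir()
    labels.mkdir()
    for name in ("pumpkin_01.jpg", "cat_01.jpg", "street_01.jpg"):
        (images / name).touch()
    # Class ids are placeholders: 0 = pumpkin, 1 = cat.
    (labels / "pumpkin_01.txt").write_text("0 0.5 0.5 0.2 0.2\n0 0.3 0.6 0.1 0.1\n")
    (labels / "cat_01.txt").write_text("1 0.4 0.4 0.3 0.5\n")
    boxes_per_class, unlabeled = audit_yolo_dataset(images, labels)

print(dict(boxes_per_class))
print(unlabeled)
```

An audit like this cannot tell a deliberate background image from a forgotten one, and it cannot detect partially labeled photos; it only narrows down which files to inspect by hand.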
Datasets used:
- Cat and Dog Data: Cat / Dog Tutorial NVIDIA Jetson https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-cat-dog.md © 2016-2019 NVIDIA according to bottom of linked page
- Spider Data: Kaggle Animal 10 image set https://www.kaggle.com/datasets/alessiocorrado99/animals10 Animal pictures of 10 different categories taken from Google Images. Kaggle project licensed GPL 2
- Pumpkin Data: Kaggle "Vegetable Images" https://www.researchgate.net/publication/352846889_DCNN-Based_Vegetable_Image_Classification_Using_Transfer_Learning_A_Comparative_Study https://www.kaggle.com/datasets/misrakahmed/vegetable-image-dataset Kaggle project licensed CC BY-SA 4.0
- Some pumpkin images manually copied from Google image search
- https://universe.roboflow.com/chess-project/chess-sample-rzbmc Provided by a Roboflow user. License: CC BY 4.0
- https://universe.roboflow.com/steve-pamer-cvmbg/pumpkins-gfjw5 Provided by a Roboflow user. License: CC BY 4.0
- https://universe.roboflow.com/nbduy/pumpkin-ryavl Provided by a Roboflow user. License: CC BY 4.0
- https://universe.roboflow.com/homeworktest-wbx8v/cat_test-1x0bl/dataset/2
- https://universe.roboflow.com/220616nishikura/catdetector
- https://universe.roboflow.com/atoany/cats-s4d4i/dataset/2
- https://universe.roboflow.com/personal-vruc2/agricultured-ioth22
- https://universe.roboflow.com/sreyoshiworkspace-radu9/pet_detection
- https://universe.roboflow.com/artyom-hystt/my-dogs-lcpqe
- https://universe.roboflow.com/dolazy7-gmail-com-3vj05/sweetpumpkin/dataset/2 License: Public Domain
- https://universe.roboflow.com/tristram-dacayan/social-distancing-g4pbu
- https://universe.roboflow.com/fyp-3edkl/social-distancing-2ygx5 License: MIT
- Spiders: https://universe.roboflow.com/lucas-lins-souza/animals-train-yruka
Currently I can't guarantee it's all correctly licensed. Checks are in progress. Let me know if you see one of your pictures and want it removed!
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by BCanOzen
Released under MIT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘US Adult Income’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnolafenwa/us-census-data on 30 September 2021.
--- Dataset description provided by original source is as follows ---
US Adult Census data relating income to social factors such as age, education, and race.
The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census Database. The data set consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. Each row is labelled as having a salary of either ">50K" or "<=50K".
This data set is split into two CSV files, named adult-training.txt and adult-test.txt.
The goal here is to train a binary classifier on the training dataset to predict the column income_bracket, which has two possible values, ">50K" and "<=50K", and to evaluate the accuracy of the classifier with the test dataset.
Note that the dataset is made up of categorical and continuous features. It also contains missing values.
The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country
The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week
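As a starting point for the task described above, the rows can be parsed into typed records before any classifier is trained. This is a minimal standard-library sketch, not the original author's code. It assumes the files have no header row, follow the column order of the UCI "adult" files (including an `fnlwgt` column not listed above), and mark missing values with `?`; check all three against your copy. The two sample rows are illustrative only.

```python
import csv
import io

# Column order is an assumption based on the UCI "adult" files.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "gender",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income_bracket",
]
CONTINUOUS = ["age", "education_num", "capital_gain", "capital_loss",
              "hours_per_week"]
CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "gender", "native_country"]


def parse_rows(handle):
    """Yield one dict per record; missing categorical values become None."""
    for fields in csv.reader(handle, skipinitialspace=True):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        for name in CATEGORICAL:
            if row[name] == "?":  # assumed missing-value marker
                row[name] = None
        for name in CONTINUOUS:
            row[name] = float(row[name])
        # Binary target: 1 for ">50K", 0 for "<=50K" (a trailing "." is tolerated).
        row["label"] = 1 if row["income_bracket"].rstrip(".") == ">50K" else 0
        yield row


# Illustrative rows in the assumed format, standing in for adult-training.txt.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, ?, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, "
    "Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)
rows = list(parse_rows(sample))
print(len(rows), [r["label"] for r in rows])
```

To use it on the real files, replace the `io.StringIO` sample with `open("adult-training.txt", newline="")`. Keeping missing categorical values as `None` leaves the choice of imputation or a dedicated "unknown" category to the model code.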
This dataset was obtained from the UCI repository; it can be found at https://archive.ics.uci.edu/ml/datasets/census+income and http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/
USAGE: This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers, and a combination of both. For more info on combined Deep and Wide Model classifiers, refer to the research paper by Google: https://arxiv.org/abs/1606.07792
Refer to this kernel for sample usage: https://www.kaggle.com/johnolafenwa/wage-prediction
Complete Tutorial is available from http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1
--- Original source retains full ownership of the source dataset ---
It contains multiple datasets and selected tutorials for learning purposes.
We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package that lets researchers and practitioners apply the Topic-based Segmentation Model with Unstructured Texts (latent class regression with group variable selection) to their own datasets.

First, we provide R code to replicate the illustrative simulation study: see file 1.
Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R.
Third, we provide a set of codes and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note that, due to Yelp's dataset terms of use and the restriction on data size, we provide a link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6).
Fourth, we provide a set of codes and datasets to replicate the empirical study with professor ratings reviews data: see file 5.
Please see more details in the description text and comments of each file.

[A guide on how to use the code to reproduce each study in the paper]

1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: This is R source code to replicate the illustrative simulation study. Please run it from beginning to end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships, you will get dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes.

3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IVs matrix for the customer-level segmentation study.

3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 3 to 4 hours.

4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IVs matrix for the restaurant-level segmentation study.

4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 10 to 12 hours.

[Guidelines for running Benchmark models in Table 6]

- Unsupervised topic model: 'topicmodels' package in R -- after determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors. Then conduct prediction with regression.
- Hierarchical topic model (HDP): 'gensimr' R package -- 'model_hdp' function for identifying topics in the package (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).
- Supervised topic model: 'lda' R package -- 'slda.em' function for training and 'slda.predict' for prediction.
- Aggregate regression: 'lm' default function in R.
- Latent class regression without variable selection: 'flexmix' function in the 'flexmix' R package. Run flexmix with a certain number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, predict the dependent variable for each segment.
- Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the Kim et al. (2012) model with a certain number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, predict the dependent variable for each segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the Professor ratings reviews study. Computing time is approximately 10 hours.

[A list of the versions of R, packages, and computer...
This dataset was created by CS3319-02
This dataset was created by Ridho dwi Fachri
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Herein, we present a deep learning technique to automatically remove artifacts in fundus photographs. Using a CycleGAN model, we synthesized artifact-reduced retinal images from low-quality images and validated this technique on an independent test dataset.
This study included a total of 2,206 anonymized retinal images. We collected the fundus photographs without qualification, so they include both normal and pathologic retinal images. Photographs both with and without artifacts were crawled from Google image search and dataset search using English keywords related to the retina. The search strategy was based on the key terms “fundus photography”, “retinal image”, “artifact”, “quality assessment”, “retinal image grade”, “diabetic retinopathy”, “age-related macular degeneration”, “glaucoma”, “cataract”, and “fundus dataset”. Images with artifacts were manually classified by the authors. Finally, 1,146 images with artifacts and 1,060 images without artifacts were collected. The experiment process complied with the Declaration of Helsinki. This study did not require ethics committee approval; instead, researchers used open web-based and deidentified data.
We used the Colaboratory CycleGAN tutorial page to develop and validate the CycleGAN model; all code is available on that web page (https://www.tensorflow.org/tutorials/generative/cyclegan).
This dataset may include MESSIDOR, HRF, FIRE, DRIVE, Kaggle DMR, and freely available images from Google image search. Images with and without artifacts were categorized to investigate artifact reduction.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Lucas Henrique Mateo
Released under Apache 2.0
This dataset was created by Arpan Dhatt
This dataset was created by Chao CHEN
This dataset was created by Pritam Purohit
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Daniil Barysevich
Released under CC0: Public Domain
This dataset was created by Abhinand
It contains the following files:
This dataset was created by Seol