MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Name: Multilingual Names for RNN Classification
Description:
This dataset, sourced from PyTorch's official tutorial, comprises popular names across 18 distinct languages, namely Arabic, Chinese, Czech, Dutch, English, French, German, Greek, Irish, Italian, Japanese, Korean, Polish, Portuguese, Russian, Scottish, Spanish, and Vietnamese. Each language's names are contained in separate text files for easy extraction and categorization.
Usage:
The dataset is particularly useful for tasks like Recurrent Neural Network (RNN) classification, where the aim might be to predict the language origin of a given name based on its character sequence.
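A minimal sketch of the preprocessing such a classifier typically needs, assuming names are first reduced to plain ASCII and then encoded one character at a time. The alphabet, the helper names `unicode_to_ascii` and `name_to_one_hot`, and the sample name are illustrative assumptions, not part of the dataset.

```python
import string
import unicodedata

# Assumed alphabet: ASCII letters plus a few punctuation marks common in names.
ALPHABET = string.ascii_letters + " .,;'-"


def unicode_to_ascii(name):
    """Strip diacritics and drop any character outside ALPHABET."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" and ch in ALPHABET
    )


def name_to_one_hot(name):
    """Return one one-hot row (a list of 0/1 values) per character of the cleaned name."""
    rows = []
    for ch in unicode_to_ascii(name):
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(ch)] = 1
        rows.append(row)
    return rows


# Hypothetical sample name, used only to show the shape of the encoding.
encoded = name_to_one_hot("Ślusàrski")
print(unicode_to_ascii("Ślusàrski"), len(encoded), len(encoded[0]))
```

Each row could then be fed to a recurrent model one time step at a time; a tensor library would normally replace the nested lists.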

This dataset was created by Mohamed Abdelkarim

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A PyTorch-loadable model for tumor classification.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Tikadisplay
Released under Apache 2.0

This dataset was created by MohamedAmine SAIGHI

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PyTorch weights for MRI classification.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Softmax output when passing ImageNet-1K data (train & test sets) to PyTorch's pretrained classification models.
✅ AlexNet
AlexNet v1: {'acc@1': 0.56522, 'acc@5': 0.79066, 'num_params': 61.10M}
✅ DenseNet (121, 161, 169, 201)
DenseNet-121 v1: {'acc@1': 0.74434, 'acc@5': 0.91972, 'num_params': 7.98M}
DenseNet-161 v1: {'acc@1': 0.77138, 'acc@5': 0.93560, 'num_params': 28.68M}
DenseNet-169 v1: {'acc@1': 0.75600, 'acc@5': 0.92806, 'num_params': 14.15M}
DenseNet-201 v1: {'acc@1': 0.76896, 'acc@5': 0.93370, 'num_params': 20.01M}
✅ VGG (11, 13, 16, 19)
VGG-11 v1: {'acc@1': 0.69020, 'acc@5': 0.88628, 'num_params': 132.86M}
VGG-13 v1: {'acc@1': 0.69928, 'acc@5': 0.89246, 'num_params': 133.05M}
VGG-16 v1: {'acc@1': 0.71592, 'acc@5': 0.90382, 'num_params': 138.36M}
VGG-19 v1: {'acc@1': 0.72376, 'acc@5': 0.90876, 'num_params': 143.67M}
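The `acc@1` and `acc@5` figures are top-k accuracies. As a rough sketch of how such a number can be recomputed from stored softmax outputs, assuming each sample is a plain list of class probabilities (the function name `top_k_accuracy` and the toy numbers are made up for illustration):

```python
def top_k_accuracy(probabilities, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(probabilities, labels):
        # Class indices sorted from highest to lowest probability.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy softmax outputs for 3 samples over 4 classes (made-up numbers).
probs = [
    [0.70, 0.10, 0.15, 0.05],  # true class 0: ranked first
    [0.20, 0.30, 0.40, 0.10],  # true class 1: ranked second
    [0.40, 0.30, 0.20, 0.10],  # true class 3: ranked last
]
labels = [0, 1, 3]

acc1 = top_k_accuracy(probs, labels, k=1)
acc2 = top_k_accuracy(probs, labels, k=2)
print(acc1, acc2)
```

For ImageNet-1K the same function would be called with k=1 and k=5 over 1,000-element probability vectors.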

License: https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-3988
This is a PyTorch implementation of the paper "Hyperbolic Embedding Inference for Structured Multi-Label Prediction", published at NeurIPS 2022. The code provides the Python scripts to reproduce the experiments in the paper, as well as a proof-of-concept example of the method. To execute the code, follow the instructions in the README.md file. For more information, please check the paper. Please do not hesitate to contact the authors with any inquiries.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by AYUSH KUMAR SINGH
Released under MIT

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This seminar is an applied study of deep learning methods for extracting information from geospatial data, such as aerial imagery, multispectral imagery, digital terrain data, and other digital cartographic representations. We first provide an introduction and conceptualization of artificial neural networks (ANNs). Next, we explore appropriate loss and assessment metrics for different use cases followed by the tensor data model, which is central to applying deep learning methods. Convolutional neural networks (CNNs) are then conceptualized with scene classification use cases. Lastly, we explore semantic segmentation, object detection, and instance segmentation. The primary focus of this course is semantic segmentation for pixel-level classification. The associated GitHub repo provides a series of applied examples. We hope to continue to add examples as methods and technologies further develop. These examples make use of a variety of datasets (e.g., SAT-6, topoDL, Inria, LandCover.ai, vfillDL, and wvlcDL). Please see the repo for links to the data and associated papers. All examples have associated videos that walk through the process, which are also linked to the repo. A variety of deep learning architectures are explored including UNet, UNet++, DeepLabv3+, and Mask R-CNN. Currently, two examples use ArcGIS Pro and require no coding. The remaining five examples require coding and make use of PyTorch, Python, and R within the RStudio IDE. It is assumed that you have prior knowledge of coding in the Python and R environments. If you do not have experience coding, please take a look at our Open-Source GIScience and Open-Source Spatial Analytics (R) courses, which explore coding in Python and R, respectively.
After completing this seminar you will be able to:
explain how ANNs work including weights, bias, activation, and optimization.
describe and explain different loss and assessment metrics and determine appropriate use cases.
use the tensor data model to represent data as input for deep learning.
explain how CNNs work including convolutional operations/layers, kernel size, stride, padding, max pooling, activation, and batch normalization.
use PyTorch, Python, and R to prepare data, produce and assess scene classification models, and infer to new data.
explain common semantic segmentation architectures and how these methods allow for pixel-level classification and how they are different from traditional CNNs.
use PyTorch, Python, and R (or ArcGIS Pro) to prepare data, produce and assess semantic segmentation models, and infer to new data.
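To make the roles of kernel size, stride, and padding concrete, here is a small sketch of the standard output-size arithmetic for a convolution or pooling layer along one spatial dimension, assuming no dilation. The function name `conv_output_size` and the example layer settings are illustrative and are not taken from the seminar materials.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one dimension: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# A 3x3 convolution with padding 1 and stride 1 preserves the spatial size.
same = conv_output_size(256, kernel=3, stride=1, padding=1)

# A 2x2 max pooling with stride 2 halves it.
pooled = conv_output_size(256, kernel=2, stride=2)

# A 7x7 convolution with stride 2 and padding 3 on a 224-pixel input.
stem = conv_output_size(224, kernel=7, stride=2, padding=3)

print(same, pooled, stem)
```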

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Convolutional neural network (CNN)-based deep learning (DL) methods have transformed the analysis of geospatial, Earth observation, and geophysical data due to their ability to model spatial context information at multiple scales. Such methods are especially applicable to pixel-level classification or semantic segmentation tasks. A variety of R packages have been developed for processing and analyzing geospatial data. However, there are currently no packages available for implementing geospatial DL in the R language and data science environment. This paper introduces the geodl R package, which supports pixel-level classification applied to a wide range of geospatial or Earth science data that can be represented as multidimensional arrays where each channel or band holds a predictor variable. geodl is built on the torch package, which supports the implementation of DL using the R and C++ languages without the need for installing a Python/PyTorch environment. This greatly simplifies the software environment needed to implement DL in R. Using geodl, geospatial raster-based data with varying numbers of bands, spatial resolutions, and coordinate reference systems are read and processed using the terra package, which makes use of C++ and allows for processing raster grids that are too large to fit into memory. Training loops are implemented with the luz package. The geodl package provides utility functions for creating raster masks or labels from vector-based geospatial data and image chips and associated masks from larger files and extents. It also defines a torch dataset subclass for geospatial data for use with torch dataloaders. UNet-based models are provided with a variety of optional ancillary modules or modifications. 
Common assessment metrics (i.e., overall accuracy, class-level recalls or producer’s accuracies, class-level precisions or user’s accuracies, and class-level F1-scores) are implemented along with a modified version of the unified focal loss framework, which allows for defining a variety of loss metrics using one consistent implementation and set of hyperparameters. Users can assess models using standard geospatial and remote sensing metrics and methods and use trained models to predict to large spatial extents. This paper introduces the geodl workflow, design philosophy, and goals for future development.
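The listed assessment metrics can all be derived from a confusion matrix. The sketch below is written in Python rather than R and is not the geodl implementation; it assumes rows hold reference classes and columns hold predictions, and the function name `classification_metrics` and the toy matrix are hypothetical.

```python
def classification_metrics(confusion):
    """Derive accuracy metrics from confusion[i][j]: reference class i, predicted class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    overall = sum(confusion[i][i] for i in range(n)) / total

    recalls, precisions, f1s = [], [], []
    for i in range(n):
        tp = confusion[i][i]
        ref_total = sum(confusion[i])                        # all reference samples of class i
        pred_total = sum(confusion[r][i] for r in range(n))  # all predictions of class i
        recall = tp / ref_total if ref_total else 0.0        # producer's accuracy
        precision = tp / pred_total if pred_total else 0.0   # user's accuracy
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)

    return {
        "overall_accuracy": overall,
        "recall": recalls,
        "precision": precisions,
        "f1": f1s,
        # Macro average taken here as the unweighted mean of class-level F1-scores.
        "macro_f1": sum(f1s) / n,
    }


# Toy two-class confusion matrix (made-up counts).
metrics = classification_metrics([[8, 2], [1, 9]])
print(metrics)
```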

This dataset was created by Aritra Sen

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Fashion Gender Classification Dataset with labeled male and female images for binary classification tasks. Ideal for training, validating, and testing machine learning models using frameworks like TensorFlow, Keras, and PyTorch.

This dataset contains the MountainScape Segmentation Dataset (MS2D), a collection of oblique mountain images from the Mountain Legacy Project and corresponding manually annotated land cover masks. The dataset is split into 144 historic grayscale images collected by early phototopographic surveyors and 140 modern repeat images captured by the Mountain Legacy Project. The image resolutions range from 16 to 80 megapixels and the corresponding masks are RGB images with 8 landcover classes. The image dataset was used to train and test the Python Landscape Classifier (PyLC), a trainable segmentation network and land cover classification tool for oblique landscape photography. The dataset also contains PyTorch models trained with PyLC using the collection of images and masks.
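Because the masks are stored as RGB images, a segmentation model usually needs them converted to integer class indices first. The sketch below shows one way to do that with a colour lookup table; the palette is hypothetical (the actual MS2D colour codes are not given here) and lists only three of the eight classes for brevity.

```python
# Hypothetical palette: real MS2D colours and class order are assumptions.
PALETTE = {
    (0, 0, 0): 0,
    (0, 128, 0): 1,
    (0, 0, 255): 2,
}


def rgb_mask_to_indices(mask, palette=PALETTE):
    """Map each (R, G, B) pixel of a 2-D mask to its integer class index."""
    return [[palette[tuple(pixel)] for pixel in row] for row in mask]


# Toy 2x2 mask standing in for a full-resolution annotation.
mask = [
    [(0, 0, 0), (0, 128, 0)],
    [(0, 0, 255), (0, 128, 0)],
]
indices = rgb_mask_to_indices(mask)
print(indices)
```

A pixel whose colour is missing from the palette raises a `KeyError`, which helps surface annotation or compression artifacts early.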

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overall accuracy = 0.908, macro-averaged producer’s accuracy = 0.885, macro-averaged user’s accuracy = 0.770, and macro-averaged F1-score = 0.823.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model assessment metrics based on ten model replicates with different random seeds and training subsets.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Processed Jigsaw Toxic Comments Dataset
This is a preprocessed and tokenized version of the original Jigsaw Toxic Comment Classification Challenge dataset, prepared for multi-label toxicity classification using transformer-based models like BERT. ⚠️ Important Note: I am not the original creator of the dataset. This dataset is a cleaned and restructured version made for quick use in PyTorch deep learning models.
📦 Dataset Features
Each example contains:
text: The… See the full description on the dataset page: https://huggingface.co/datasets/Koushim/processed-jigsaw-toxic-comments.

This dataset was created by Ibrahim Salameh

License: http://www.gnu.org/licenses/lgpl-3.0.html
Large Movie Review Dataset v1.0 😃
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. Additional unlabeled data is available as well. Both raw text and an already processed bag-of-words format are provided.
In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain a disjoint set of movies, so no significant performance is obtained by memorising movie-unique terms and their association with observed labels. In the labelled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets. In the unsupervised set, reviews of any rating are included and there is an even number of reviews with scores > 5 and <= 5.
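The labelling rule can be written as a short function. This is a sketch of the rule as described, not code shipped with the dataset; the name `label_from_rating` is made up, and the `pos`/`neg` strings follow the tags used for the reviews.

```python
def label_from_rating(rating):
    """Map a 1-10 rating to a sentiment tag; neutral ratings (5 and 6) get no label."""
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    return None  # excluded from the labelled train/test sets


for rating in (1, 4, 5, 6, 7, 10):
    print(rating, label_from_rating(rating))
```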
Reference: http://ai.stanford.edu/~amaas/data/sentiment/
NOTE
A starter kernel is here: https://www.kaggle.com/atulanandjha/bert-testing-on-imdb-dataset-starter-kernel
A kernel to expose Dataset collection :
Now let’s understand the task at hand: given a movie review, predict whether it’s positive or negative.
The dataset we use consists of 50,000 IMDB reviews (25K for training and 25K for testing) from the PyTorch-NLP library.
Each review is tagged pos or neg.
Both the train and test sets contain 50% positive and 50% negative reviews.
text : Reviews from people.
Sentiment : Negative or Positive tag on the review/feedback (Boolean).
When using this dataset, please cite this ACL paper:
@InProceedings{
maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
}
Link to the reference dataset: https://pytorchnlp.readthedocs.io/en/latest/_modules/torchnlp/datasets/imdb.html
https://www.samyzaf.com/ML/imdb/imdb.html
BERT and other Transformer-based models have attracted a great deal of attention recently because they brought transfer learning to NLP. So let's use this simple yet effective dataset to test these models and compare our results with theirs. I also invite fellow researchers to try their state-of-the-art algorithms on this dataset.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Confusion matrix and derived metrics for topoDL [52] classification.