99 datasets found
  1. Data_Sheet_1_Is it enough to optimize CNN architectures on ImageNet?.PDF

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Data_Sheet_1_Is it enough to optimize CNN architectures on ImageNet?.PDF [Dataset]. https://frontiersin.figshare.com/articles/dataset/Data_Sheet_1_Is_it_enough_to_optimize_CNN_architectures_on_ImageNet_PDF/21555477
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Lukas Tuggener; Jürgen Schmidhuber; Thilo Stadelmann
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification performance based on ImageNet is the de-facto standard metric for CNN development. In this work we challenge the notion that CNN architecture design solely based on ImageNet leads to generally effective convolutional neural network (CNN) architectures that perform well on a diverse set of datasets and application domains. To this end, we investigate and ultimately improve ImageNet as a basis for deriving such architectures. We conduct an extensive empirical study for which we train 500 CNN architectures, sampled from the broad AnyNetX design space, on ImageNet as well as 8 additional well-known image classification benchmark datasets from a diverse array of application domains. We observe that the performances of the architectures are highly dataset dependent. Some datasets even exhibit a negative error correlation with ImageNet across all architectures. We show how to significantly increase these correlations by utilizing ImageNet subsets restricted to fewer classes. These contributions can have a profound impact on the way we design future CNN architectures and help alleviate the tilt we see currently in our community with respect to over-reliance on one dataset.
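
    To illustrate the kind of cross-dataset error correlation the study describes, here is a minimal sketch (not the authors' code); the error arrays are hypothetical placeholders with one entry per sampled architecture:

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical per-architecture test errors on ImageNet and on another benchmark.
    imagenet_err = np.array([0.31, 0.28, 0.35, 0.40, 0.27])
    other_err = np.array([0.12, 0.15, 0.11, 0.09, 0.16])

    # A negative r means ImageNet rankings are misleading for the other dataset.
    r, p = pearsonr(imagenet_err, other_err)
    print(f"error correlation: r={r:.2f}, p={p:.3f}")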

  2. Self Driving Car

    • kaggle.com
    zip
    Updated Mar 8, 2023
    Cite
    Aslan Ahmedov (2023). Self Driving Car [Dataset]. https://www.kaggle.com/aslanahmedov/self-driving-carbehavioural-cloning
    Explore at:
    Available download formats: zip (18420532 bytes)
    Dataset updated
    Mar 8, 2023
    Authors
    Aslan Ahmedov
    Description

    https://user-images.githubusercontent.com/91852182/147305077-8b86ec92-ed26-43ca-860c-5812fea9b1d8.gif

    SELF-DRIVING CAR USING UDACITY’S CAR SIMULATOR ENVIRONMENT AND TRAINED BY DEEP NEURAL NETWORKS COMPLETE GUIDE

    Table of Contents

    Introduction

    • Problem Definition
    • Solution Approach
    • Technologies Used
    • Convolutional Neural Networks (CNN)
    • Time-Distributed Layers

    Udacity Simulator and Dataset

    The Training Process

    Augmentation and image pre-processing

    Experimental configurations

    Network architectures

    Results

    • Value loss or Accuracy
    • Why We Use ELU Over RELU

    The Connection Part

    Files

    Overview

    References

    Introduction

    Self-driving cars have become a trending subject, with significant technological improvements over the last decade. The purpose of this project is to train a neural network to drive an autonomous car agent on the tracks of Udacity’s Car Simulator environment. Udacity released the simulator as open source software, and enthusiasts have hosted a competition (challenge) to teach a car how to drive using only camera images and deep learning. Driving a car autonomously requires learning to control the steering angle, throttle, and brakes. A behavioral cloning technique is used to mimic human driving behavior on the track in training mode: a dataset is generated in the simulator by a user-driven car in training mode, and the deep neural network model then drives the car in autonomous mode. Ultimately, the car was able to run on Track 1 while generalizing well. The project aims to reach the same accuracy on real-time data in the future.

    https://user-images.githubusercontent.com/91852182/147298831-225740f9-6903-4570-8336-0c9f16676456.png

    Problem Definition

    Udacity released an open source simulator for self-driving cars to depict a real-time environment. The challenge is to mimic the driving behavior of a human on the simulator with the help of a model trained by deep neural networks. The concept is called Behavioral Cloning, to mimic how a human drives. The simulator contains two tracks and two modes, namely, training mode and autonomous mode. The dataset is generated from the simulator by the user, driving the car in training mode. This dataset is also known as the “good” driving data. This is followed by testing on the track, seeing how the deep learning model performs after being trained by that user data.

    Solution Approach

    https://user-images.githubusercontent.com/91852182/147298261-4d57a5c1-1fda-4654-9741-2f284e6d0479.png

    The problem is solved in the following steps:

    • The simulator is used to collect data by driving the car in training mode using a joystick or keyboard, providing the so-called “good-driving” behavior input data in the form of a driving_log (.csv file) and a set of images. The simulator acts as a server and pipes these images and the data log to the Python client.
    • The client (a Python program) is the machine learning model built using deep neural networks. These models are developed in Keras (a high-level API over TensorFlow). Keras provides Sequential models for building a linear stack of network layers, and such models are trained on the datasets as the second step (a minimal sketch follows this list). A detailed description of the CNN models experimented with and used can be found in the chapter on network architectures.
    • Once trained, the model provides steering angles and throttle values to the server (simulator) for driving in autonomous mode.
    • These outputs are piped back to the server and are used to drive the car autonomously in the simulator and keep it from falling off the track.
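
    As an illustration only (not the project's actual architecture), a minimal Keras Sequential regression model of the kind described above might look like the sketch below; the layer sizes and the (66, 200, 3) input shape are assumptions:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, Flatten, Dense

    # Minimal sketch: predict a steering angle from a camera frame.
    model = Sequential([
        Conv2D(24, (5, 5), strides=(2, 2), activation="elu", input_shape=(66, 200, 3)),
        Conv2D(36, (5, 5), strides=(2, 2), activation="elu"),
        Conv2D(48, (5, 5), strides=(2, 2), activation="elu"),
        Flatten(),
        Dense(100, activation="elu"),
        Dense(1),  # steering angle (regression output)
    ])
    model.compile(optimizer="adam", loss="mse")
    # model.fit(images, steering_angles, epochs=10, validation_split=0.2)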

    Technologies Used

    Technologies that are used in the implementation of this project and the motivation behind using these are described in this section.

    TensorFlow: This is an open-source library for dataflow programming. It is widely used for machine learning applications and also serves as a math library for large-scale computation. For this project, Keras, a high-level API that uses TensorFlow as the backend, is used. Keras facilitates building the models easily, as it is more user friendly.

    Several Python libraries help in machine learning projects, and a few that improved the performance of this project are mentioned in this section. First, NumPy provides a collection of high-level math functions for multi-dimensional matrices and arrays; it is used for faster computation over the weights (gradients) of the neural networks. Second, scikit-learn is a machine learning library for Python that features different algorithms and machine learning function packages. Another is OpenCV (Open Source Computer Vision Library), which is designed for computational efficiency with a focus on real-time applications. In this project, OpenCV is used for image preprocessing and augmentation.
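
    As a rough illustration of the kind of OpenCV augmentation mentioned above (not the project's exact pipeline; the flip/brightness choices here are assumptions):

    import cv2
    import numpy as np

    def augment(image, steering_angle):
        # Horizontal flip: mirror the frame and negate the steering angle.
        if np.random.rand() < 0.5:
            image = cv2.flip(image, 1)
            steering_angle = -steering_angle
        # Random brightness shift in HSV space.
        hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[:, :, 2] = np.clip(hsv[:, :, 2] * (0.6 + 0.8 * np.random.rand()), 0, 255)
        return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR), steering_angle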

    The project makes use of a Conda environment, an open-source package and environment management system for Python that simplifies package management and deployment and works well for large-scale data processing. The machine on which this project was built is a personal computer.

    Convolutional Neural Networks (CNN)

    A CNN is a type of feed-forward neural network that can learn from input data. Learning is accomplished by determining a set of weights, or filter values, that allow the network to model the behavior of the training data. The desired output and the output generated by a CNN initialized with random weights will differ. This difference (the generated error) is backpropagated through the layers of the CNN to adjust the weights of the neurons, which in turn reduces the error and allows us to produce output closer to the desired one.

    CNNs are good at capturing hierarchical and spatial structure in images. A filter looks at a region of the input image with a defined window size and maps it to some output, then slides the window by a defined stride to other regions, covering the whole image. Each convolutional layer thus captures properties of the input hierarchically across a series of subsequent layers, capturing details such as lines first, then shapes, then whole objects in later layers. This makes a CNN a good fit for feeding in the images of a dataset and classifying them into their respective classes.

    Time-Distributed Layers

    Another type of layer sometimes used in deep learning networks is the Time-Distributed layer. Time-Distributed layers are provided in Keras as wrapper layers: the wrapped layer is applied to every temporal slice of the input. The input is required to be at least three-dimensional, with the first index after the batch dimension treated as the temporal dimension. TimeDistributed can wrap a Dense layer, applying it to each timestep independently, and can also be used with convolutional layers. The way they are written in Keras is also simple, as shown in Figure 1 and Figure 2.

    https://user-images.githubusercontent.com/91852182/147298483-4f37a092-7e71-4ce6-9274-9a133d138a4c.png

    Fig. 1: TimeDistributed Dense layer

    https://user-images.githubusercontent.com/91852182/147298501-6459d968-a279-4140-9be3-2d3ea826d9f6.png

    Fig. 2: TimeDistributed Convolution layer
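
    In the same spirit as Figures 1 and 2, here is a minimal Keras sketch of wrapping Dense and convolutional layers with TimeDistributed; the layer sizes and input shapes are illustrative assumptions:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import TimeDistributed, Dense, Conv2D, Flatten

    # Dense applied independently to each of 10 timesteps (each timestep has 16 features).
    dense_model = Sequential([TimeDistributed(Dense(8), input_shape=(10, 16))])

    # Conv2D applied independently to each frame of a 10-frame image sequence.
    conv_model = Sequential([
        TimeDistributed(Conv2D(16, (3, 3), activation="relu"), input_shape=(10, 64, 64, 3)),
        TimeDistributed(Flatten()),
    ])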

    Udacity Simulator and Dataset

    We will first download the simulator to start our behavioural training process. Udacity built the simulator for self-driving cars and made it open source so enthusiasts can work on something close to a real-time environment. It is built on Unity, the video game development platform. The simulator offers configurable resolution and control settings and is very user friendly. The graphics and input configurations can be changed according to user preference and machine configuration, as shown in Figure 3. The user pushes the “Play!” button to enter the simulator user interface. You can open the Controls tab to explore the keyboard controls, quite similar to a racing game, as shown in Figure 4.

    https://user-images.githubusercontent.com/91852182/147298708-de15ebc5-2482-42f8-b2a2-8d3c59fceff4.png

    Fig. 3: Configuration screen

    https://user-images.githubusercontent.com/91852182/147298712-944e2c2d-e01d-459b-8a7d-3c5471bea179.png

    Fig. 4: Controls Configuration

    The first actual screen of the simulator can be seen in Figure 5, and its components are discussed below. The simulator includes two tracks. One can be considered simple and the other complex, as is evident in the screenshots in Figure 6 and Figure 7. “Simple” here just means that the track has fewer curves and is easier to drive on (Figure 6). The “complex” track has steep elevations, sharp turns, and shadowed environments, and is tough to drive on, even for a user doing it manually (Figure 7). There are two modes for driving the car in the simulator: (1) training mode and (2) autonomous mode. Training mode gives you the option of recording your run and capturing the training dataset; the small red sign at the top right of the screen in Figures 6 and 7 indicates that the car is being driven in training mode. Autonomous mode can be used to test a model and see whether it can drive on the track without human intervention. Also, if you press the controls to get the car back on track, it will immediately notify you that it has shifted to manual controls. The mode screen can be seen in Figure 8. Once we have mastered controlling the car in the simulator using the keyboard, we get started with the record button to collect data and save it to a specified folder, as you can see…

  3. Image classification in Galaxy with fruit 360 dataset

    • zenodo.org
    tsv
    Updated Aug 4, 2022
    Cite
    Kaivan Kamali; Kaivan Kamali (2022). Image classification in Galaxy with fruit 360 dataset [Dataset]. http://doi.org/10.5281/zenodo.5702887
    Explore at:
    Available download formats: tsv
    Dataset updated
    Aug 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kaivan Kamali
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Credit: 'Fruit recognition from images using deep learning' by H. Muresan and M. Oltean (https://arxiv.org/abs/1712.00580)

    Fruit 360 is a dataset with 90380 images of 131 fruits and vegetables (https://www.kaggle.com/moltean/fruits). Images are 100 pixels by 100 pixels and are RGB (color) images (3 values per pixel). This dataset is a subset of the Fruit 360 dataset, containing only 10 fruits/vegetables (Strawberry, Apple_Red_Delicious, Pepper_Green, Corn, Banana, Tomato_1, Potato_White, Pineapple, Orange, and Peach). We selected a subset of fruits/vegetables so the dataset is smaller and the neural network can be trained faster.

    The utilities used to create the dataset, along with step by step instructions, can be found here: https://github.com/kxk302/fruit_dataset_utilities

    First, we created feature vectors for each image. Each image is 100 pixels by 100 pixels and RGB (3 values per pixel), so each image can be represented by 30,000 values (100 x 100 x 3). Second, we selected a subset of 10 fruits/vegetables (training and test dataset sizes go from 7 GB and 2.5 GB for 131 fruits/vegetables to 500 MB and 177 MB for 10 fruits/vegetables, respectively). Third, we created separate files for feature vectors and labels. Finally, we mapped the labels of the 10 selected fruits/vegetables to the range 0 to 9.
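
    As a minimal sketch of that preparation (not the utilities in the linked repository; Pillow and the example filename are assumptions), a 100 x 100 RGB image flattens to a 30,000-value feature vector and the class names map to labels 0-9:

    import numpy as np
    from PIL import Image

    fruits = ["Strawberry", "Apple_Red_Delicious", "Pepper_Green", "Corn", "Banana",
              "Tomato_1", "Potato_White", "Pineapple", "Orange", "Peach"]
    label_map = {name: idx for idx, name in enumerate(fruits)}  # labels 0..9

    img = np.asarray(Image.open("some_fruit_image.png").convert("RGB"))  # shape (100, 100, 3)
    feature_vector = img.reshape(-1)  # 100 * 100 * 3 = 30,000 values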

  4. Dataset for Crack Detection in Images of Bricks and Masonry Using CNNs

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jul 21, 2022
    Cite
    Artur Krukowski; Artur Krukowski (2022). Dataset for Crack Detection in Images of Bricks and Masonry Using CNNs [Dataset]. http://doi.org/10.5281/zenodo.6870108
    Explore at:
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Artur Krukowski
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for training CNN built from aerial drone images of buildings in Hamburg

    This dataset contains images extracted from aerial surveillance photos of the Speicherstadt and Kesselhaus buildings in Hamburg, provided by the City of Hamburg. The original 834 high-resolution images (5472 x 3648 pixels) were split into smaller images (227 x 227 pixels) of a size that can be processed by SqueezeNet, a deep Convolutional Neural Network (CNN). This resulted in more than 350 thousand images, which then had to be processed automatically to retain only images containing bricks, mortar, and concrete. The final stage was a tedious manual/visual verification of the images and their separation into positive (containing cracks) and negative (clear bricks and mortar) sets. The final set contains nearly 40 thousand images.
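
    A minimal sketch of the tiling step described above (not the authors' code; Pillow and non-overlapping patches are assumptions), cutting a 5472 x 3648 photo into 227 x 227 patches:

    from PIL import Image

    PATCH = 227

    def tile_image(path):
        img = Image.open(path)  # e.g. a 5472 x 3648 drone photo
        w, h = img.size
        patches = []
        for top in range(0, h - PATCH + 1, PATCH):
            for left in range(0, w - PATCH + 1, PATCH):
                patches.append(img.crop((left, top, left + PATCH, top + PATCH)))
        return patches  # 24 x 16 = 384 non-overlapping patches at this resolution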

    Since the images extracted from the Hamburg buildings contained only a specific type of brick, and our intention was to extend the CNN to handle a wider range of brick types as well as concrete surfaces, we also added images from the following Open Access databases to our training set (note that such images required resizing to 227 x 227 pixels before use):

    The combined data set contains over 80 thousand images.

    Matlab WebApp Server application based on trained SqueezeNet CNN

    The integrated database of images has been used to train the SqueezeNet CNN using a method proposed by Kenta Itakura in his article published on MATLAB Central, "Classify crack image using deep learning and explain 'WHY'", which in turn is based on the work of Lei Zhang reported in his IEEE article "Road crack detection using deep convolutional neural network", published at the 2016 IEEE International Conference on Image Processing (ICIP).

    The "Matlab" subfolder contains the complete software to allow building the application to run under Matlab WebApps Server. The provided version of the "netTransfer.mat" file has been compiled for Matlab revision 2020b, but it should also work when compiled for other revisions from 2019a onwards. BTW, the original location of the files was "D:\Cracks (2-class)\". For instructions how to use the provided Matlab files, refer to Matlab instructions at MATLAB Web App Server and Get Started with MATLAB Web App Server.

    After producing and uploading the application to the MATLAB Web App Server, it can be found at http://localhost:9988/webapps/home/ if deployed locally. It can also be deployed on a web server, subject to installation of the compliant MATLAB Runtime package on the custom server, which can be found at MATLAB Runtimes (mathworks.com).

    An important function included in the package is "unscramble.m", which corrects an error present in all known MATLAB releases when uploading images selected with the open-file function in MATLAB App Designer. The effect of the error is that the image is "scrambled beyond recognition" after uploading to the MATLAB Web App Server. Our function de-scrambles such images, converting them back into their original form.

  5. The diagnostic result of CWRU.

    • plos.figshare.com
    xls
    Updated Oct 5, 2023
    Cite
    Qianqian Zhang; Caiyun Hao; Zhongwei Lv; Qiuxia Fan (2023). The diagnostic result of CWRU. [Dataset]. http://doi.org/10.1371/journal.pone.0292381.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Qianqian Zhang; Caiyun Hao; Zhongwei Lv; Qiuxia Fan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Learning powerful discriminative features is the key for machine fault diagnosis. Most existing methods based on convolutional neural network (CNN) have achieved promising results. However, they primarily focus on global features derived from sample signals and fail to explicitly mine relationships between signals. In contrast, graph convolutional network (GCN) is able to efficiently mine data relationships by taking graph data with topological structure as input, making them highly effective for feature representation in non-Euclidean space. In this article, to make good use of the advantages of CNN and GCN, we propose a graph attentional convolutional neural network (GACNN) for effective intelligent fault diagnosis, which includes two subnetworks of fully CNN and GCN to extract the multilevel features information, and uses Efficient Channel Attention (ECA) attention mechanism to reduce information loss. Extensive experiments on three datasets show that our framework improves the representation ability of features and fault diagnosis performance, and achieves competitive accuracy against other approaches. And the results show that GACNN can achieve superior performance even under a strong background noise environment.

  6. Data_Sheet_1_Diagnosis of Patellofemoral Pain Syndrome Based on a...

    • figshare.com
    xlsx
    Updated Jun 1, 2023
    + more versions
    Cite
    Wuxiang Shi; Yurong Li; Baoping Xiong; Min Du (2023). Data_Sheet_1_Diagnosis of Patellofemoral Pain Syndrome Based on a Multi-Input Convolutional Neural Network With Data Augmentation.XLSX [Dataset]. http://doi.org/10.3389/fpubh.2021.643191.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Wuxiang Shi; Yurong Li; Baoping Xiong; Min Du
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Patellofemoral pain syndrome (PFPS) is a common disease of the knee. Despite its high incidence rate, its specific cause remains unclear. The artificial neural network model can be used for computer-aided diagnosis. Traditional diagnostic methods usually only consider a single factor. However, PFPS involves different biomechanical characteristics of the lower limbs. Thus, multiple biomechanical characteristics must be considered in the neural network model. The data distribution between different characteristic dimensions is different. Thus, preprocessing is necessary to make the different characteristic dimensions comparable. However, a general rule to follow in the selection of biomechanical data preprocessing methods is lacking, and different preprocessing methods have their own advantages and disadvantages. Therefore, this paper proposes a multi-input convolutional neural network (MI-CNN) method that uses two input channels to mine the information of lower limb biomechanical data from two mainstream data preprocessing methods (standardization and normalization) to diagnose PFPS. Data were augmented by horizontally flipping the multi-dimensional time-series signal to prevent network overfitting and improve model accuracy. The proposed method was tested on the walking and running datasets of 41 subjects (26 patients with PFPS and 15 pain-free controls). Three joint angles of the lower limbs and surface electromyography signals of seven muscles around the knee joint were used as input. MI-CNN was used to automatically extract features to classify patients with PFPS and pain-free controls. Compared with the traditional single-input convolutional neural network (SI-CNN) model and previous methods, the proposed MI-CNN method achieved a higher detection sensitivity of 97.6%, a specificity of 76.0%, and an accuracy of 89.0% on the running dataset. The accuracy of SI-CNN in the running dataset was about 82.5%. The results prove that combining the appropriate neural network model and biomechanical analysis can establish an accurate, convenient, and real-time auxiliary diagnosis system for PFPS to prevent misdiagnosis.

  7. Rick and Morty Images Dataset

    • kaggle.com
    Updated Jul 9, 2021
    Cite
    Parv Yadav (2021). Rick and Morty Images Dataset [Dataset]. https://www.kaggle.com/parvav/rick-and-morty-images-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Parv Yadav
    Description

    Context

    The hilarious mixture of wit, slapstick, and action is all set to visit us again, this time darker and more bizarre! So why leave the fun at watching when we can do much more (with enough data, of course)? This image dataset contains categorized images of popular characters, which can be used for classification or image generation.

    Content

    The dataset contains 5 categories of the show's characters- Rick, Morty, Poopybutthole, Summer and Meeseeks.

    Acknowledgements

    I initially thought of using this data, but there were too few images to generate meaningful results, and I felt there was a lot of noise relative to its size. So I decided to add a few more images and clean the data a little more. I also tried to balance the data as much as possible.

    Inspiration

    I was trying to learn CNNs, so I thought why not mix it with one of the shows that I love watching! Check out my model here: https://github.com/Parvv/Rick-and-Morty

  8. Gender Classification from an image

    • kaggle.com
    zip
    Updated Jul 6, 2021
    Cite
    Gerry (2021). Gender Classification from an image [Dataset]. https://www.kaggle.com/gpiosenka/gender-classification-from-an-image
    Explore at:
    Available download formats: zip (180858795 bytes)
    Dataset updated
    Jul 6, 2021
    Authors
    Gerry
    Description

    Context

    I do a lot of work with image data sets. Often it is necessary to partition the images into male and female data sets. Doing this by hand can be a long and tedious task particularly on large data sets. So I decided to create a classifier that could do the task for me.

    Content

    I used the CELEBA aligned data set to provide the images. I went through and visually separated the images into 1747 female and 1747 male training images. I also created 100 male and 100 female test images and 100 male and 100 female validation images. I wanted only the face to be in each image, so I developed an image-cropping function using MTCNN to crop all the images. That function is included as one of the notebooks, should anyone need a good face-cropping function. I also created an image duplicate detector to try to ensure that none of the training images appear in the test or validation images. I have developed a general-purpose image classification function that works very well for most image classification tasks. It contains the option to select 1 of 7 models; for this application I used the MobileNet model because it is less computationally expensive and gives excellent results. On the test set, accuracy is near 100%.

    Acknowledgements

    The CELEBA aligned data set was used. This data set is very large and of good quality. To crop the images to include only the face, I developed a face-cropping function using MTCNN. MTCNN is very accurate and reasonably fast; however, it is not flawless, so after cropping the images you should always visually inspect the results.
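
    A minimal face-cropping sketch using the mtcnn package is shown below; this is an illustration only, not the author's notebook, and the score threshold and single-face assumption are simplifications:

    import cv2
    from mtcnn import MTCNN

    detector = MTCNN()

    def crop_face(image_path):
        # mtcnn expects an RGB array; OpenCV loads BGR, so convert first.
        img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
        faces = detector.detect_faces(img)  # list of dicts with a 'box' entry [x, y, w, h]
        if not faces:
            return None  # no face found: inspect this image manually
        x, y, w, h = faces[0]["box"]
        return img[max(y, 0):y + h, max(x, 0):x + w]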

    Inspiration

    I developed this data set to train a classifier able to distinguish the gender shown in an image. Why bother, you may ask, when I can just look at the image and tell? True, but let's say you have a data set of 50,000 images that you want to separate into male and female sets. Doing that by hand would take forever. With a trained classifier at near 100% accuracy, you can use model.predict to do the job for you.

  9. UHRSD Dataset

    • paperswithcode.com
    Updated Apr 10, 2022
    Cite
    Chenxi Xie; Changqun Xia; Mingcan Ma; Zhirui Zhao; Xiaowu Chen; Jia Li (2022). UHRSD Dataset [Dataset]. https://paperswithcode.com/dataset/uhrsd
    Explore at:
    Dataset updated
    Apr 10, 2022
    Authors
    Chenxi Xie; Changqun Xia; Mingcan Ma; Zhirui Zhao; Xiaowu Chen; Jia Li
    Description

    Recent salient object detection (SOD) methods based on deep neural network have achieved remarkable performance. However, most of existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between the sampling depth and the receptive field size. Aiming at resolving this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), using transformer and CNN backbone to extract features from different resolution images independently and then graft the features from transformer branch to CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable CNN branch to combine broken detailed information more holistically, guided by different source feature during decoding process. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM to help the network better interact with the attention from different models. We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions. To our knowledge, it is the largest dataset in both quantity and resolution for high-resolution SOD task, which can be used for training and testing in future research. Sufficient experiments on UHRSD and widely-used SOD datasets demonstrate that our method achieves superior performance compared to the state-of-the-art methods.

  10. cnn_dailymail

    • huggingface.co
    Updated Aug 28, 2023
    + more versions
    Cite
    Abigail See (2023). cnn_dailymail [Dataset]. https://huggingface.co/datasets/abisee/cnn_dailymail
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2023
    Authors
    Abigail See
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for CNN Dailymail Dataset

      Dataset Summary
    

    The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.

      Supported Tasks and Leaderboards
    

    'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.
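
    For example, the dataset can be loaded with the Hugging Face datasets library; the "3.0.0" config name below is an assumption based on the commonly used summarization version, so check the dataset page if it differs:

    from datasets import load_dataset

    ds = load_dataset("abisee/cnn_dailymail", "3.0.0")
    example = ds["train"][0]            # fields: article, highlights, id
    print(example["article"][:200])
    print(example["highlights"])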

  11. Galaxy Zoo DECaLS: Trained Representations

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Mar 10, 2025
    Cite
    Mike Walmsley; Mike Walmsley; Anna Scaife; Anna Scaife (2025). Galaxy Zoo DECaLS: Trained Representations [Dataset]. http://doi.org/10.5281/zenodo.5536996
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mike Walmsley; Anna Scaife
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These representations predate Zoobot 2.0 - you may find better performance with those more recent models. See the Zoobot github repository and HuggingFace.

    Image representations are lower-dimensional summaries convenient for machine learning searches, predictions, clustering, etc.

    This archive includes representations of galaxy images for subsets of DECaLS DR5 and SDSS. It also includes some further data useful for reproducing a series of practical experiments using those representations (see W+22, bottom of this page).

    Representations

    The representations are calculated with a CNN trained to predict volunteer answers to Galaxy Zoo DECaLS questions with the code "Zoobot", introduced in W+21 (bottom of this page). The weights of this CNN are available via the Zoobot github repository, currently under the checkpoint folder data/pretrained_models/decals_dr_trained_on_all_labelled_m0. See W+21 for details.

    The most significant file is "cnn_features_decals.parquet". This file contains the representations calculated for the approx. 340k GZ DECaLS galaxies. See W+21 for a description of GZD-5. Galaxies can be crossmatched to other catalogues (e.g. the GZ DECaLS catalogue) by iauname.

    "cnn_features_gz2.parquet" is the representations calculated by the *same* model, i.e. without retraining on labelled SDSS GZ2 images, for the approx 240k images classifed in Galaxy Zoo 2 (Willet 2013). These are still fairly good (see W+22), implying the CNN can sometimes generalise well to slightly different surveys. However, they could likely be improved by using a model trained on GZ2 directly. The Zoobot code makes this straightforward. The galaxies can be cross-matched to the Galaxy Zoo 2 catalogues on the "id_str" column, which is equal to the GZ2 objid (e.g. "588018090547020096").

    Confused about .parquet? Think of it as a csv that's very fast to load. Load them like so:

    import pandas as pd
    
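    # parquet_loc: path to one of the provided files, e.g. "cnn_features_decals.parquet"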
    df = pd.read_parquet(parquet_loc)

    You might like to check zoobot.readthedocs.io for guidance on the CNN weights and a pair of ring galaxy catalogues.

    References

    Please cite one or both of these papers if you use this dataset. The labels and trained model come from W+21, while the representations were created in W+22.

    W+21: https://arxiv.org/abs/2102.08414, Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunteers and Deep Learning for 314,000 Galaxies

    W+22: https://arxiv.org/abs/2110.12735, Practical Morphology Tools from Deep Supervised Representation Learning

  12. Fish Detection AI, sonar image-trained detection, counting, tracking models

    • catalog.data.gov
    • mhkdr.openei.org
    Updated Apr 15, 2025
    + more versions
    Cite
    Water Power Technology Office (2025). Fish Detection AI, sonar image-trained detection, counting, tracking models [Dataset]. https://catalog.data.gov/dataset/fish-detection-ai-sonar-image-trained-detection-counting-tracking-models
    Explore at:
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    Water Power Technology Office
    Description

    The Fish Detection AI project aims to improve the efficiency of fish monitoring around marine energy facilities to comply with regulatory requirements. Despite advancements in computer vision, there is limited focus on sonar images, on identifying small fish with unlabeled data, and on methods for underwater fish monitoring for marine energy. A Faster R-CNN (Region-based Convolutional Neural Network) was developed using sonar images from Alaska Fish and Game to identify, track, and count fish in underwater environments. Supervised methods were used with Faster R-CNN to detect fish based on training with labeled data of fish. Customized filters were applied to detect and count small fish when labeled datasets were unavailable. Unsupervised Domain Adaptation techniques were implemented to enable trained models to be applied to different unseen datasets, reducing the need for labeling datasets and training new models for various locations. Additionally, elastic shape analysis (ESA), hyper-image analysis, and various image preprocessing methods were explored to enhance fish detection.

    In this research we achieved:

    1. Faster R-CNN for sonar images
    • The applied Faster R-CNN reached > 0.85 average precision (AP) for large fish detection, providing robust results for higher-quality sonar images.
    • Integrated Norfair tracking to reduce double-counting of fish across video frames, enabling more accurate population estimates.

    2. Small fish identification
    • Established customized filtering methods for small, often unlabeled fish in noisy acoustic images.

    This submission of data includes several sub-directories:

    • FryCounting: contains information on how to count small fish (i.e., fry) in the sonar image data.
    • SG_aldi_addons: contains additions to the ALDI code (SG = Steven Gutstein, primary author), such as the trained models used in this experiment, which should match the models achieved when the training instructions are followed, and code for turning the sonar images into movies.
    • Summaries_Dir: contains information on how to set up the foundation to perform these experiments, such as installing all required packages and versions, and creating the PyTorch and ALDI environments.

    These experiments boil down to a 2-part structure, as described in the uploaded readme file:

    Part I: Installing and Using ALDI & Norfair Code
    • Used for tracking and counting fish; a replication of the linked article, namely the Align and Distill (ALDI) work by Justin Kay and others.
    • Relates to the Summaries_Dir and SG_aldi_addons sub-folders.

    Part II: Installing and Using Fry Code
    • Used to track and count smaller fish (aka fry).
    • Relates to the FryCounting sub-directory.

    Also included are links to the downloadable sonar data and the article that was replicated in this study.
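
    For orientation only, a minimal torchvision Faster R-CNN inference sketch is shown below; the released models, training data, and Norfair tracking setup live in the sub-directories listed above, so the generic pretrained weights, filename, and score threshold here are placeholders:

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Generic pretrained detector as a stand-in for the fine-tuned sonar model.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

    frame = to_tensor(Image.open("sonar_frame.png").convert("RGB"))
    with torch.no_grad():
        out = model([frame])[0]          # dict with keys: boxes, labels, scores

    keep = out["scores"] > 0.85          # arbitrary confidence threshold
    fish_boxes = out["boxes"][keep]      # these boxes would be fed to a tracker (e.g. Norfair) across frames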

  13. The Street View Text Dataset

    • kaggle.com
    zip
    Updated Jan 25, 2021
    + more versions
    Cite
    Nagesh Singh Chauhan (2021). The Street View Text Dataset [Dataset]. https://www.kaggle.com/nageshsingh/the-street-view-text-dataset
    Explore at:
    Available download formats: zip (118140123 bytes)
    Dataset updated
    Jan 25, 2021
    Authors
    Nagesh Singh Chauhan
    Description

    Context

    The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. In dealing with outdoor street level imagery, we note two characteristics. (1) Image text often comes from business signage and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses. More details about the data set can be found in our paper, Word Spotting in the Wild [1]. For our up-to-date benchmarks on this data, see our paper, End-to-end Scene Text Recognition [2].

    Content

    This dataset only has word-level annotations (no character bounding boxes) and should be used for:

    1. cropped lexicon-driven word recognition and
    2. full image lexicon-driven word detection and recognition.

    Acknowledgements

    Downloaded from http://www.iapr-tc11.org/mediawiki/index.php?title=The_Street_View_Text_Dataset

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  14. Low-dose Computed Tomography Perceptual Image Quality Assessment Grand...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adam Wang (2023). Low-dose Computed Tomography Perceptual Image Quality Assessment Grand Challenge Dataset (MICCAI 2023) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7833095
    Explore at:
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Fabian Wagner
    Andreas Maier
    Adam Wang
    Jongduk Baek
    Jang-Hwan Choi
    Wonkyeong Lee
    Scott S. Hsieh
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Image quality assessment (IQA) is extremely important in computed tomography (CT) imaging, since it facilitates the optimization of radiation dose and the development of novel algorithms in medical imaging, such as restoration. In addition, since an excessive dose of radiation can cause harmful effects in patients, generating high-quality images from low-dose images is a popular topic in the medical domain. However, even though peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are the most widely used evaluation metrics for these algorithms, their correlation with radiologists’ opinion of the image quality has been proven to be insufficient in previous studies, since they calculate the image score based on numeric pixel values (1-3). In addition, the need for pristine reference images to calculate these metrics makes them ineffective in real clinical environments, considering that pristine, high-quality images are often impossible to obtain due to the risk posed to patients as a result of radiation dosage. To overcome these limitations, several studies have aimed to develop a novel no-reference image quality metric that correlates well with radiologists’ opinion on image quality without any reference images (2, 4, 5).

    Nevertheless, due to the lack of open-source datasets specifically for CT IQA, experiments have been conducted with datasets that differ from each other, rendering their results incomparable and introducing difficulties in determining a standard image quality metric for CT imaging. Besides, unlike real low-dose CT images with quality degradation due to various combinations of artifacts, most studies are conducted with only one type of artifact (e.g., low-dose noise (6-11), view aliasing (12), metal artifacts (13), scattering (14-16), motion artifacts (17-22), etc.). Therefore, this challenge aims to 1) evaluate various NR-IQA models on CT images containing complex noise/artifacts, 2) compare their correlations with scores produced by radiologists, and 3) provide insight into determining the best-performing metric for CT imaging in terms of correlation with radiologists’ perception.

    Furthermore, considering that low-dose CT images are achieved by reducing the number of projections per rotation and by reducing the X-ray current, the combination of two major artifacts, namely the sparse view streak and noise generated by these methods, is dealt with in this challenge so that the best-performing IQA model applicable in real clinical environments can be verified.
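
    For reference, the full-reference metrics mentioned above (PSNR and SSIM) are commonly computed as in this scikit-image sketch; it is illustrative only and not part of the challenge code, and the .npy filenames are placeholders:

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    # reference: pristine CT slice; test: low-dose/artifact-corrupted reconstruction (2-D float arrays).
    reference = np.load("reference_slice.npy")
    test = np.load("lowdose_slice.npy")

    data_range = reference.max() - reference.min()
    psnr = peak_signal_noise_ratio(reference, test, data_range=data_range)
    ssim = structural_similarity(reference, test, data_range=data_range)
    print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")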

    Funding Declaration:

    This research was partly supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government(MSIT) (No.RS-2022-00155966, Artificial Intelligence Convergence Innovation Human Resources Development (Ewha Womans University)), and by the National Research Foundation of Korea (NRF-2022R1A2C1092072), and by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 1711174276, RS-2020-KD000016).

    References:

    Lee W, Cho E, Kim W, Choi J-H. Performance evaluation of image quality metrics for perceptual assessment of low-dose computed tomography images. Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment: SPIE, 2022.

    Lee W, Cho E, Kim W, Choi H, Beck KS, Yoon HJ, Baek J, Choi J-H. No-reference perceptual CT image quality assessment based on a self-supervised learning framework. Machine Learning: Science and Technology 2022.

    Choi D, Kim W, Lee J, Han M, Baek J, Choi J-H. Integration of 2D iteration and a 3D CNN-based model for multi-type artifact suppression in C-arm cone-beam CT. Machine Vision and Applications 2021;32(116):1-14.

    Pal D, Patel B, Wang A. SSIQA: Multi-task learning for non-reference CT image quality assessment with self-supervised noise level prediction. 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI): IEEE, 2021; p. 1962-1965.

    Mittal A, Moorthy AK, Bovik AC. No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 2012;21(12):4695-4708. doi: 10.1109/TIP.2012.2214050

    Lee J-YK, Wonjin; Lee, Yebin; Lee, Ji-Yeon; Ko, Eunji; Choi, Jang-Hwan. Unsupervised Domain Adaptation for Low-dose Computed Tomography Denoising. IEEE Access 2022.

    Jeon S-Y, Kim W, Choi J-H. MM-Net: Multi-frame and Multi-mask-based Unsupervised Deep Denoising for Low-dose Computed Tomography. IEEE Transactions on Radiation and Plasma Medical Sciences 2022.

    Kim W, Lee J, Kang M, Kim JS, Choi J-H. Wavelet subband-specific learning for low-dose computed tomography denoising. PloS one 2022;17(9):e0274308.

    Han M, Shim H, Baek J. Low-dose CT denoising via convolutional neural network with an observer loss function. Med Phys 2021;48(10):5727-5742. doi: 10.1002/mp.15161

    Kim B, Shim H, Baek J. Weakly-supervised progressive denoising with unpaired CT images. Med Image Anal 2021;71:102065. doi: 10.1016/j.media.2021.102065

    Wagner F, Thies M, Gu M, Huang Y, Pechmann S, Patwari M, Ploner S, Aust O, Uderhardt S, Schett G, Christiansen S, Maier A. Ultralow-parameter denoising: Trainable bilateral filter layers in computed tomography. Med Phys 2022;49(8):5107-5120. doi: 10.1002/mp.15718

    Kim B, Shim H, Baek J. A streak artifact reduction algorithm in sparse-view CT using a self-supervised neural representation. Med Phys 2022. doi: 10.1002/mp.15885

    Kim S, Ahn J, Kim B, Kim C, Baek J. Convolutional neural network-based metal and streak artifacts reduction in dental CT images with sparse-view sampling scheme. Med Phys 2022;49(9):6253-6277. doi: 10.1002/mp.15884

    Bier B, Berger M, Maier A, Kachelrieß M, Ritschl L, Müller K, Choi JH, Fahrig R. Scatter correction using a primary modulator on a clinical angiography C-arm CT system. Med Phys 2017;44(9):e125-e137.

    Maul N, Roser P, Birkhold A, Kowarschik M, Zhong X, Strobel N, Maier A. Learning-based occupational x-ray scatter estimation. Phys Med Biol 2022;67(7). doi: 10.1088/1361-6560/ac58dc

    Roser P, Birkhold A, Preuhs A, Syben C, Felsner L, Hoppe E, Strobel N, Kowarschik M, Fahrig R, Maier A. X-Ray Scatter Estimation Using Deep Splines. IEEE Trans Med Imaging 2021;40(9):2272-2283. doi: 10.1109/TMI.2021.3074712

    Maier J, Nitschke M, Choi JH, Gold G, Fahrig R, Eskofier BM, Maier A. Rigid and Non-Rigid Motion Compensation in Weight-Bearing CBCT of the Knee Using Simulated Inertial Measurements. IEEE Trans Biomed Eng 2022;69(5):1608-1619. doi: 10.1109/TBME.2021.3123673

    Choi JH, Maier A, Keil A, Pal S, McWalter EJ, Beaupré GS, Gold GE, Fahrig R. Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. II. Experiment. Med Phys 2014;41(6Part1):061902.

    Choi JH, Fahrig R, Keil A, Besier TF, Pal S, McWalter EJ, Beaupré GS, Maier A. Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. Part I. Numerical model-based optimization. Med Phys 2013;40(9):091905.

    Berger M, Muller K, Aichert A, Unberath M, Thies J, Choi JH, Fahrig R, Maier A. Marker-free motion correction in weight-bearing cone-beam CT of the knee joint. Med Phys 2016;43(3):1235-1248. doi: 10.1118/1.4941012

    Ko Y, Moon S, Baek J, Shim H. Rigid and non-rigid motion artifact reduction in X-ray CT using attention module. Med Image Anal 2021;67:101883. doi: 10.1016/j.media.2020.101883

    Preuhs A, Manhart M, Roser P, Hoppe E, Huang Y, Psychogios M, Kowarschik M, Maier A. Appearance Learning for Image-Based Motion Estimation in Tomography. IEEE Trans Med Imaging 2020;39(11):3667-3678. doi: 10.1109/TMI.2020.3002695

  15. boulderspot

    • huggingface.co
    Updated Mar 29, 2024
    Cite
    Peter Szemraj (2024). boulderspot [Dataset]. https://huggingface.co/datasets/pszemraj/boulderspot
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 29, 2024
    Authors
    Peter Szemraj
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    pszemraj/boulderspot

    These are aerial images of Switzerland classified into what could be a bouldering area (label: bouldering_area) or not (label: other). The test set has no labels (i.e. the column is None) and is randomly sampled from across the country. Sources:

    • data: SWISSIMAGE 10 cm
    • labels: me

    Date created: 2021. You can find some example CNN-based models trained on an earlier/smaller version of this dataset in this repo. If you are a member of an organization interested in… See the full description on the dataset page: https://huggingface.co/datasets/pszemraj/boulderspot.

  16. Periodic and heterogeneous solid and velocity data used to train and...

    • search.dataone.org
    • datadryad.org
    • +1more
    Updated Nov 29, 2023
    Cite
    Danny D. Ko; Hangjie Ji; Y. Sungtaek Ju (2023). Periodic and heterogeneous solid and velocity data used to train and validate CNN models [Dataset]. http://doi.org/10.5068/D16108
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Danny D. Ko; Hangjie Ji; Y. Sungtaek Ju
    Time period covered
    Jan 1, 2023
    Description

    Data-driven deep learning models are emerging as a promising method for characterizing pore-scale flow through complex porous media while requiring minimal computational power. However, previous models often require extensive computation to simulate flow through synthetic porous media for use as training data. We propose a convolutional neural network trained solely on periodic unit cells to predict pore-scale velocity fields of complex heterogeneous porous media from binary images without the need for further image processing. Our model is trained using a range of simple and complex unit cells that can be obtained analytically or numerically at a low computational cost. Our results show that the model accurately predicts the permeability and pore-scale flow characteristics of synthetic porous media and real reticulated foams. We significantly improve the convergence of numerical simulations by using the predictions from our model as initial guesses. Our approach addresses the limitatio…

  17. PapioVoc Dataset

    • paperswithcode.com
    Updated Feb 13, 2023
    Cite
    Guillem Bonafos; Pierre Pudlo; Jean-Marc Freyermuth; Thierry Legou; Joël Fagot; Samuel Tronçon; Arnaud Rey (2023). PapioVoc Dataset [Dataset]. https://paperswithcode.com/dataset/papiovoc
    Explore at:
    Dataset updated
    Feb 13, 2023
    Authors
    Guillem Bonafos; Pierre Pudlo; Jean-Marc Freyermuth; Thierry Legou; Joël Fagot; Samuel Tronçon; Arnaud Rey
    Description

    Abstract

    The data collection process consisted of continuously recording, over one month, a group of Guinea baboons living in semi-liberty at the CNRS primatology center in Rousset-sur-Arc (France). Two microphones were placed near their enclosure to continuously record the sounds produced by the group. A convolutional neural network (CNN) was used on these large and noisy audio recordings to automatically extract segments of sound containing a baboon vocal production, following the method of Bonafos et al. (2023). The resulting dataset consists of one-second to several-minute wav files of automatically detected vocalization segments. The dataset thus provides a wide range of baboon vocalizations produced at all times of the day. It can be used to study the vocal productions of non-human primates, their repertoire, their distribution over the day, their frequency, and their heterogeneity. In addition to the analysis of animal communication, the dataset can also be used as a training base for sound classification models.

    Data acquisition

    The data are audio recordings of baboons. The recordings were made with an H6 Zoom recorder, using the included XYH-6 stereo microphone. The sample rate is 44,100 Hz at 16 bits. The microphones were placed in the vicinity of the enclosure for one month and recorded continuously on a PC. A CNN was passed over the data with a sliding window of 1 second and an overlap of 80% to detect the vocal productions of the baboons. The dataset consists of the segments predicted by the CNN to contain a baboon vocalization. Windows containing signal less than one second apart were merged into a single vocalization.

    Data source location

    Institution: CNRS, Primate Facility

    City/Town/Region: Rousset-sur-Arc

    Country: France

    Latitude and longitude for collected samples/data: 43.47033535251509, 5.6514732876668905

    Value of the data

    This dataset is relatively unique in terms of the quantity of vocalizations available.

    This massive dataset can be very useful to two types of scientific communities: experts in primatology who study the vocal productions of non-human primates, and experts in data science and audio signal processing.

    The machine learning research community has at its disposal a database of several dozen hours of animal vocalizations, which makes it possible to build a large training base, very useful for Environmental Sound Recognition tasks, for example.

    Objective

    This dataset is a follow-up of two studies on the vocal productions of Guinea baboons (Papio papio) in which we carried out analyses of their vocal productions on the basis of a relatively large vocalization sample containing around 1300 vocalizations (Boë, Berthommier, Legou, Captier, Kemp, Sawallis, Becker, Rey, & Fagot, 2017; Kemp, Rey, Legou, Boë, Berthommier, Becker, & Fagot, 2017). The aim was to collect a larger database using the technique of deep convolutional neural networks in order to 1) automatically detect vocal productions in a large continuous audio recording and 2) perform a categorization of these vocalizations on a more massive sample. A description of the pipeline that enabled these automatic detections and categorizations is given in Bonafos, Pudlo, Freyermuth, Legou, Fagot, Tronçon, & Rey (2023).

    Data description

    The data are a set of audio files in wav format. They are at least one second long (the size of the window), and up to several minutes if several windows are consecutively predicted to contain signal. Moreover, we add the labeled data we used to train the CNN that did the prediction. We also provide two hours of the continuous recordings, to give an idea of the raw data and to test the code of the paper provided on GitLab.

    In addition, there is a database in csv format listing all the vocalizations, the day and time of their production, and the prediction probabilities of the model.

    Experimental design, materials and methods

    The original recordings represent one month of continuous audio recording. Seven hours of this month were manually labelled. They were segmented and labelled according to whether or not there was a monkey vocalization (i.e., noise or vocalization) and, if there was a vocalization, according to the type of vocalization (6 possible classes: bark, copulation grunt, grunt, scream, yak, wahoo). These manually labelled data were used as a training set for a CNN, which was automatically trained following the pipeline of Bonafos et al. (2023). This model was then used to automatically detect and classify vocalization during the whole month of audio recording. It processes the data in the same way when predicting new data as it does when training. It uses a sliding window of one second with an overlap of 80%. It does not take into account information from previous predictions, but calculates the probability of a vocalization in each one-second window independently. It then iterates through the month. For each window, the model predicts two outputs: the probability that there is a vocalization and the probability of each class of vocalization.

    For the purpose of generating the wav files, if a window has a probability of a vocalization greater than 0.5, it is considered to contain a vocalization. If it is the first such window, a vocalization is started at that moment. If the time windows that follow also contain a vocalization, the signal they contain is added to the first segment for which a vocalization was detected. As soon as a one-second segment no longer contains signal corresponding to a vocalization, the wav file is closed. If windows are predicted to contain no vocalizations but lie between two windows that contain vocalizations within 1 second of each other, then all of those windows are merged.
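
    A minimal sketch of that merging rule is shown below; it is illustrative only (not the released pipeline), and assumes the per-window probabilities from the CNN and their (start, end) times in seconds are already available:

    def merge_windows(window_times, probs, threshold=0.5, max_gap=1.0):
        """Merge consecutive positive windows into vocalization segments.

        window_times: list of (start, end) times in seconds for each 1 s analysis window.
        probs: CNN probability that each window contains a vocalization.
        Positive windows closer than max_gap seconds are merged into one segment.
        """
        segments = []
        for (start, end), p in zip(window_times, probs):
            if p <= threshold:
                continue
            if segments and start - segments[-1][1] <= max_gap:
                segments[-1][1] = max(segments[-1][1], end)   # extend the open segment
            else:
                segments.append([start, end])                 # open a new segment
        return [tuple(s) for s in segments]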

  18. A Dataset of Lung Ultrasound Images for Automated AI-based Lung Disease Classification

    • data.mendeley.com
    Updated Jul 10, 2025
    Cite
    Andrew Katumba (2025). A Dataset of Lung Ultrasound Images for Automated AI-based Lung Disease Classification [Dataset]. http://doi.org/10.17632/hb3p34ytvx.2
    Explore at:
    Dataset updated
    Jul 10, 2025
    Authors
    Andrew Katumba
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a curated benchmark collection of 1,062 labelled lung ultrasound (LUS) images collected from patients at Mulago National Referral Hospital and Kiruddu Referral Hospital in Kampala, Uganda. The images were acquired and annotated by senior radiologists to support the development and evaluation of artificial intelligence (AI) models for pulmonary disease diagnosis. Each image is categorized into one of three classes: Probably COVID-19 (COVID-19), Diseased Lung but Probably Not COVID-19 (Other Lung Disease), and Healthy Lung.

    The dataset addresses key challenges in LUS interpretation, including inter-operator variability, low signal-to-noise ratios, and reliance on expert sonographers. It is particularly suitable for training and testing convolutional neural network (CNN)-based models for medical image classification tasks in low-resource settings. The images are provided in standard formats such as PNG or JPEG, with corresponding labels stored in structured files like CSV or JSON to facilitate ease of use in machine learning workflows.
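    As a minimal, hypothetical loading sketch (the actual file and column names in the dataset may differ), the images and their CSV labels could be read into arrays for CNN training as follows.

```python
# Hypothetical loading sketch: one image folder plus a labels.csv with
# "filename,label" rows. Actual file and column names may differ.
import csv
from pathlib import Path

import numpy as np
from PIL import Image

CLASSES = ["covid19", "other_lung_disease", "healthy"]   # the three classes described above

def load_lus_dataset(image_dir, csv_path, size=(224, 224)):
    """Return (images, labels) as numpy arrays ready for a CNN classifier."""
    images, labels = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            img = Image.open(Path(image_dir) / row["filename"]).convert("L")
            images.append(np.asarray(img.resize(size), dtype=np.float32) / 255.0)
            labels.append(CLASSES.index(row["label"]))
    return np.stack(images), np.array(labels)

# Example (paths are placeholders):
# X, y = load_lus_dataset("lus_images/", "labels.csv")
# print(X.shape, y.shape)   # e.g. (1062, 224, 224) (1062,)
```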

    In this second version of the dataset, we have extended the resource by including a folder containing the original unprocessed raw data, as well as the scripts used to process, clean, and sort the data into the final labelled set. These additions promote transparency and reproducibility, allowing researchers to understand the full data pipeline and adapt it for their own applications. This resource is intended to advance research in deep learning for lung ultrasound analysis and to contribute toward building more accessible and reliable diagnostic tools in global health.

  19. Global Classification Dataset of Daytime and Nighttime Marine Low-cloud Mesoscale Morphology

    • zenodo.org
    bin
    Updated Feb 13, 2025
    + more versions
    Cite
    Yuanyuan Wu; Jihu Liu; Yannian Zhu; Yu Zhang; Yang Cao; Kang-En Huang; Boyang Zheng; Yichuan Wang; Quan Wang; Chen Zhou; Yuan Liang; Minghuai Wang; Daniel Rosenfeld (2025). Global Classification Dataset of Daytime and Nighttime Marine Low-cloud Mesoscale Morphology [Dataset]. http://doi.org/10.5281/zenodo.14860350
    Explore at:
    binAvailable download formats
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Yuanyuan Wu; Jihu Liu; Yannian Zhu; Yu Zhang; Yang Cao; Kang-En Huang; Boyang Zheng; Yichuan Wang; Quan Wang; Chen Zhou; Yuan Liang; Minghuai Wang; Daniel Rosenfeld
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a global classification dataset of daytime and nighttime marine low-cloud mesoscale morphology with six cloud types (Solid stratus, Closed MCC, Open MCC, Disorganized MCC, Clustered Cu and Suppressed Cu). The spatial resolution is 1° × 1° and the temporal resolution is 5 minutes for the years 2018-2022. The classifications were produced with a ResNet-50 deep learning model. Trained on daytime radiance data from MODIS (Moderate Resolution Imaging Spectroradiometer) and daytime retrieved COT (Cloud Optical Thickness), the model achieved high prediction accuracy and can also be applied to nighttime cloud classification. For a detailed introduction to the model, please refer to our article. A minimal read sketch for the data files is given at the end of the Technical info section below.
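    As a hedged illustration of the classification setup just described, the following minimal sketch adapts a torchvision ResNet-50 to the six morphology classes. The choice of a 4-channel 128×128 input (three thermal-infrared radiance channels plus retrieved COT) is an assumption based on the training variables listed in the Technical info below, not necessarily the exact configuration used by the authors.

```python
# Adapting a ResNet-50 backbone to six cloud-morphology classes.
# The 4-channel 128x128 input (emis_29, emis_31, emis_32, COT) is an assumption.
import torch
import torch.nn as nn
from torchvision.models import resnet50

NUM_CLASSES = 6     # Solid stratus, Closed MCC, Open MCC, Disorganized MCC,
                    # Clustered Cu, Suppressed Cu
IN_CHANNELS = 4     # three TIR radiance channels + retrieved COT (assumed)

model = resnet50(weights=None)
model.conv1 = nn.Conv2d(IN_CHANNELS, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Forward pass on a dummy batch of two 128x128 scenes to check shapes.
logits = model(torch.randn(2, IN_CHANNELS, 128, 128))
print(logits.shape)  # torch.Size([2, 6])
```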

    Technical info

    Product information

    • File ‘day_xxxx_all.h5’: Daytime classification of global marine low-cloud morphology for the year xxxx, with a spatial resolution of 1°×1° and a temporal resolution of 5 minutes
      • date: time of the 1°×1° box, format: 'YYYYDDD.HHHH'
      • lon: central longitude (-180, 180)
      • lat: central latitude (-60, 60)
      • cat: category of the cloud morphology. The numbers 0-5 represent each of the six categories: 0-Solid stratus, 1-Closed MCC, 2-Open MCC, 3-Disorganized MCC, 4-Clustered Cu, 5-Suppressed Cu
      • cert: model certainty, the probability that this cloud morphology belongs to the assigned category
      • low_cf: the cloud fraction of low clouds
      • COT_CNN: average cloud optical thickness (COT), retrieved using TIR-CNN model from Wang et al. (2022)
      • CER_CNN: average cloud effective radius (CER), retrieved using TIR-CNN model from Wang et al. (2022), in the unit of μm
      • LWP_CNN: average cloud liquid water path (LWP), calculated from COT_CNN and CER_CNN, in the unit of g/m²
      • Sensor_zenith: scene average sensor zenith angle, from MODIS MYD021, in the unit of degree (°)
    • File 'night_xxxx_all.h5': Nighttime classification of global marine low-cloud morphology for the year xxxx, with a spatial resolution of 1°×1° and a temporal resolution of 5 minutes
      • same variables as daytime
    • File 'example.xlsx': A sample of the variable data from our cloud classification dataset, showcasing the classification results of a MODIS granule captured on January 1, 2018, at 00:25 UTC. This sample is provided to help users better understand the content of our dataset.

    Training, Validation, Test dataset

    • These files originate from the same classification dataset and contain the same variables; they differ only in sample size
    • Files 'training_dataset.h5', 'validation_dataset.h5', and 'test_dataset.h5':
      • date: time of the 128×128 pixels, format: 'YYYYDDD.HHHH'
      • lat: central latitude of the 128×128 scene
      • lon: central longitude of the 128×128 scene
      • cat: category of the cloud morphology. The numbers 0-5 represent each of the six categories: 0-Solid stratus, 1-Closed MCC, 2-Open MCC, 3-Disorganized MCC, 4-Clustered Cu, 5-Suppressed Cu
      • CTH: cloud top height, in-cloud average value, in the unit of km
      • COT_retrieved: cloud optical thickness (COT), retrieved using TIR-CNN model from Wang et al. (2022), 128×128 pixels
      • LWP: cloud liquid water path (LWP) from MODIS MYD06, in-cloud average value, in the unit of g/m²
      • Sensor_zenith: scene average sensor zenith angle, from MODIS MYD021, in the unit of degree (°)
      • emis_29: radiance data from thermal infrared channel 29 (8.7μm), 128×128 pixels
      • emis_31: radiance data from thermal infrared channel 31 (10.8 μm), 128×128 pixels
      • emis_32: radiance data from thermal infrared channel 32 (12.0 μm), 128×128 pixels
      • i: the row number of the top-left pixel of the 128×128 scene in the MODIS granule
      • j: the column number of the top-left pixel of the 128×128 scene in the MODIS granule
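
    As a minimal illustration of how the variables listed in the Technical info section above might be accessed, the following sketch reads a daytime product file with h5py. It assumes each variable is stored as a top-level HDF5 dataset under the names given above; the actual internal layout may differ.

```python
# Reading one daytime product file, assuming top-level datasets named as in
# the variable list above (the actual internal layout may differ).
import h5py

MORPHOLOGY = ["Solid stratus", "Closed MCC", "Open MCC",
              "Disorganized MCC", "Clustered Cu", "Suppressed Cu"]

def summarize_day_file(path):
    with h5py.File(path, "r") as f:
        cat = f["cat"][:]     # 0-5 morphology category per 1x1 degree scene
        cert = f["cert"][:]   # model certainty for the assigned category
    print(f"{len(cat)} classified scenes, mean certainty {cert.mean():.2f}")
    for k, name in enumerate(MORPHOLOGY):
        print(f"  {name}: {(cat == k).sum()} scenes")

# Example (file name follows the 'day_xxxx_all.h5' pattern described above):
# summarize_day_file("day_2018_all.h5")
```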
  20. Data from: An Integrated Approach of Belief Rule Base and Convolutional Neural Network to Monitor Air Quality in Shanghai

    • researchdata.se
    Updated Jun 19, 2024
    Cite
    Sami Kabir; Raihan Ul Islam; Karl Andersson (2024). An Integrated Approach of Belief Rule Base and Convolutional Neural Network to Monitor Air Quality in Shanghai [Dataset]. http://doi.org/10.24433/CO.8230207.v1
    Explore at:
    (21516)Available download formats
    Dataset updated
    Jun 19, 2024
    Dataset provided by
    Luleå University of Technology
    Authors
    Sami Kabir; Raihan Ul Islam; Karl Andersson
    Area covered
    Shanghai
    Description

    Accurate monitoring of air quality can reduce its adverse impact on earth. Ground-level sensors can provide fine particulate matter (PM2.5) concentrations and ground images, but such sensors have limited spatial coverage and incur deployment costs. PM2.5 can also be estimated from satellite-retrieved Aerosol Optical Depth (AOD). However, AOD is subject to uncertainties associated with its retrieval algorithms, which constrain the spatial resolution of the estimated PM2.5, and AOD cannot be retrieved under cloudy weather. In contrast, satellite images provide continuous spatial coverage with no separate deployment cost. The accuracy of monitoring from such satellite images is nevertheless hindered by uncertainties in the sensor data of relevant environmental parameters, such as relative humidity, temperature, wind speed and wind direction. The Belief Rule Based Expert System (BRBES) is an efficient algorithm for addressing these uncertainties, and the Convolutional Neural Network (CNN) is well suited to image analytics. Hence, we propose a novel model that integrates a CNN with a BRBES to monitor air quality from satellite images with improved accuracy. We customized the CNN and optimized the BRBES to increase monitoring accuracy further. Obscure images are differentiated between polluted air and cloud based on the relationship of PM2.5 with relative humidity. Valid environmental data (temperature, wind speed and wind direction) are adopted to further strengthen the monitoring performance of the proposed model. Three years of observation data (satellite images and environmental parameters) from 2014 to 2016 over Shanghai were employed to analyze and design the proposed model.

    Source code and dataset

    We implement our proposed integrated algorithm with the Python 3 and C++ programming languages. We process the satellite images with the OpenCV library, and Keras library functions are used to implement our customized VGG Net. The python script smallervggnet.py builds this VGG Net, which is then trained and tested on a dataset of satellite images through the train.py script. This dataset consists of three years of satellite images of the Oriental Pearl Tower, Shanghai, China, from Planet, covering January 2014 to December 2016 (Planet Team, 2017). These images were captured by PlanetScope, a constellation of approximately 120 optical satellites operated by Planet (Planet Team, San Francisco, CA, USA, 2016). Based on the level of PM2.5, the dataset is divided into three classes: HighPM, MediumPM and LowPM. A new satellite image (201612230949.png) is classified with the trained VGG Net by the classify.py script. Standard file I/O is used to feed this classification output to the first BRBES (cnn_brb_1.cpp) through a text file (cnn_prediction.txt), as sketched below. In addition to the VGG Net classification output, cloud percentage and relative humidity are fed as inputs to the first BRBES. The second BRBES (cnn_brb_2.cpp) takes the output of the first BRBES, temperature and wind speed as its inputs; a wind-direction-based recalculation of its output is also performed in this cpp file to compute the final monitored PM2.5 value. We demonstrate this code architecture through a flow chart in Figure 5 of the manuscript. Source code and the dataset of satellite images are made freely available through the published compute capsule (https://doi.org/10.24433/CO.8230207.v1).
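    A hedged sketch of this file-based hand-off is given below: the CNN prediction is written to cnn_prediction.txt and the first BRBES stage is invoked on it. The file names and class labels come from the description above; the text-file format, the ordering of the classes, and the command-line interface of the compiled BRBES binary are assumptions.

```python
# File-based hand-off from the CNN to the first BRBES stage. The text-file
# format and the command-line interface of the compiled binary are assumptions;
# the file names and class labels come from the description above.
import subprocess

CLASSES = ["LowPM", "MediumPM", "HighPM"]   # ordering assumed

def hand_off_to_brbes(class_probs, cloud_pct, rel_humidity,
                      txt_path="cnn_prediction.txt", brbes_bin="./cnn_brb_1"):
    """Write the VGG Net output for the first BRBES and invoke it."""
    best = max(range(len(CLASSES)), key=lambda i: class_probs[i])
    with open(txt_path, "w") as f:
        f.write(f"{CLASSES[best]} {class_probs[best]:.4f}\n")   # assumed format
    # Cloud percentage and relative humidity are assumed to be passed as CLI args.
    subprocess.run([brbes_bin, txt_path, str(cloud_pct), str(rel_humidity)],
                   check=True)

# Example with made-up VGG Net output probabilities:
# hand_off_to_brbes([0.1, 0.2, 0.7], cloud_pct=35.0, rel_humidity=60.0)
```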

    Code: MIT license; Data: No Rights Reserved (CC0)

    The dataset was originally published in DiVA and moved to SND in 2024.
