Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification performance based on ImageNet is the de-facto standard metric for CNN development. In this work we challenge the notion that CNN architecture design solely based on ImageNet leads to generally effective convolutional neural network (CNN) architectures that perform well on a diverse set of datasets and application domains. To this end, we investigate and ultimately improve ImageNet as a basis for deriving such architectures. We conduct an extensive empirical study for which we train 500 CNN architectures, sampled from the broad AnyNetX design space, on ImageNet as well as 8 additional well-known image classification benchmark datasets from a diverse array of application domains. We observe that the performances of the architectures are highly dataset dependent. Some datasets even exhibit a negative error correlation with ImageNet across all architectures. We show how to significantly increase these correlations by utilizing ImageNet subsets restricted to fewer classes. These contributions can have a profound impact on the way we design future CNN architectures and help alleviate the tilt we see currently in our community with respect to over-reliance on one dataset.
[Image: https://user-images.githubusercontent.com/91852182/147305077-8b86ec92-ed26-43ca-860c-5812fea9b1d8.gif]
Self-driving cars have become a trending subject, with significant improvements in the underlying technologies over the last decade. The purpose of this project is to train a neural network to drive an autonomous car agent on the tracks of Udacity's Car Simulator environment. Udacity released the simulator as open source software, and enthusiasts have hosted a challenge to teach a car how to drive using only camera images and deep learning. Driving a car autonomously requires learning to control the steering angle, throttle and brakes. The behavioral cloning technique is used to mimic human driving behavior on the track in training mode: a dataset is generated in the simulator by a user-driven car in training mode, and the deep neural network model then drives the car in autonomous mode. Ultimately, the car was able to run on Track 1, generalizing well. The project aims at reaching the same accuracy on real-time data in the future.
[Image: https://user-images.githubusercontent.com/91852182/147298831-225740f9-6903-4570-8336-0c9f16676456.png]
Udacity released an open source simulator for self-driving cars to depict a real-time environment. The challenge is to mimic the driving behavior of a human on the simulator with the help of a model trained by deep neural networks. The concept is called Behavioral Cloning, to mimic how a human drives. The simulator contains two tracks and two modes, namely, training mode and autonomous mode. The dataset is generated from the simulator by the user, driving the car in training mode. This dataset is also known as the “good” driving data. This is followed by testing on the track, seeing how the deep learning model performs after being trained by that user data.
[Image: https://user-images.githubusercontent.com/91852182/147298261-4d57a5c1-1fda-4654-9741-2f284e6d0479.png]
The problem is solved in the following steps:
This section describes the technologies used to implement this project and the motivation behind choosing them.
TensorFlow: This is an open-source library for dataflow programming that is widely used for machine learning applications and also serves as a math library for large-scale computation. For this project, Keras, a high-level API that uses TensorFlow as its backend, is used. Keras facilitates building models easily, as it is more user friendly.
Several Python libraries help in machine learning projects, and a few of them improved the performance of this project; they are mentioned in this section. First, "NumPy" provides a collection of high-level math functions to support multi-dimensional matrices and arrays; it is used for faster computations over the weights (gradients) in neural networks. Second, "scikit-learn" is a machine learning library for Python that features different algorithms and machine learning function packages. Another one is OpenCV (Open Source Computer Vision Library), which is designed for computational efficiency with a focus on real-time applications. In this project, OpenCV is used for image preprocessing and augmentation.
The project uses a Conda environment; Conda is an open-source package and environment management system for Python that simplifies package management and deployment and is well suited to large-scale data processing. The machine on which this project was built is a personal computer.
A CNN is a type of feed-forward neural network that learns from input data. Learning is accomplished by determining a set of weights, or filter values, that allow the network to model the behavior of the training data. The desired output and the output generated by a CNN initialized with random weights will differ. This difference (the generated error) is backpropagated through the layers of the CNN to adjust the weights of the neurons, which in turn reduces the error and produces output closer to the desired one.
CNNs are good at capturing hierarchical and spatial structure in images. A convolution filter looks at a region of the input image with a defined window size and maps it to some output, then slides the window by a defined stride to other regions until the whole image is covered. Successive convolutional layers thus capture the properties of the input image hierarchically: early layers capture details such as lines, later layers shapes, and the final layers whole objects. This makes a CNN a good fit for taking the images of a dataset and classifying them into their respective classes.
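As a minimal, hedged illustration of such a network (the input size, filter counts and class count below are placeholders, not this project's actual architecture), a small Keras CNN classifier looks like this:

import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal illustrative CNN classifier: stacked conv/pool blocks followed by dense layers.
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # e.g. 10 output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()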
Another type of layer sometimes used in deep learning networks is the time-distributed layer. Time-distributed layers are provided in Keras as wrapper layers that apply an inner layer to every temporal slice of an input. The input is required to be at least three-dimensional, with the first index after the batch dimension treated as the temporal dimension. The TimeDistributed wrapper can be applied to a Dense layer, applying it to each of the timesteps independently, or even used with convolutional layers. Writing them in Keras is also simple, as shown in Figure 1 and Figure 2.
[Image: https://user-images.githubusercontent.com/91852182/147298483-4f37a092-7e71-4ce6-9274-9a133d138a4c.png]
Fig. 1: TimeDistributed Dense layer
[Image: https://user-images.githubusercontent.com/91852182/147298501-6459d968-a279-4140-9be3-2d3ea826d9f6.png]
Fig. 2: TimeDistributed Convolution layer
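In the same spirit as Figures 1 and 2, a minimal sketch of both wrappers in Keras might look as follows; the layer sizes and timestep count are illustrative placeholders, not this project's actual values.

from tensorflow.keras import layers, models

# TimeDistributed Dense: apply the same Dense layer to each of 10 timesteps of a 16-feature input.
dense_model = models.Sequential([
    layers.TimeDistributed(layers.Dense(32), input_shape=(10, 16)),
])

# TimeDistributed Conv2D: apply the same Conv2D to each frame of a short image sequence.
conv_model = models.Sequential([
    layers.TimeDistributed(layers.Conv2D(8, (3, 3), activation="relu"),
                           input_shape=(10, 64, 64, 3)),
])

dense_model.summary()
conv_model.summary()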
We will first download the simulator to start our behavioural training process. Udacity has built a simulator for self-driving cars and made it open source for the enthusiasts, so they can work on something close to a real-time environment. It is built on Unity, the video game development platform. The simulator consists of a configurable resolution and controls setting and is very user friendly. The graphics and input configurations can be changed according to user preference and machine configuration as shown in Figure 3. The user pushes the “Play!” button to enter the simulator user interface. You can enter the Controls tab to explore the keyboard controls, quite similar to a racing game which can be seen in Figure 4.
[Image: https://user-images.githubusercontent.com/91852182/147298708-de15ebc5-2482-42f8-b2a2-8d3c59fceff4.png]
Fig. 3: Configuration screen
[Image: https://user-images.githubusercontent.com/91852182/147298712-944e2c2d-e01d-459b-8a7d-3c5471bea179.png]
Fig. 4: Controls Configuration
The first actual screen of the simulator can be seen in Figure 5, and its components are discussed below. The simulator involves two tracks. One of them can be considered simple and the other complex, as is evident in the screenshots in Figure 6 and Figure 7. The word "simple" here just means that it has fewer curves and is easier to drive on (refer to Figure 6). The "complex" track has steep elevations, sharp turns and a shadowed environment, and is tough to drive on, even for a user driving manually (refer to Figure 7). There are two modes for driving the car in the simulator: (1) training mode and (2) autonomous mode. The training mode gives you the option of recording your run and capturing the training dataset. The small red sign at the top right of the screen in Figures 6 and 7 indicates that the car is being driven in training mode. The autonomous mode can be used to test a model and see whether it can drive on the track without human intervention; if you press the controls to get the car back on track, it will immediately notify you that it has shifted to manual controls. The mode screenshot can be seen in Figure 8. Once we have mastered the car's driving controls in the simulator using the keyboard, we use the record button to collect data and save it to a specified folder.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit: 'Fruit recognition from images using deep learning' by H. Muresan and M. Oltean (https://arxiv.org/abs/1712.00580)
Fruit 360 is a dataset of 90,380 images of 131 fruits and vegetables (https://www.kaggle.com/moltean/fruits). Images are 100 pixels by 100 pixels and are RGB (color) images (3 values per pixel). This dataset is a subset of the Fruit 360 dataset, containing only 10 fruits/vegetables (Strawberry, Apple_Red_Delicious, Pepper_Green, Corn, Banana, Tomato_1, Potato_White, Pineapple, Orange, and Peach). We selected a subset of fruits/vegetables so the dataset is smaller and the neural network can be trained faster.
The utilities used to create the dataset, along with step by step instructions, can be found here: https://github.com/kxk302/fruit_dataset_utilities
First, we created feature vectors for each image. Each image is 100 pixels by 100 pixels and RGB (3 values per pixel), so each image can be represented by 30,000 values (100 x 100 x 3). Second, we selected a subset of 10 fruits/vegetables (training and test dataset sizes go from 7 GB and 2.5 GB for 131 fruits/vegetables to 500 MB and 177 MB for 10 fruits/vegetables, respectively). Third, we created separate files for feature vectors and labels. Finally, we mapped the labels of the 10 selected fruits/vegetables to the range 0 to 9.
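As a rough sketch of the feature-vector step (the file path below is hypothetical and the layout is an assumption, not the utilities' actual interface), flattening one RGB image into 30,000 values and mapping the class names to 0-9 can be done like this:

import numpy as np
from PIL import Image

# Load one 100 x 100 RGB image and flatten it into a 30,000-value feature vector.
img = np.asarray(Image.open("Strawberry/0_100.jpg"))   # hypothetical file path
assert img.shape == (100, 100, 3)
feature_vector = img.reshape(-1)                        # shape (30000,)

# Map the 10 selected class names to labels 0-9.
classes = ["Strawberry", "Apple_Red_Delicious", "Pepper_Green", "Corn", "Banana",
           "Tomato_1", "Potato_White", "Pineapple", "Orange", "Peach"]
label_map = {name: idx for idx, name in enumerate(classes)}
print(label_map["Peach"])  # 9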
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for training CNN built from aerial drone images of buildings in Hamburg
This dataset contains images extracted from aerial surveillance photos of the Speicherstadt and Kesselhaus buildings in Hamburg, provided by the City of Hamburg. The original 834 high-resolution images (5472 x 3648 pixels) were split into smaller images (227 x 227 pixels), the size that can be processed by SqueezeNet, a deep convolutional neural network (CNN). This resulted in more than 350 thousand images, which were subsequently processed automatically to retain only images containing bricks, mortar and concrete. The final stage involved tedious manual/visual verification of the images and their separation into positive (containing cracks) and negative (clear bricks and mortar) sets. The final set contains nearly 40 thousand images.
Since the images extracted from the Hamburg buildings contained only a specific type of brick, and our intention was to extend the CNN to handle a wider range of brick types as well as concrete surfaces, we also added images from the following Open Access databases to our training set (note that such images required resizing to 227 x 227 pixels before use; a resizing sketch follows below):
The combined data set contains over 80 thousand images.
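As a minimal sketch of that resizing step (shown in Python with OpenCV purely for illustration; the folder names are hypothetical and the published application itself is Matlab-based), external images can be brought to SqueezeNet's 227 x 227 input size like this:

import glob
import os
import cv2

os.makedirs("resized_images", exist_ok=True)
for path in glob.glob("external_images/*.jpg"):   # hypothetical source folder
    img = cv2.imread(path)
    if img is None:
        continue  # skip unreadable files
    resized = cv2.resize(img, (227, 227), interpolation=cv2.INTER_AREA)
    out_path = os.path.join("resized_images", os.path.basename(path))
    cv2.imwrite(out_path, resized)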
Matlab WebApp Server application based on trained SqueezeNet CNN
The integrated database of images was used to train the SqueezeNet CNN using a method proposed by Kenta Itakura in his Matlab Central article "Classify crack image using deep learning and explain 'WHY'", which in turn is based on the work of Lei Zhang reported in the IEEE article "Road crack detection using deep convolutional neural network", published at the 2016 IEEE International Conference on Image Processing (ICIP).
The "Matlab" subfolder contains the complete software to allow building the application to run under Matlab WebApps Server. The provided version of the "netTransfer.mat" file has been compiled for Matlab revision 2020b, but it should also work when compiled for other revisions from 2019a onwards. BTW, the original location of the files was "D:\Cracks (2-class)\". For instructions how to use the provided Matlab files, refer to Matlab instructions at MATLAB Web App Server and Get Started with MATLAB Web App Server.
After producing and uploading the application to the Matlab WebApps Server, the application can be found at http://localhost:9988/webapps/home/ if deployed locally. It can also be deployed on a web server, subject to installation of the compliant Matlab Runtime package on that server, which can be found at MATLAB Runtimes (mathworks.com).
An important function included in the package is "unscramble.m", which corrects an error present in all known Matlab revisions when uploading images selected via the open-file function in the Matlab App Designer: the image arrives "scrambled beyond recognition" on the Matlab WebApps Server. Our function de-scrambles such images, restoring their original form.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Learning powerful discriminative features is key for machine fault diagnosis. Most existing methods based on convolutional neural networks (CNN) have achieved promising results. However, they primarily focus on global features derived from sample signals and fail to explicitly mine relationships between signals. In contrast, graph convolutional networks (GCN) can efficiently mine data relationships by taking graph data with topological structure as input, making them highly effective for feature representation in non-Euclidean space. In this article, to exploit the complementary advantages of CNN and GCN, we propose a graph attentional convolutional neural network (GACNN) for effective intelligent fault diagnosis. It includes two subnetworks, a fully convolutional network and a GCN, to extract multilevel feature information, and uses the Efficient Channel Attention (ECA) mechanism to reduce information loss. Extensive experiments on three datasets show that our framework improves the representation ability of features and fault diagnosis performance, and achieves competitive accuracy against other approaches. The results also show that GACNN can achieve superior performance even under a strong background noise environment.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patellofemoral pain syndrome (PFPS) is a common disease of the knee. Despite its high incidence rate, its specific cause remains unclear. The artificial neural network model can be used for computer-aided diagnosis. Traditional diagnostic methods usually only consider a single factor. However, PFPS involves different biomechanical characteristics of the lower limbs. Thus, multiple biomechanical characteristics must be considered in the neural network model. The data distribution between different characteristic dimensions is different. Thus, preprocessing is necessary to make the different characteristic dimensions comparable. However, a general rule to follow in the selection of biomechanical data preprocessing methods is lacking, and different preprocessing methods have their own advantages and disadvantages. Therefore, this paper proposes a multi-input convolutional neural network (MI-CNN) method that uses two input channels to mine the information of lower limb biomechanical data from two mainstream data preprocessing methods (standardization and normalization) to diagnose PFPS. Data were augmented by horizontally flipping the multi-dimensional time-series signal to prevent network overfitting and improve model accuracy. The proposed method was tested on the walking and running datasets of 41 subjects (26 patients with PFPS and 15 pain-free controls). Three joint angles of the lower limbs and surface electromyography signals of seven muscles around the knee joint were used as input. MI-CNN was used to automatically extract features to classify patients with PFPS and pain-free controls. Compared with the traditional single-input convolutional neural network (SI-CNN) model and previous methods, the proposed MI-CNN method achieved a higher detection sensitivity of 97.6%, a specificity of 76.0%, and an accuracy of 89.0% on the running dataset. The accuracy of SI-CNN in the running dataset was about 82.5%. The results prove that combining the appropriate neural network model and biomechanical analysis can establish an accurate, convenient, and real-time auxiliary diagnosis system for PFPS to prevent misdiagnosis.
The hilarious mixture of wit, slapstick and action is all set to visit us again, this time darker and more bizarre! So why limit the fun to watching when we can do much more (with enough data, of course)? This image dataset contains categorized images of popular characters, which can be used for classification or image generation.
The dataset contains 5 categories of the show's characters: Rick, Morty, Poopybutthole, Summer and Meeseeks.
I initially thought of using the existing data, but there were too few images to produce meaningful results, and I felt there was a lot of noise relative to its size. So I decided to add a few more images and clean the data a little more. I also tried to balance the classes as much as possible.
I was trying to learn CNNs, so I thought, why not mix it with one of the shows I love watching! Check out my model here: https://github.com/Parvv/Rick-and-Morty
I do a lot of work with image data sets. Often it is necessary to partition the images into male and female data sets. Doing this by hand can be a long and tedious task particularly on large data sets. So I decided to create a classifier that could do the task for me.
I used the CELEBA aligned data set to provide the images. I went through and visually separated the images into 1747 female and 1747 male training images. I also created 100 male and 100 female test images and 100 male and 100 female validation images. I wanted only the face to be in each image, so I developed an image cropping function using MTCNN to crop all the images. That function is included as one of the notebooks should anyone need a good face cropping function. I also created an image duplicate detector to try to ensure that none of the training images appear in the test or validation images. I have developed a general-purpose image classification function that works very well for most image classification tasks. It contains the option to select 1 of 7 models; for this application I used the MobileNet model because it is less computationally expensive and gives excellent results. On the test set, accuracy is near 100%.
The CELEBA aligned data set was used. This data set is very large and of good quality. To crop the images to include only the face, I developed a face cropping function using MTCNN. MTCNN is very accurate and reasonably fast; however, it is not flawless, so after cropping the images you should always visually inspect the results.
I developed this data set to train a classifier able to distinguish the gender shown in an image. Why bother, you may ask, when I can just look at the image and tell? True, but suppose you have a data set of 50,000 images that you want to separate into male and female sets. Doing that by hand would take forever. With a trained classifier at near 100% accuracy, you can use model.predict to do the job for you, as in the sketch below.
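A rough sketch of that workflow, cropping faces with MTCNN and routing images with model.predict, might look like the following. The model file name, folder names, the 224 x 224 MobileNet preprocessing, the single-sigmoid output and the label ordering are all assumptions for illustration, not the notebooks' exact code.

import os, shutil, glob
import numpy as np
import cv2
from mtcnn import MTCNN
from tensorflow.keras.models import load_model

detector = MTCNN()
model = load_model("gender_mobilenet.h5")   # hypothetical trained classifier

os.makedirs("sorted/male", exist_ok=True)
os.makedirs("sorted/female", exist_ok=True)

for path in glob.glob("unsorted/*.jpg"):    # hypothetical input folder
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    faces = detector.detect_faces(img)
    if not faces:
        continue                            # no face found, skip this image
    x, y, w, h = faces[0]["box"]
    crop = img[max(y, 0):y + h, max(x, 0):x + w]
    crop = cv2.resize(crop, (224, 224)).astype("float32") / 255.0
    prob = float(model.predict(crop[np.newaxis, ...], verbose=0)[0][0])  # assumes sigmoid output
    label = "female" if prob < 0.5 else "male"                           # assumed label ordering
    shutil.copy(path, os.path.join("sorted", label, os.path.basename(path)))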
Recent salient object detection (SOD) methods based on deep neural networks have achieved remarkable performance. However, most existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between sampling depth and receptive field size. Aiming at resolving this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), using transformer and CNN backbones to extract features from different resolution images independently and then graft the features from the transformer branch to the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by different source features during the decoding process. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM to help the network better interact with the attention from different models. We contribute a new Ultra-High-Resolution Saliency Detection dataset, UHRSD, containing 5,920 images at 4K-8K resolutions. To our knowledge, it is the largest dataset in both quantity and resolution for the high-resolution SOD task, and it can be used for training and testing in future research. Sufficient experiments on UHRSD and widely used SOD datasets demonstrate that our method achieves superior performance compared to state-of-the-art methods.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for CNN Dailymail Dataset
Dataset Summary
The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.
Supported Tasks and Leaderboards
'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These representations predate Zoobot 2.0; you may find better performance with the more recent models. See the Zoobot github repository and HuggingFace.
Image representations are lower-dimensional summaries convenient for machine learning searches, predictions, clustering, etc.
This archive includes representations of galaxy images for subsets of DECaLS DR5 and SDSS. It also includes some further data useful for reproducing a series of practical experiments using those representations (see W+22, bottom of this page).
Representations
The representations are calculated with a CNN trained to predict volunteer answers to Galaxy Zoo DECaLS questions with the code "Zoobot", introduced in W+21 (bottom of this page). The weights of this CNN are available via the Zoobot github repository, currently under the checkpoint folder data/pretrained_models/decals_dr_trained_on_all_labelled_m0. See W+21 for details.
The most significant file is "cnn_features_decals.parquet". This file contains the representations calculated for the approx. 340k GZ DECaLS galaxies. See W+21 for a description of GZD-5. Galaxies can be crossmatched to other catalogues (e.g. the GZ DECaLS catalogue) by iauname.
"cnn_features_gz2.parquet" is the representations calculated by the *same* model, i.e. without retraining on labelled SDSS GZ2 images, for the approx 240k images classifed in Galaxy Zoo 2 (Willet 2013). These are still fairly good (see W+22), implying the CNN can sometimes generalise well to slightly different surveys. However, they could likely be improved by using a model trained on GZ2 directly. The Zoobot code makes this straightforward. The galaxies can be cross-matched to the Galaxy Zoo 2 catalogues on the "id_str" column, which is equal to the GZ2 objid (e.g. "588018090547020096").
Confused about .parquet? Think of it as a csv that's very fast to load. Load them like so:
import pandas as pd

parquet_loc = "cnn_features_decals.parquet"  # or "cnn_features_gz2.parquet"
df = pd.read_parquet(parquet_loc)
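For example, cross-matching the GZ2 representations against a Galaxy Zoo 2 catalogue is a single pandas merge on "id_str"; the catalogue file name and its objid column name below are assumptions for illustration, not fixed parts of this archive.

import pandas as pd

features = pd.read_parquet("cnn_features_gz2.parquet")
gz2 = pd.read_csv("gz2_catalogue.csv")            # hypothetical catalogue file
gz2["id_str"] = gz2["dr7objid"].astype(str)       # assumed objid column name
matched = features.merge(gz2, on="id_str", how="inner")
print(len(matched), "galaxies matched")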
You might like to check zoobot.readthedocs.io for guidance on the CNN weights and a pair of ring galaxy catalogues.
References
Please cite one or both of these papers if you use this dataset. The labels and trained model come from W+21, while the representations were created in W+22.
W+21: https://arxiv.org/abs/2102.08414, Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunteers and Deep Learning for 314,000 Galaxies
W+22: https://arxiv.org/abs/2110.12735, Practical Morphology Tools from Deep Supervised Representation Learning
The Fish Detection AI project aims to improve the efficiency of fish monitoring around marine energy facilities to comply with regulatory requirements. Despite advancements in computer vision, there is limited focus on sonar images, identifying small fish with unlabeled data, and methods for underwater fish monitoring for marine energy. A Faster R-CNN (Region-based Convolutional Neural Network) was developed using sonar images from Alaska Fish and Games to identify, track, and count fish in underwater environments. Supervised methods were used with Faster R-CNN to detect fish based on training with labeled fish data. Customized filters were specifically applied to detect and count small fish when labeled datasets were unavailable. Unsupervised Domain Adaptation techniques were implemented to enable trained models to be applied to different unseen datasets, reducing the need for labeling datasets and training new models for various locations. Additionally, elastic shape analysis (ESA), hyper-image analysis, and various image preprocessing methods were explored to enhance fish detection.
In this research we achieved:
1. Faster R-CNN for sonar images
- The applied Faster R-CNN reached > 0.85 average precision (AP) for large fish detection, providing robust results for higher-quality sonar images.
- Integrated Norfair tracking to reduce double-counting of fish across video frames, enabling more accurate population estimates.
2. Small fish identification
- Established customized filtering methods for small, often unlabeled fish in noisy acoustic images.
This submission of data includes several sub-directories:
- FryCounting: contains information on how to count small fish (i.e., fry) in the sonar image data
- SG_aldi_addons: contains additions to the ALDI code (SG = Steven Gutstein, primary author), such as the trained models used in this experiment, which should match the models achieved when the training instructions are followed, and code for how to make the sonar images into movies
- Summaries_Dir: contains information on how to set up the foundation to perform these experiments, such as installing all required packages and versions, and creating the PyTorch and ALDI environments
These experiments boil down to a 2-part structure, as described in the uploaded readme file:
Part I: Installing and Using ALDI & Norfair Code
- Used for tracking and counting fish; a replication of the linked article, namely the Align and Distill (ALDI) work by Justin Kay and others
- Relates to the Summaries_Dir and SG_aldi_addons sub-folders
Part II: Installing and Using Fry Code
- Used to track and count smaller fish (aka fry)
- Relates to the FryCounting sub-directory
Also included here are links to the downloadable sonar data and the article that was replicated in this study. A minimal tracking-and-counting sketch is given below.
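As a minimal sketch of the tracking-and-counting idea (not the project's actual pipeline; the detector wrapper, frame source and Norfair parameters are assumptions, and the string distance function assumes norfair 2.x), unique track IDs can be accumulated across frames roughly like this:

import numpy as np
from norfair import Detection, Tracker

def detect_fish_centroids(frame):
    # Hypothetical stand-in for the Faster R-CNN detector: return a list of (x, y) centroids.
    return []

video_frames = []   # placeholder: in practice, frames decoded from the sonar video

tracker = Tracker(distance_function="euclidean", distance_threshold=30)
seen_ids = set()

for frame in video_frames:
    centroids = detect_fish_centroids(frame)
    detections = [Detection(points=np.array([c])) for c in centroids]
    tracked = tracker.update(detections=detections)
    seen_ids.update(obj.id for obj in tracked)   # count each track only once

print("estimated fish count:", len(seen_ids))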
The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. In dealing with outdoor street level imagery, we note two characteristics. (1) Image text often comes from business signage and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses. More details about the data set can be found in our paper, Word Spotting in the Wild [1]. For our up-to-date benchmarks on this data, see our paper, End-to-end Scene Text Recognition [2].
This dataset only has word-level annotations (no character bounding boxes) and should be used for:
Downloaded from http://www.iapr-tc11.org/mediawiki/index.php?title=The_Street_View_Text_Dataset
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Image quality assessment (IQA) is extremely important in computed tomography (CT) imaging, since it facilitates the optimization of radiation dose and the development of novel algorithms in medical imaging, such as restoration. In addition, since an excessive dose of radiation can cause harmful effects in patients, generating high-quality images from low-dose images is a popular topic in the medical domain. However, even though peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are the most widely used evaluation metrics for these algorithms, their correlation with radiologists’ opinion of the image quality has been proven to be insufficient in previous studies, since they calculate the image score based on numeric pixel values (1-3). In addition, the need for pristine reference images to calculate these metrics makes them ineffective in real clinical environments, considering that pristine, high-quality images are often impossible to obtain due to the risk posed to patients as a result of radiation dosage. To overcome these limitations, several studies have aimed to develop a no-reference novel image quality metric that correlates well with radiologists’ opinion on image quality without any reference images (2, 4, 5).
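For reference, the two full-reference metrics in question are typically computed directly from pixel values, e.g. with scikit-image; the snippet below is a generic illustration on placeholder arrays, not part of the challenge code.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Compare a reconstructed low-dose CT slice against a reference slice (placeholder arrays in [0, 1]).
reference = np.random.rand(512, 512)
reconstruction = reference + 0.05 * np.random.randn(512, 512)

psnr = peak_signal_noise_ratio(reference, reconstruction, data_range=1.0)
ssim = structural_similarity(reference, reconstruction, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")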
Nevertheless, due to the lack of open-source datasets specifically for CT IQA, experiments have been conducted with datasets that differ from each other, rendering their results incomparable and introducing difficulties in determining a standard image quality metric for CT imaging. Besides, unlike real low-dose CT images, whose quality degradation comes from various combinations of artifacts, most studies are conducted with only one type of artifact (e.g., low-dose noise (6-11), view aliasing (12), metal artifacts (13), scattering (14-16), motion artifacts (17-22), etc.). Therefore, this challenge aims to 1) evaluate various NR-IQA models on CT images containing complex noise/artifacts, 2) compare their correlations with scores produced by radiologists, and 3) provide insights into determining the best-performing metric for CT imaging in terms of correlation with radiologists’ perception.
Furthermore, considering that low-dose CT images are achieved by reducing the number of projections per rotation and by reducing the X-ray current, the combination of two major artifacts, namely the sparse view streak and noise generated by these methods, is dealt with in this challenge so that the best-performing IQA model applicable in real clinical environments can be verified.
Funding Declaration:
This research was partly supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government(MSIT) (No.RS-2022-00155966, Artificial Intelligence Convergence Innovation Human Resources Development (Ewha Womans University)), and by the National Research Foundation of Korea (NRF-2022R1A2C1092072), and by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 1711174276, RS-2020-KD000016).
References:
Lee W, Cho E, Kim W, Choi J-H. Performance evaluation of image quality metrics for perceptual assessment of low-dose computed tomography images. Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment: SPIE, 2022.
Lee W, Cho E, Kim W, Choi H, Beck KS, Yoon HJ, Baek J, Choi J-H. No-reference perceptual CT image quality assessment based on a self-supervised learning framework. Machine Learning: Science and Technology 2022.
Choi D, Kim W, Lee J, Han M, Baek J, Choi J-H. Integration of 2D iteration and a 3D CNN-based model for multi-type artifact suppression in C-arm cone-beam CT. Machine Vision and Applications 2021;32(116):1-14.
Pal D, Patel B, Wang A. SSIQA: Multi-task learning for non-reference CT image quality assessment with self-supervised noise level prediction. 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI): IEEE, 2021; p. 1962-1965.
Mittal A, Moorthy AK, Bovik AC. No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 2012;21(12):4695-4708. doi: 10.1109/TIP.2012.2214050
Lee J-Y, Kim W, Lee Y, Lee J-Y, Ko E, Choi J-H. Unsupervised Domain Adaptation for Low-dose Computed Tomography Denoising. IEEE Access 2022.
Jeon S-Y, Kim W, Choi J-H. MM-Net: Multi-frame and Multi-mask-based Unsupervised Deep Denoising for Low-dose Computed Tomography. IEEE Transactions on Radiation and Plasma Medical Sciences 2022.
Kim W, Lee J, Kang M, Kim JS, Choi J-H. Wavelet subband-specific learning for low-dose computed tomography denoising. PloS one 2022;17(9):e0274308.
Han M, Shim H, Baek J. Low-dose CT denoising via convolutional neural network with an observer loss function. Med Phys 2021;48(10):5727-5742. doi: 10.1002/mp.15161
Kim B, Shim H, Baek J. Weakly-supervised progressive denoising with unpaired CT images. Med Image Anal 2021;71:102065. doi: 10.1016/j.media.2021.102065
Wagner F, Thies M, Gu M, Huang Y, Pechmann S, Patwari M, Ploner S, Aust O, Uderhardt S, Schett G, Christiansen S, Maier A. Ultralow-parameter denoising: Trainable bilateral filter layers in computed tomography. Med Phys 2022;49(8):5107-5120. doi: 10.1002/mp.15718
Kim B, Shim H, Baek J. A streak artifact reduction algorithm in sparse-view CT using a self-supervised neural representation. Med Phys 2022. doi: 10.1002/mp.15885
Kim S, Ahn J, Kim B, Kim C, Baek J. Convolutional neural network-based metal and streak artifacts reduction in dental CT images with sparse-view sampling scheme. Med Phys 2022;49(9):6253-6277. doi: 10.1002/mp.15884
Bier B, Berger M, Maier A, Kachelrieß M, Ritschl L, Müller K, Choi JH, Fahrig R. Scatter correction using a primary modulator on a clinical angiography C-arm CT system. Med Phys 2017;44(9):e125-e137.
Maul N, Roser P, Birkhold A, Kowarschik M, Zhong X, Strobel N, Maier A. Learning-based occupational x-ray scatter estimation. Phys Med Biol 2022;67(7). doi: 10.1088/1361-6560/ac58dc
Roser P, Birkhold A, Preuhs A, Syben C, Felsner L, Hoppe E, Strobel N, Kowarschik M, Fahrig R, Maier A. X-Ray Scatter Estimation Using Deep Splines. IEEE Trans Med Imaging 2021;40(9):2272-2283. doi: 10.1109/TMI.2021.3074712
Maier J, Nitschke M, Choi JH, Gold G, Fahrig R, Eskofier BM, Maier A. Rigid and Non-Rigid Motion Compensation in Weight-Bearing CBCT of the Knee Using Simulated Inertial Measurements. IEEE Trans Biomed Eng 2022;69(5):1608-1619. doi: 10.1109/TBME.2021.3123673
Choi JH, Maier A, Keil A, Pal S, McWalter EJ, Beaupré GS, Gold GE, Fahrig R. Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. II. Experiment. Med Phys 2014;41(6 Part 1):061902.
Choi JH, Fahrig R, Keil A, Besier TF, Pal S, McWalter EJ, Beaupré GS, Maier A. Fiducial marker-based correction for involuntary motion in weight-bearing C-arm CT scanning of knees. Part I. Numerical model-based optimization. Med Phys 2013;40(9):091905.
Berger M, Muller K, Aichert A, Unberath M, Thies J, Choi JH, Fahrig R, Maier A. Marker-free motion correction in weight-bearing cone-beam CT of the knee joint. Med Phys 2016;43(3):1235-1248. doi: 10.1118/1.4941012
Ko Y, Moon S, Baek J, Shim H. Rigid and non-rigid motion artifact reduction in X-ray CT using attention module. Med Image Anal 2021;67:101883. doi: 10.1016/j.media.2020.101883
Preuhs A, Manhart M, Roser P, Hoppe E, Huang Y, Psychogios M, Kowarschik M, Maier A. Appearance Learning for Image-Based Motion Estimation in Tomography. IEEE Trans Med Imaging 2020;39(11):3667-3678. doi: 10.1109/TMI.2020.3002695
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
pszemraj/boulderspot
These are aerial images of Switzerland classified into what could be a bouldering area (label: bouldering_area) or not (label: other). The test set has no labels (i.e. the column is None) and is randomly sampled from across the country. Sources:
data: SWISSIMAGE 10 cm
labels: me
Date created: 2021. You can find some example CNN-based models trained on an earlier/smaller version of this dataset in this repo. If you are a member of an organization interested in… See the full description on the dataset page: https://huggingface.co/datasets/pszemraj/boulderspot.
Data-driven deep learning models are emerging as a promising method for characterizing pore-scale flow through complex porous media while requiring minimal computational power. However, previous models often require extensive computation to simulate flow through synthetic porous media for use as training data. We propose a convolutional neural network trained solely on periodic unit cells to predict pore-scale velocity fields of complex heterogeneous porous media from binary images without the need for further image processing. Our model is trained using a range of simple and complex unit cells that can be obtained analytically or numerically at a low computational cost. Our results show that the model accurately predicts the permeability and pore-scale flow characteristics of synthetic porous media and real reticulated foams. We significantly improve the convergence of numerical simulations by using the predictions from our model as initial guesses. Our approach addresses the limitatio...
Abstract
The data collection process consisted of continuously recording, for one month, a group of Guinea baboons living in semi-liberty at the CNRS primatology center in Rousset-sur-Arc (France). Two microphones were placed near their enclosure to continuously record the sounds produced by the group. A convolutional neural network (CNN) was applied to these large and noisy audio recordings to automatically extract segments of sound containing a baboon vocal production, following the method of Bonafos et al. (2023). The resulting dataset consists of wav files, from one second to several minutes long, of automatically detected vocalization segments. The dataset thus provides a wide range of baboon vocalizations produced at all times of the day. It can be used to study the vocal productions of non-human primates, their repertoire, their distribution over the day, their frequency, and their heterogeneity. In addition to the analysis of animal communication, the dataset can also be used as a training base for sound classification models.
Data acquisition
The data are audio recordings of baboons. The recordings were made with an H6 Zoom recorder, using the included XYH-6 stereo microphone, at a sampling rate of 44,100 Hz and 16 bits. The microphones were placed in the vicinity of the enclosure for one month and recorded continuously onto a PC. A CNN was passed over the data with a sliding window of 1 second and an overlap of 80% to detect the vocal productions of the baboons. The dataset consists of the segments predicted by the CNN to contain a baboon vocalization. Windows containing signal less than one second apart were merged into a single vocalization.
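A 1-second window with 80% overlap corresponds to a hop of 0.2 s (8,820 samples at 44,100 Hz). A minimal windowing sketch, illustrative only and not the authors' code, looks like this:

import numpy as np

sample_rate = 44100
window = sample_rate            # 1-second window
hop = int(0.2 * sample_rate)    # 80% overlap -> 0.2 s hop

audio = np.zeros(10 * sample_rate)   # placeholder for a 10-second recording

# Slice the recording into overlapping 1-second windows for the CNN.
windows = [audio[start:start + window]
           for start in range(0, len(audio) - window + 1, hop)]
print(len(windows), "windows")       # 46 windows for 10 s of audio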
Data source location
Institution: CNRS, Primate Facility
City/Town/Region: Rousset-sur-Arc
Country: France
Latitude and longitude for collected samples/data: 43.47033535251509, 5.6514732876668905
Value of the data
This dataset is relatively unique in terms of the quantity of vocalizations available.
This massive dataset can be very useful to two types of scientific communities: experts in primatology who study the vocal productions of non-human primates, and experts in data science and audio signal processing.
The machine learning research community has at its disposal a database of several dozen hours of animal vocalizations, which makes it possible to build a large training base, very useful for Environmental Sound Recognition tasks, for example.
Objective
This dataset is a follow-up of two studies on the vocal productions of Guinea baboons (Papio papio) in which we carried out analyses of their vocal productions on the basis of a relatively large vocalization sample containing around 1300 vocalizations (Boë, Berthommier, Legou, Captier, Kemp, Sawallis, Becker, Rey, & Fagot, 2017; Kemp, Rey, Legou, Boë, Berthommier, Becker, & Fagot, 2017). The aim was to collect a larger database using the technique of deep convolutional neural networks in order to 1) automatically detect vocal productions in a large continuous audio recording and 2) perform a categorization of these vocalizations on a more massive sample. A description of the pipeline that enabled these automatic detections and categorizations is given in Bonafos, Pudlo, Freyermuth, Legou, Fagot, Tronçon, & Rey (2023).
Data description
The data is a set of audio files in wav format. They are at least one second long (the size of the window) and up to several minutes long if several consecutive windows are predicted to contain signal. We also include the labeled data used to train the CNN that made the predictions, as well as two hours of the continuous recordings, to give an idea of the raw data and to test the code of the paper provided on GitLab.
In addition, there is a database in csv format listing all the vocalizations, the day and time of their production, and the prediction probabilities of the model.
Experimental design, materials and methods
The original recordings represent one month of continuous audio recording. Seven hours of this month were manually labelled. They were segmented and labelled according to whether or not there was a monkey vocalization (i.e., noise or vocalization) and, if there was a vocalization, according to the type of vocalization (6 possible classes: bark, copulation grunt, grunt, scream, yak, wahoo). These manually labelled data were used as a training set for a CNN, which was automatically trained following the pipeline of Bonafos et al. (2023). This model was then used to automatically detect and classify vocalization during the whole month of audio recording. It processes the data in the same way when predicting new data as it does when training. It uses a sliding window of one second with an overlap of 80%. It does not take into account information from previous predictions, but calculates the probability of a vocalization in each one-second window independently. It then iterates through the month. For each window, the model predicts two outputs: the probability that there is a vocalization and the probability of each class of vocalization.
To generate the wav files, a window with a vocalization probability greater than 0.5 is considered to contain a vocalization. If it is the first such window, a new vocalization segment is started at that moment; if the following windows also contain a vocalization, their signal is appended to that segment. As soon as a one-second window no longer contains signal corresponding to a vocalization, the wav file is closed. Windows predicted to contain no vocalization, but lying between two vocalization windows less than 1 second apart, are merged into the segment as well. A simplified sketch of this merging logic follows.
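A simplified reimplementation of that segment-merging rule (illustrative only; the authors' actual code is on their GitLab) could look like this:

# Merge per-window vocalization probabilities into (start, end) segments in seconds.
def merge_windows(probs, hop=0.2, window=1.0, threshold=0.5, max_gap=1.0):
    segments = []
    current = None
    for i, p in enumerate(probs):
        start, end = i * hop, i * hop + window
        if p > threshold:
            if current is None:
                current = [start, end]            # open a new segment
            elif start - current[1] <= max_gap:
                current[1] = end                  # extend, bridging gaps shorter than max_gap
            else:
                segments.append(tuple(current))
                current = [start, end]
    if current is not None:
        segments.append(tuple(current))
    return segments

print(merge_windows([0.1, 0.9, 0.8, 0.2, 0.1, 0.7]))
# [(0.2, 2.0)]: windows 1, 2 and 5 merge because the gap between them is under 1 second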
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a curated benchmark collection of 1,062 labelled lung ultrasound (LUS) images collected from patients at Mulago National Referral Hospital and Kiruddu Referral Hospital in Kampala, Uganda. The images were acquired and annotated by senior radiologists to support the development and evaluation of artificial intelligence (AI) models for pulmonary disease diagnosis. Each image is categorized into one of three classes: Probably COVID-19 (COVID-19), Diseased Lung but Probably Not COVID-19 (Other Lung Disease), and Healthy Lung.
The dataset addresses key challenges in LUS interpretation, including inter-operator variability, low signal-to-noise ratios, and reliance on expert sonographers. It is particularly suitable for training and testing convolutional neural network (CNN)-based models for medical image classification tasks in low-resource settings. The images are provided in standard formats such as PNG or JPEG, with corresponding labels stored in structured files like CSV or JSON to facilitate ease of use in machine learning workflows.
In this second version of the dataset, we have extended the resource by including a folder containing the original unprocessed raw data, as well as the scripts used to process, clean, and sort the data into the final labelled set. These additions promote transparency and reproducibility, allowing researchers to understand the full data pipeline and adapt it for their own applications. This resource is intended to advance research in deep learning for lung ultrasound analysis and to contribute toward building more accessible and reliable diagnostic tools in global health.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The global classification dataset of daytime and nighttime marine low-cloud mesoscale morphology with six cloud types (Solid stratus, Closed MCC, Open MCC, Disorganized MCC, Clustered Cu and Suppressed Cu). The spatial resolution is 1° × 1° and the temporal resolution is 5 minutes for the years 2018-2022. The dataset was established with a deep learning model, ResNet-50. Trained on daytime radiance data from MODIS (Moderate Resolution Imaging Spectroradiometer) and daytime retrieved COT (Cloud Optical Thickness), the model achieved high prediction accuracy and can be applied to nighttime cloud classification. For a detailed introduction to the model, please refer to our article.
Product information
Training, Validation, Test dataset
Accurate monitoring of air quality can reduce its adverse impact on the earth. Ground-level sensors can provide fine particulate matter (PM2.5) concentrations and ground images, but such sensors have limited spatial coverage and require deployment cost. PM2.5 can also be estimated from satellite-retrieved Aerosol Optical Depth (AOD). However, AOD is subject to uncertainties associated with its retrieval algorithms, which constrain the spatial resolution of the estimated PM2.5, and AOD is not retrievable under cloudy weather. In contrast, satellite images provide continuous spatial coverage with no separate deployment cost. The accuracy of monitoring from such satellite images is hindered by uncertainties in the sensor data of relevant environmental parameters, such as relative humidity, temperature, wind speed and wind direction. The Belief Rule Based Expert System (BRBES) is an efficient algorithm for addressing these uncertainties, and a Convolutional Neural Network (CNN) is suitable for image analytics. Hence, we propose a novel model that integrates a CNN with BRBES to monitor air quality from satellite images with improved accuracy. We customized the CNN and optimized the BRBES to increase monitoring accuracy further. An obscure image is differentiated between polluted air and cloud based on the relationship of PM2.5 with relative humidity. Valid environmental data (temperature, wind speed and wind direction) are adopted to further strengthen the monitoring performance of the proposed model. Three years of observation data (satellite images and environmental parameters) of Shanghai, from 2014 to 2016, were employed to analyze and design the proposed model.
Source code and dataset
We implemented the proposed integrated algorithm in Python 3 and C++. The satellite images are processed with the OpenCV library, and Keras functions are used to implement our customized VGG Net. The Python script smallervggnet.py builds this VGG Net; we then train and test the network with a dataset of satellite images through the train.py script. This dataset consists of 3 years of satellite images of the Oriental Pearl Tower, Shanghai, China, from Planet, covering January 2014 to December 2016 (Planet Team, 2017). These images were captured by PlanetScope, a constellation of approximately 120 optical satellites operated by Planet (Planet Team, San Francisco, CA, USA, 2016). Based on the level of PM2.5, the dataset is divided into three classes: HighPM, MediumPM and LowPM. We classify a new satellite image (201612230949.png) with the trained VGG Net via the classify.py script, and standard file I/O feeds this classification output to the first BRBES (cnn_brb_1.cpp) through a text file (cnn_prediction.txt). In addition to the VGG Net classification output, cloud percentage and relative humidity are fed as inputs to the first BRBES. The second BRBES (cnn_brb_2.cpp) takes the output of the first BRBES, temperature and wind speed as its inputs. Wind-direction-based recalculation of the output of the second BRBES is also performed in this cpp file to compute the final monitored value of PM2.5. We demonstrate this code architecture through a flow chart in Figure 5 of the manuscript. Source code and the dataset of satellite images are made freely available through the published compute capsule (https://doi.org/10.24433/CO.8230207.v1).
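As a hedged sketch of the hand-off between the CNN and the first BRBES (the real classify.py may differ; the model file name, input size, preprocessing and label order below are assumptions), the classification-and-write step could look like this:

import cv2
import numpy as np
from tensorflow.keras.models import load_model

labels = ["HighPM", "MediumPM", "LowPM"]          # assumed label order
model = load_model("smallervggnet_trained.h5")    # hypothetical trained model file

# Classify one satellite image and write the result for the first BRBES (cnn_brb_1.cpp).
img = cv2.imread("201612230949.png")
img = cv2.resize(img, (96, 96)).astype("float32") / 255.0   # assumed VGG Net input size
probs = model.predict(img[np.newaxis, ...], verbose=0)[0]

with open("cnn_prediction.txt", "w") as f:
    f.write(f"{labels[int(np.argmax(probs))]} {float(probs.max()):.4f}\n")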
Code: MIT license; Data: No Rights Reserved (CC0)
The dataset was originally published in DiVA and moved to SND in 2024.