Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training image dataset used in the manuscript "Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction"
AstroSpectra-MNIST is a novel dataset designed to benchmark machine learning models on astronomical spectral classification tasks. Astronomical spectral data from LAMOST are preprocessed, normalized, and converted into lightweight grayscale images of 28×28 pixels. AstroSpectra-MNIST keeps the same image structure as MNIST but differs in storage format. Its small size makes it easy to store, access, and process. It comes in two versions: AstroSpectra-MNIST-v1 and AstroSpectra-MNIST-v2. v1 includes three categories, Star, Galaxy, and QSO, labeled 1, 2, and 3, respectively. v2 covers three subcategories of stars, F-type, G-type, and K-type, also labeled 1, 2, and 3.
AstroSpectra-MNIST/
├── AstroSpectra-MNIST-v1/
│   ├── train_imagesv1/
│   ├── test_imagesv1/
│   ├── train_labelsv1.csv
│   └── test_labelsv1.csv
├── AstroSpectra-MNIST-v2/
│   ├── train_imagesv2/
│   ├── test_imagesv2/
│   ├── train_labelsv2.csv
│   └── test_labelsv2.csv
└── README.md
![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27013442%2Faf12f069bd2fa5a08bd448352343d0b5%2FIMG_202508013580_568x169.png?generation=1754018243177992&alt=media)
| File Name | Description | Format | Size |
|---|---|---|---|
| train_imagesv1/ | 8000 grayscale images | PNG | 4.42 MB |
| test_imagesv1/ | 1000 grayscale images | PNG | 564 KB |
| train_labelsv1.csv | Labels for v1 training set | CSV | 131 KB |
| test_labelsv1.csv | Labels for v1 test set | CSV | 16.5 KB |

| File Name | Description | Format | Size |
|---|---|---|---|
| train_imagesv2/ | 42454 grayscale images | PNG | 25.9 MB |
| test_imagesv2/ | 7546 grayscale images | PNG | 4.56 MB |
| train_labelsv2.csv | Labels for v2 training set | CSV | 776 KB |
| test_labelsv2.csv | Labels for v2 test set | CSV | 124 KB |
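As a quick sanity check after downloading, the label files can be summarized per class. The following is a minimal sketch using only the Python standard library. The class names and the 1/2/3 labels come from the description above; the column name `label` is a hypothetical placeholder, since the CSV header is not documented here, so check the real files and pass the actual name.

```python
import csv
from collections import Counter

# Class names from the dataset description; labels are 1, 2, 3 in both versions.
CLASS_NAMES = {
    "v1": {1: "Star", 2: "Galaxy", 3: "QSO"},
    "v2": {1: "F-type", 2: "G-type", 3: "K-type"},
}


def count_labels(csv_path, version, label_column="label"):
    """Count samples per class in a label CSV.

    `label_column` is a hypothetical column name (an assumption); inspect
    the header of the real CSV files and pass the actual name.
    """
    names = CLASS_NAMES[version]
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[names[int(row[label_column])]] += 1
    return dict(counts)
```

For example, `count_labels("AstroSpectra-MNIST-v1/train_labelsv1.csv", "v1")` would return a dictionary mapping each of Star, Galaxy, and QSO to its number of training samples, provided the column name matches.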
![AstroSpectra-MNIST Sample](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27013442%2F793bfcc0e1f1b4204cd6960a9c3d9de2%2Fb4cd842b5059ed8b7c405d1d6891029.png?generation=1754015363119559&alt=media)
- **AstroSpectra-MNIST-v1**: PCA shows partial separation among the Star, Galaxy, and QSO classes, suggesting distinguishable spectral patterns. t-SNE gives clearer class separation and reveals subclusters within the Star class, indicating internal heterogeneity and supporting the need for a secondary classification.
![AstroSpectra-MNIST Sample](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27013442%2Fa1fde7f039b9d6c0605bb5acd66b05af%2Fd7c16f68e5ee1d39f302c5671d2c5af.png?generation=1754015398742400&alt=media)
- **AstroSpectra-MNIST-v2**: PCA shows significant overlap among the F, G, and K subclasses, highlighting the difficulty of the classification task. t-SNE improves separation and shows discernible subclass clusters despite some overlap, demonstrating its advantage for visualizing complex spectroscopic data.
Classical models (e.g., AlexNet) are adapted by changing the input layer to accept single-channel grayscale images, adjusting the output layer to 3 classes, and upsampling images to a standard 224×224 resolution. Additionally, we created two custom CNN models, SimpleCNN1 and SimpleCNN2, each of which processes 28×28 grayscale images through two convolution-pooling blocks and fully connected layers to output predictions for the three classes.
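The two input adjustments mentioned above can be illustrated without any deep learning framework. This is a minimal sketch, not the authors' pipeline: it assumes nearest-neighbor interpolation (the interpolation method is not stated), relying on the fact that 224 = 8 × 28, and it shifts the dataset labels 1, 2, 3 to 0, 1, 2, since many training frameworks expect class indices that start at 0. The function names are hypothetical.

```python
def upsample_nearest(image, factor=8):
    """Repeat every pixel `factor` times along both axes.

    `image` is a list of rows (lists of pixel values). With factor=8 a
    28x28 image becomes 224x224. Nearest-neighbor is an assumption here.
    """
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def to_zero_based(label):
    """Map dataset labels 1, 2, 3 to class indices 0, 1, 2."""
    if label not in (1, 2, 3):
        raise ValueError(f"unexpected label: {label}")
    return label - 1


# Synthetic 28x28 grayscale image standing in for one dataset sample.
image = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]
big = upsample_nearest(image)
print(len(big), len(big[0]))  # 224 224
```

In practice a framework resize transform would replace `upsample_nearest`; the sketch only makes the shape change and the label shift explicit.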
![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27013442%2Feda312eb5756e823880b0d5000b3240c%2FIMG_202508016628_576x301.png?generation=1754035851976237&alt=media)
We obtain 3601-dimensional raw spectrum vectors from LAMOST DR1 and compress them into 721-dimensional feature vectors using a mean filter. This new dataset is named LineAstroSpectra and includes two versions: LineAstroSpectra-v1 and LineAstroSpectra-v2. The same models and hyperparameters used for the AstroSpectra-MNIST benchmark were applied to LineAstroSpectra. The results provide a direct comparison between the raw spectral data and the corresponding AstroSpectra-MNIST version. For machine l...
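The 3601-to-721 compression can be sketched as a block mean. The window size and the boundary handling are not given in the description, so the version below is an assumption: non-overlapping windows of 5 samples, with the final window holding the single leftover sample. This is one choice that reproduces the stated dimensions, since 720 full windows plus 1 partial window give 721 values. The function name `block_mean` is hypothetical.

```python
def block_mean(spectrum, window=5):
    """Average consecutive, non-overlapping windows of `window` samples.

    A shorter final window is averaged over the samples it contains.
    The window size of 5 is an assumption, not taken from the source.
    """
    return [
        sum(spectrum[i:i + window]) / len(spectrum[i:i + window])
        for i in range(0, len(spectrum), window)
    ]


raw = [float(i) for i in range(3601)]  # synthetic stand-in for one spectrum
features = block_mean(raw)
print(len(features))  # 721
```

A sliding mean filter evaluated every fifth sample would give the same output length; which variant was actually used cannot be determined from the description.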