18 datasets found
  1. Galaxy, star, quasar dataset

    • scidb.cn
    Updated Feb 3, 2023
    Cite
    Li Xin (2023). Galaxy, star, quasar dataset [Dataset]. http://doi.org/10.57760/sciencedb.07177
    Dataset updated
    Feb 3, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Li Xin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The data used in this paper come from the 16th data release of SDSS. SDSS-DR16 contains a total of 930,268 photometric images, with 1.2 billion observed sources and tens of millions of spectra. The data were downloaded from the official SDSS website; specifically, they were obtained through the SkyServer API by running SQL queries in the CasJobs sub-site. Because the current SDSS photometric table PhotoObj can only classify observed sources as point sources or extended sources, spectra are needed to classify the target sources more precisely as galaxies, stars and quasars. We therefore obtain calibrated sources in CasJobs by cross-matching SpecPhoto with the PhotoObj catalog, and obtain the target position information (right ascension and declination). Calibrated sources allow the three classes to be told apart precisely and quickly. Each calibrated source is labeled through the parameter "Class" as "galaxy", "star", or "quasar".

    In this paper, observation areas 3462, 3478, 3530 and 4 other areas in SDSS-DR16 are selected as experimental data, because a large number of sources can be obtained in these areas, providing rich sample data for the experiment. For example, there are 9891 sources in area 3462, including 2790 galaxy sources, 2378 stellar sources and 4723 quasar sources. There are 3862 sources in area 3478, including 1759 galaxy sources, 577 stellar sources and 1526 quasar sources. FITS is a data format commonly used in the astronomical community. By cross-matching the catalog with the FITS files of the local sky region, we obtained images in the 5 bands u, g, r, i and z for 12499 galaxy sources, 16914 quasar sources and 16908 star sources as training and testing data.

    1.1 Image Synthesis

    SDSS photometric data include images in the five bands u, g, r, i and z, and each band's image is packaged separately in a single-band FITS file. Images in different bands contain different information. Since the g, r and i bands contain more feature information and less noise, astronomical researchers typically use the g, r and i bands as the three colour channels (R, G and B) of an image to synthesize photometric images. Different bands generally cannot be combined directly: if the three bands are stacked as they are, the images of the different bands may not be aligned. This paper therefore uses the RGB multi-band image synthesis software written by He Zhendong et al. to combine the g, r and i images, which avoids the alignment problem. Each photometric image in this paper measures 2048×1489 pixels.

    1.2 Data tailoring

    The target images are first cropped. Cropping could be done with image segmentation tools; this paper implements it in Python. During cropping, we convert the right ascension and declination of each source in the catalog into pixel coordinates on the photometric image through the coordinate conversion formula, and use the pixel coordinates to locate the source. The coordinates are taken as the center point, and a rectangular box around them is cut out. We found that the input image size affects the experimental results. Therefore, according to the target size of the sources, we selected three crop sizes: 40×40, 60×60 and 80×80. Through experiment and analysis, we found that the convolutional neural network learns better and reaches higher accuracy on the small image size. In the end, we chose to crop the extended-source galaxies and the point-source quasars and stars to 40×40.

    1.3 Division of training and test data

    For the algorithm to recognize sources accurately, enough image samples are needed. The choice of training set, validation set and test set is an important factor affecting the final recognition accuracy. In this paper, the training, validation and test sets are split in the ratio 8:1:1. The validation set is used to tune the algorithm, and the test set is used to evaluate the generalization ability of the final algorithm. Table 1 shows the specific data partitioning information. The total sample size is 34,000 source images, including 11543 galaxy sources, 11967 star sources, and 10490 quasar sources.

    1.4 Data preprocessing

    In this experiment, the training set and test set are used as the training and test inputs of the algorithm after data preprocessing. The quantity and quality of the data largely determine the recognition performance of the algorithm. The training set and the test set are preprocessed differently. For the training set, we first apply vertical flips, horizontal flips and scaling to the cropped images to enrich the data samples and improve the generalization ability of the algorithm. Because the features of celestial sources are flip-invariant, the labels of galaxies, stars and quasars do not change after flipping. For the test set, preprocessing is simpler: we only scale the input images and feed the result to the algorithm.
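    The cropping, 8:1:1 split and flip augmentation described above can be sketched as follows. This is a minimal illustration, not the authors' code: images are plain lists of pixel rows, the pixel coordinates (cx, cy) are assumed to have been computed from right ascension and declination already, and the function names are hypothetical.

```python
import random


def crop_center(image, cx, cy, size=40):
    """Return a size x size window of `image` (a list of rows) centred on pixel (cx, cy).

    Assumes the image is at least `size` pixels in each direction.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    # Shift the window inward so it never runs off the image edge.
    x0 = min(max(cx - half, 0), width - size)
    y0 = min(max(cy - half, 0), height - size)
    return [row[x0:x0 + size] for row in image[y0:y0 + size]]


def flip_horizontal(image):
    """Mirror the image left to right; the class label is unchanged."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom; the class label is unchanged."""
    return image[::-1]


def split_8_1_1(samples, seed=0):
    """Shuffle and split samples into training, validation and test sets (8:1:1)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

    With 34,000 samples, `split_8_1_1` yields 27,200 training, 3,400 validation and 3,400 test items. Flips would be applied to the training crops only, as in the description.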

  2. UniverseMachine Data Release 1

    • arizona.figshare.com
    png
    Updated May 31, 2023
    Cite
    Peter Behroozi; Risa Wechsler; Andrew Hearin; Charlie Conroy (2023). UniverseMachine Data Release 1 [Dataset]. http://doi.org/10.25422/azu.data.12093972.v1
    Available download formats
    png
    Dataset updated
    May 31, 2023
    Dataset provided by
    University of Arizona Research Data Repository
    Authors
    Peter Behroozi; Risa Wechsler; Andrew Hearin; Charlie Conroy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The UniverseMachine is a self-consistent empirical model of galaxy formation in dark matter halos. It is constrained via observed galaxy stellar mass functions, star formation rates, clustering, luminosity functions, and quenched fractions. This dataset includes derived constraints on galaxy-halo relationships, star formation histories, merger histories, and predicted observables. Full mock catalogs with galaxy properties are available here. For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu.

  3. Training data for 'Exome sequencing data analysis' tutorial (Galaxy Training...

    • zenodo.org
    bin
    Updated Aug 4, 2022
    Cite
    Wolfgang Maier (2022). Training data for 'Exome sequencing data analysis' tutorial (Galaxy Training Material) [Dataset]. http://doi.org/10.5281/zenodo.3054169
    Available download formats
    bin
    Dataset updated
    Aug 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wolfgang Maier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data used in this tutorial are a subset of the data published previously in Training material for the course "Exome analysis with GALAXY". Credit for uploading the original data goes to Paolo Uva and Gianmauro Cuccuru!

    Specifically, you may need the following datasets to follow the tutorial:

    • Raw sequencing reads
    • Premapped sequencing reads
    • Reference sequence (human chromosome 8)

    If you would just like to play with GEMINI rather than work through the full tutorial, you'll find below a prebuilt GEMINI database (for GEMINI version 0.20.1) for the family trio. You can start exploring this database without having to run GEMINI load and, in fact, without having to install GEMINI's bundled annotation data.

  4. Photometric Galaxy Redshift Prediction

    • zenodo.org
    csv
    Updated Apr 26, 2024
    Cite
    Grigorios Tsagkatakis (2024). Photometric Galaxy Redshift Prediction [Dataset]. http://doi.org/10.5281/zenodo.11073039
    Available download formats
    csv
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Grigorios Tsagkatakis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sloan Digital Sky Survey (SDSS) Galaxy Redshift Dataset

    This dataset comprises a curated collection of galaxy observations from the Sloan Digital Sky Survey (SDSS). It features photometric and spectroscopic data for 100 galaxies, specifically selected to cover a range of redshifts from 0 to 0.4. The dataset includes the following key parameters for each galaxy:

    • Photometric Data: Magnitudes in the SDSS 'u', 'g', 'r', 'i', and 'z' bands.
    • Spectroscopic Data: Measured redshift (redshift) and its error (redshift_error).
    • Additional Metadata:
      • objid: Unique identifier for the photometric object.
      • specObjID: Unique identifier for the spectroscopic object.
      • ra: Right ascension in decimal degrees.
      • dec: Declination in decimal degrees.
      • class: Classification of the object, all marked as 'GALAXY'.

    Purpose and Use

    This dataset is intended for use in astronomical research and education, particularly in studies involving galaxy properties and distribution, cosmology, and machine learning applications such as redshift prediction models. The data is well-suited for developing and testing predictive models that estimate redshifts from photometric data, aiding in the expansion of accessible astronomical analysis tools.
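    As a rough illustration of the redshift-prediction use case, the sketch below derives colour indices from the u, g, r, i and z magnitudes and fits a one-variable least-squares line from the g − r colour to the redshift column. The rows are made-up values for illustration only, the function names are hypothetical, and a single-colour linear fit is a deliberate simplification rather than a recommended model.

```python
BANDS = ["u", "g", "r", "i", "z"]


def colours(row):
    """Adjacent-band colour indices (u-g, g-r, r-i, i-z) from the magnitudes."""
    return [row[a] - row[b] for a, b in zip(BANDS, BANDS[1:])]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, x):
    """Estimate the redshift for a colour value x."""
    return slope * x + intercept


# Made-up rows for illustration only; real values would come from the CSV file.
ROWS = [
    {"u": 19.9, "g": 18.4, "r": 17.8, "i": 17.5, "z": 17.3, "redshift": 0.05},
    {"u": 20.6, "g": 18.9, "r": 18.0, "i": 17.6, "z": 17.3, "redshift": 0.15},
    {"u": 21.3, "g": 19.5, "r": 18.3, "i": 17.8, "z": 17.5, "redshift": 0.25},
    {"u": 22.0, "g": 20.1, "r": 18.6, "i": 18.0, "z": 17.6, "redshift": 0.35},
]

g_minus_r = [colours(row)[1] for row in ROWS]
slope, intercept = fit_line(g_minus_r, [row["redshift"] for row in ROWS])
```

    In practice one would use all four colours (and the redshift_error column as a weight) with a more flexible regressor; the point here is only the shape of the photometry-to-redshift mapping.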

    Data Collection Method

    The data was extracted using SQL queries against the public SDSS DR16 database, ensuring accuracy and relevance in current astronomical research contexts.

    Accessibility

    The dataset is made available under a CC0 license to promote open scientific research and collaboration within the astronomical community and beyond.

  5. Bulk RNA-Seq Deconvolution with single-cell RNA-Seq Datasets

    • explore.openaire.eu
    Updated Oct 6, 2021
    Cite
    Wendi Bacon; Mehmet Tekman (2021). Bulk RNA-Seq Deconvolution with single-cell RNA-Seq Datasets [Dataset]. http://doi.org/10.5281/zenodo.5719228
    Dataset updated
    Oct 6, 2021
    Authors
    Wendi Bacon; Mehmet Tekman
    Description

    Bulk data of human pancreas

    The dataset from Fadista et al. (2014) contains raw read counts from bulk RNA-seq of human pancreatic islets, collected to study glucose metabolism in healthy and hyper-hypoglycemic conditions. For the purpose of this vignette, the dataset is pre-processed and made available on the data download page. In addition to read counts, this dataset also contains HbA1c levels, BMI, gender and age information for each subject.

    Single cell data of human pancreas

    The single-cell data are from Segerstolpe et al. (2016) and contain read counts for 25453 genes across 2209 cells. Here we only include the 1097 cells from 6 healthy subjects. The read counts are available on the data download page, in the form of an ExpressionSet. Another single-cell dataset is from Xin et al. (2016), with 39849 genes and 1492 cells. Its read counts are also available on the data download page, in the form of an ExpressionSet.

    The deconvolution of 89 subjects from Fadista et al. (2014) is performed with the bulk data GSE50244.bulk.eset and the single-cell reference EMTAB.eset. We constrained our estimation to 6 major cell types: alpha, beta, delta, gamma, acinar and ductal, which make up over 90% of the whole islet.

  6. Galaxy Zoo 2: Images

    • kaggle.com
    Updated Jan 26, 2021
    Cite
    Jaime Trickz (2021). Galaxy Zoo 2: Images [Dataset]. https://www.kaggle.com/jaimetrickz/galaxy-zoo-2-images/tasks
    Dataset updated
    Jan 26, 2021
    Dataset provided by
    Kaggle
    Authors
    Jaime Trickz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    The Galaxy Zoo team regularly receives requests for subject images for various versions of Galaxy Zoo, in order to facilitate other investigations, e.g. machine learning projects. This repository is an updated attempt to provide those in a way that is useful to the wider community.

    Content

    There are 243,434 images in total. This is off by about 0.08% from the total count in the tables; the cause of the discrepancy is not clear.

    The images are available in the file images_gz2.

    The most recent and reliable source for morphology measurements is "GZ2 - Table 1 - Normal-depth sample with new debiasing method – CSV" (from Hart et al. 2016), which is available at data.galaxyzoo.org. To cross-reference the images with Table 1, this sample includes another CSV table (gz2_filename_mapping.csv) which contains three columns and 355,990 rows. The columns are:

    • objid: the Data Release 7 (DR7) object ID for each galaxy. This should match the first column in Table 1.
    • sample: string indicating the subsampling of the galaxy.
    • asset_id: an integer that corresponds to the filename of the image in the zipped file linked above.
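
    The cross-reference described above amounts to a join on objid. The sketch below uses tiny in-memory stand-ins for the two CSV files; the IDs, the "label" column and the ".jpg" extension are assumptions made for illustration, and the function names are hypothetical. Only the objid, sample and asset_id column names come from the description.

```python
import csv
import io

# Stand-in for gz2_filename_mapping.csv (fake IDs, illustration only).
MAPPING_CSV = """objid,sample,asset_id
1001,original,1
1002,original,2
1003,extra,3
"""

# Stand-in for Table 1; per the description, its first column holds the DR7 object ID.
TABLE1_CSV = """id,label
1001,spiral
1003,elliptical
1999,spiral
"""


def load_asset_ids(mapping_file):
    """Build an objid -> asset_id lookup from the mapping table."""
    return {row["objid"]: int(row["asset_id"]) for row in csv.DictReader(mapping_file)}


def attach_filenames(table_file, asset_ids, extension=".jpg"):
    """Pair each Table 1 row with its image filename, skipping rows with no mapping."""
    reader = csv.reader(table_file)
    next(reader)  # skip the header row
    matched = []
    for row in reader:
        objid = row[0]  # the first column of Table 1 should match objid
        if objid in asset_ids:
            matched.append((objid, f"{asset_ids[objid]}{extension}", row[1:]))
    return matched


asset_ids = load_asset_ids(io.StringIO(MAPPING_CSV))
matched = attach_filenames(io.StringIO(TABLE1_CSV), asset_ids)
```

    Rows without a mapping entry (here the fake ID 1999) are dropped rather than raising an error, since the description notes that image and table counts do not match exactly.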

    Acknowledgements

    They are the "original" sample of subject images in Galaxy Zoo 2 (Willett et al. 2013, MNRAS, 435, 2835, DOI: 10.1093/mnras/stt1458) as identified in Table 1 of Willett et al. and also in Hart et al. (2016, MNRAS, 461, 3663, DOI: 10.1093/mnras/stw1588).

    Inspiration

    I want to know if it's possible to cluster the images into the galaxy shape types of the Hubble - de Vaucouleurs Galaxy Morphology Diagram:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F6067505%2F8ac7df09aa0f85a1a07ac9dc0a81b57f%2FHubble_-_de_Vaucouleurs_Galaxy_Morphology_Diagram.png?generation=1611680439647479&alt=media

    • Ellipticals: shapes from spherical to cylindrical, with almost homogeneous density.
    • Spirals: two or more arms (like the classical view of the Milky Way Galaxy) and a dense core.
    • Irregulars: no defined shape, heterogeneous density.

    If these three are not enough and you want to improve your notebook, it is possible to add:

    • Lenticulars: disk-shaped galaxies with a dense core.
    • Barred Spirals: spirals whose arms are straight near the core and bent farther from it.
    • Usual Spirals: spirals whose arms are bent from the core to the end.
    • Intermediate Spirals: spirals with non-defined arms.
    • Dwarf Galaxy: tiny, irregular, heterogeneous galaxy.

    I didn't add these to the first clusters because, depending on the angle of the galaxy, some lenticulars may look like ellipticals or spirals; it is hard to always see the arms of spiral galaxies; and it is hard to determine whether a galaxy is tiny or big from a single photograph with nothing to compare it to.

  7. Data from: PiRATE: a Pipeline to Retrieve and Annotate Transposable Elements...

    • seanoe.org
    bin
    Updated 2018
    Cite
    Jeremy Berthelier; Nathalie Casse; Nicolas Daccord; Véronique Jamilloux; Bruno Saint-Jean; Gregory Carrier (2018). PiRATE: a Pipeline to Retrieve and Annotate Transposable Elements [Dataset]. http://doi.org/10.17882/51795
    Available download formats
    bin
    Dataset updated
    2018
    Dataset provided by
    SEANOE
    Authors
    Jeremy Berthelier; Nathalie Casse; Nicolas Daccord; Véronique Jamilloux; Bruno Saint-Jean; Gregory Carrier
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    To date, genome assemblies of non-model organisms are usually not at chromosomal level and are highly fragmented. This fragmentation is recognized to be, in part, the result of poor assembly of transposable element (TE) copies, which increases the difficulty of detecting and annotating them. In this context, we designed a new bioinformatics pipeline named PiRATE to detect, classify and annotate the TEs of non-model organisms. PiRATE combines multiple analysis packages representing all the major approaches for TE detection. The goal is to promote the detection of complete TE sequences of every TE family. The detection of complete TE sequences, bearing recognizable conserved domains or specific motifs, facilitates the classification step. The classification step of PiRATE has been optimized for algal genomes. Each tool used by PiRATE is automated within a stand-alone Galaxy. This PiRATE-Galaxy can be used through a virtual machine, which can be downloaded below. PiRATE-Galaxy is a suitable and flexible platform for studying TEs in the genome of any organism. You can find a tutorial below. Please contact us if you have any issues or comments: berthelier.j [at] laposte.net or gregory.carrier [at] ifremer.fr, or leave a message on GitHub: https://github.com/jberthelier/pirate/issues

  8. Training data for 'Unicycler assembly of SARS-CoV-2 genome with...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Aug 4, 2022
    Cite
    Wolfgang Maier (2022). Training data for 'Unicycler assembly of SARS-CoV-2 genome with preprocessing to remove human genome reads' tutorial (Galaxy Training Material) [Dataset]. http://doi.org/10.5281/zenodo.3732359
    Available download formats
    application/gzip
    Dataset updated
    Aug 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wolfgang Maier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data here is a copy of the corresponding SRR records in the NCBI SRA. The duplication serves a dual purpose:

    1. as a backup should there be problems connecting to NCBI servers, e.g., during Galaxy user trainings.
    2. to illustrate how to obtain raw sequencing data from alternative sources, and to organize the data into the same collection structure in a Galaxy history that is generated by specialized Galaxy SRA download tools.

  9. Galaxy Entertainment Group Ltd. Online Gambling Market Insights

    • statistics.technavio.org
    Updated Feb 9, 2024
    Cite
    Technavio (2024). Galaxy Entertainment Group Ltd. Online Gambling Market Insights [Dataset]. https://statistics.technavio.org/galaxy-entertainment-group-ltd-online-gambling-market-insights
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Worldwide
    Description

    The online gambling market is expected to grow at a CAGR of 11% during the forecast period. This market growth can be attributed to various factors including rising popularity of the freemium model.

    The online gambling market report offers several other valuable insights such as:

    • CAGR of the market during the forecast period 2020-2024
    • Detailed information on factors that will drive online gambling market growth during the next five years
    • Precise estimation of the online gambling market size and its contribution to the parent market
    • Accurate predictions on upcoming trends and changes in consumer behavior
    • The growth of the online gambling market industry across APAC, Europe, MEA, North America, and South America
    • A thorough analysis of the market’s competitive landscape and detailed information on vendors
    • Comprehensive details of factors that will challenge the growth of online gambling market vendors
    
  10. Galactic interstellar dust Gaia-2MASS 3D maps

    • cdsarc.u-strasbg.fr
    Updated May 28, 2019
    Cite
    CDS (2019). Galactic interstellar dust Gaia-2MASS 3D maps [Dataset]. http://doi.org/10.26093/cds/vizier.36250135
    Dataset updated
    May 28, 2019
    Dataset provided by
    CDS
    Description

    VizieR Online Data Catalog: Galactic interstellar dust Gaia-2MASS 3D maps (Lallement R.+, 2019)

  11. Data from: Genetic Characteristics and Phylogenetic Relationships of 18...

    • figshare.com
    zip
    Updated Mar 29, 2025
    Cite
    Wenyu Sun (2025). Genetic Characteristics and Phylogenetic Relationships of 18 Anchovy Species Based on Mitochondrial Genomes in the Seas Around China [Dataset]. http://doi.org/10.6084/m9.figshare.28227167.v2
    Available download formats
    zip
    Dataset updated
    Mar 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wenyu Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We downloaded the complete mitochondrial genome data of 18 Engraulidae fish species from the NCBI database (https://www.ncbi.nlm.nih.gov/). These files were stored in the “Download data” folder. Subsequently, we reannotated these mitochondrial genomes using the MITOS2 online tool available on the Galaxy website (https://usegalaxy.org/) and manually modified the original gb files to adjust the inaccurately annotated control regions and to add the annotation information for the light-strand replication origin. The revised files were saved in the “Reannotation” folder and were used for subsequent analyses.

  12. Data from: The Massive and Distant Clusters of WISE Survey. XII. Exploring...

    • zenodo.org
    bin, zip
    Updated Apr 26, 2024
    Cite
    Mustafa Muhibullah (2024). The Massive and Distant Clusters of WISE Survey. XII. Exploring X-ray AGN in Dynamically Active Massive Galaxy Clusters at z ∼ 1 [Dataset]. http://doi.org/10.5281/zenodo.11074555
    Available download formats
    zip, bin
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mustafa Muhibullah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to the manuscript titled "The Massive and Distant Clusters of WISE Survey. XII. Exploring X-ray AGN in Dynamically Active Massive Galaxy Clusters at z ∼ 1," which has been submitted to The Astrophysical Journal. To reproduce the plots and access the catalogs used in the paper, please download and extract all the zip folders and the Jupyter Notebook "madcows_master_notebook.ipynb" provided under the same directory. Then, open the notebook and follow the instructions provided within. If you encounter any issues, please contact the corresponding author for assistance.

  13. Global smartphone unit shipments of Samsung 2010-2024, by quarter

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2025). Global smartphone unit shipments of Samsung 2010-2024, by quarter [Dataset]. https://www.statista.com/statistics/299144/samsung-smartphone-shipments-worldwide/
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the fourth quarter of 2024, Samsung shipped around ** million smartphones, a decrease from both the previous quarter and the same quarter of the previous year. Samsung’s sales consistently place the smartphone giant among the top three smartphone vendors in the world, alongside Xiaomi and Apple.

    Samsung smartphone sales – how many phones does Samsung sell?

    Global smartphone sales reached over *** billion units during 2024. While the global smartphone market is led by Samsung and Apple, Xiaomi has gained ground following the decline of Huawei. Together, these three companies hold more than ** percent of the global smartphone market share.

  14. Sloan Digital Sky Survey - DR18

    • kaggle.com
    Updated Jul 29, 2023
    Cite
    Farid R (2023). Sloan Digital Sky Survey - DR18 [Dataset]. https://www.kaggle.com/datasets/diraf0/sloan-digital-sky-survey-dr18/code
    Dataset updated
    Jul 29, 2023
    Dataset provided by
    Kaggle
    Authors
    Farid R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16012776%2Fdb7fd8faf4277c85822f8bbfe5e113d2%2Farnaud-mariat-45Z6hW1dQMI-unsplash.jpg?generation=1690636699354713&alt=media

    This dataset consists of 100,000 observations from the Data Release (DR) 18 of the Sloan Digital Sky Survey (SDSS). Each observation is described by 42 features and 1 class column classifying the observation as either:

    • a STAR
    • a GALAXY
    • a QSO (Quasi-Stellar Object) or a Quasar.

    You can read more about the features below:

    • Objid, Specobjid - Object Identifiers
    • ra - J2000 Right Ascension
    • dec - J2000 Declination
    • redshift - Final Redshift of the celestial object
    • u, g, r, i, and z - better of DeV/Exp magnitude fit for u, g, r, i, and z. u, g, r, i, and z correspond to the five photometric bands namely ultraviolet band, green band, red band, infrared band, and near infrared band respectively.
    • run - Run number
    • rerun - Rerun number
    • camcol - Camera column
    • field - Field number

    The run number refers to a specific period in which the SDSS observes a part of the sky. SDSS is divided into several runs, each lasting for a certain amount of time, which are then combined to cover an extensive portion of the sky. The rerun number refers to the reprocessing of the data obtained.

    In each run, multiple charge-coupled device (CCD) cameras arranged in columns are each responsible for imaging a specific portion of the sky. camcol refers to the number of the camera column that imaged a specific observation. A field is a specific portion of the sky that is imaged during a single exposure of the telescope. The entire sky is divided into a number of fields, and the field number column refers to the field from which an observation was obtained.

    • plate - Plate number
    • fiberID - Optical Fiber ID

    A number of physical glass plates are mounted on the telescope, each containing a number of optical fibers corresponding to a specific position in the sky. When light hits these optical fibers, it is sent to spectrographs for analysis. plate number and fiberID refer to the number of the plate and the ID of the optical fiber responsible for gathering light from the celestial object respectively.

    • mjd - Modified Julian Date

    Modified Julian Date represents the number of days that have passed since midnight Nov. 17, 1858. It is used in SDSS to keep track of the time of each observation.
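
    The definition above translates directly into a date conversion. A minimal sketch, assuming the mjd value can be treated as days on a UTC calendar (differences between astronomical time scales are ignored) and using a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# MJD 0 is midnight at the start of 17 November 1858.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date (whole or fractional days) to a datetime."""
    return MJD_EPOCH + timedelta(days=mjd)
```

    For example, `mjd_to_datetime(51544)` gives midnight on 1 January 2000, and a fractional value such as 51544.5 gives noon on the same day.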

    • petroRad_u, petroRad_g, petroRad_r, petroRad_i, and petroRad_z - Petrosian Radii for the five photometric bands u (ultraviolet), g (green), r (red), i (infrared), and z (near-infrared) respectively.

    The Petrosian radius is a measure of the size of a galaxy, and it is calculated from the Petrosian flux profile, which measures how the brightness of an object varies with distance from its center. The Petrosian radius is defined as the distance from the galaxy's center at which the ratio of the local surface brightness to the average surface brightness inside that distance reaches a certain predefined value. The local surface brightness is the brightness of a small region (in practice, a thin ring) at that distance from the center; it measures how much light is detected from that region alone. The average surface brightness is the mean brightness over the area enclosed within that distance: the light received from inside it divided by the enclosed area.
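
    As a worked example of this definition, the sketch below finds the Petrosian radius of an idealized exponential light profile I(r) = exp(-r/h). It makes several assumptions that are not stated in the description: the local brightness is taken at exactly r rather than averaged over a ring, the mean is taken over the disc inside r, and the threshold ratio is set to 0.2. The function names are hypothetical.

```python
import math


def petrosian_ratio(x):
    """Local over mean-interior surface brightness for I(r) = exp(-r/h), with x = r/h."""
    local = math.exp(-x)
    # Flux inside r is 2*pi*h^2 * [1 - (1 + x) * exp(-x)]; dividing by the
    # enclosed area pi*r^2 gives the mean surface brightness inside r.
    mean_inside = 2.0 * (1.0 - (1.0 + x) * math.exp(-x)) / x ** 2
    return local / mean_inside


def petrosian_radius(threshold=0.2, lo=0.1, hi=20.0, tol=1e-9):
    """Bisect for the x = r/h at which the ratio falls to `threshold`.

    Relies on the ratio decreasing steadily with radius for this profile.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if petrosian_ratio(mid) > threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

    Under these assumptions the ratio reaches 0.2 at roughly 3.6 scale lengths, so a more extended galaxy (larger h) has a proportionally larger Petrosian radius regardless of its overall brightness.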

    These parameters help in characterizing the properties of celestial objects, especially when studying their morphologies, sizes, and how they evolve over time.

    • petroFlux_u, petroFlux_g, petroFlux_r, petroFlux_i, and petroFlux_z - Petrosian Fluxes for the five photometric bands u (ultraviolet), g (green), r (red), i (infrared), and z (near-infrared) respectively. These features describe the total amount of light emitted from the celestial objects.

    These parameters help in studying the photometric properties of the celestial objects, particularly in analyzing the brightness, colors, and spectral energy distribution of the objects. By using petrosian fluxes in different bands, astronomers can obtain a comprehensive view of an object's light emission across the electromagnetic spectrum.

    • petroR50_u, petroR50_g, petroR50_r, petroR50_i, and petroR50_z - Petrosian half-light radii for the five photometric bands u (ultraviolet), g (green), r (red), i (infrared), and z (near-infrared) respectively. PetroR50 is a measure of the radius at which half of the total light (or flux) emitted from a celestial object is enclosed within the petrosian aperture. The petrosian aperture is defined based on the petrosian radius, which is a measure of the size of the celestial object. The petrosian aperture allows a...

  15. Pan-cancer Aberrant Pathway Activity Analysis (PAPAA)

    • zenodo.org
    • explore.openaire.eu
    application/gzip, csv +1
    Updated Dec 5, 2020
    Cite
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI (2020). Pan-cancer Aberrant Pathway Activity Analysis (PAPAA) [Dataset]. http://doi.org/10.5281/zenodo.3629709
    Available download formats: application/gzip, tsv, csv
    Dataset updated
    Dec 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information about the dataset files:

    1) pancan_rnaseq_freeze.tsv.gz: Publicly available gene expression data for the TCGA Pan-cancer dataset. File: PanCanAtlas EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611] [https://doi.org/10.1016/j.celrep.2018.03.046]

    2) pancan_mutation_freeze.tsv.gz: Publicly available Mutational information for TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

    3) pancan_GISTIC_threshold.tsv.gz: Publicly available gene-level copy number information for the TCGA Pan-cancer dataset. This file was processed using script process_copynumber.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. The files copy_number_loss_status.tsv.gz and copy_number_gain_status.tsv.gz generated from this data are used as inputs in our Galaxy pipeline. [https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443] [https://doi.org/10.1016/j.celrep.2018.03.046]

    4) mutation_burden_freeze.tsv.gz: Publicly available Mutational information for TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/] [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

    5) sample_freeze.tsv or sample_freeze_version4_modify.tsv: The file lists the frozen samples as determined by TCGA PanCancer Atlas consortium along with raw RNAseq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples. [https://github.com/greenelab/pancancer/]

    6) vogelstein_cancergenes.tsv: compendium of OG and TSG used for the analysis. [https://github.com/greenelab/pancancer/]

    7) CCLE_DepMap_18Q1_maf_20180207.txt.gz: Publicly available Mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2FCCLE_DepMap_18Q1_maf_20180207.txt]

    8) ccle_rnaseq_genes_rpkm_20180929.gct.gz: Publicly available Expression data for 1019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fccle_2019%2FCCLE_RNAseq_genes_rpkm_20180929.gct.gz]

    9) CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct: Publicly available merged Mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This data is represented in the binary format and provided by the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct]

    10) GDSC_cell_lines_EXP_CCLE_names.csv.gz: Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer (GDSC) cell lines. The file gdsc_cell_line_RMA_proc_basalExp.csv was downloaded and subset to the 389 cell lines that are common to CCLE and GDSC. All the GDSC cell line names were replaced with CCLE cell line names for further processing. [https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip]

    11) GDSC_CCLE_common_mut_cnv_binary.csv.gz: A subset of merged Mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE. This file is generated using CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct and a list of common cell lines.

    12) gdsc1_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC1 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC1_fitted_dose_response_15Oct19.xlsx]

    13) gdsc2_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC2 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC2_fitted_dose_response_15Oct19.xlsx]

    14) compounds.csv: list of pharmacological compounds tested for our analysis

    15) tcga_dictonary.tsv: list of cancer types used in the analysis.

    16) seg_based_scores.tsv: Measurement of total copy number burden, Percent of genome altered by copy number alterations. This file was used as part of the Pancancer analysis by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/]
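The subsetting to cell lines common to CCLE and GDSC mentioned for files 10 and 11 can be sketched as a set intersection followed by a row filter. This is only an illustration in Python: the function names, the `cell_line` key, and the toy rows are hypothetical and are not taken from the PAPAA scripts.

```python
def common_cell_lines(ccle_names, gdsc_names):
    """Return the sorted cell-line names present in both collections."""
    return sorted(set(ccle_names) & set(gdsc_names))


def subset_rows(rows, keep, key="cell_line"):
    """Keep only the rows (dicts) whose cell-line name is in `keep`."""
    keep = set(keep)
    return [row for row in rows if row[key] in keep]


# Toy data with hypothetical cell-line names and values.
ccle = ["A549_LUNG", "HELA_CERVIX", "MCF7_BREAST"]
gdsc = ["MCF7_BREAST", "A549_LUNG", "U2OS_BONE"]
shared = common_cell_lines(ccle, gdsc)

rows = [
    {"cell_line": "A549_LUNG", "TP53": 1},
    {"cell_line": "U2OS_BONE", "TP53": 0},
]
filtered = subset_rows(rows, shared)
```

For the real files, the rows would first be read from the gzip-compressed tables, and the GDSC names would need to be mapped to CCLE names before intersecting, as the description of file 10 notes.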

  16. X-ray investigation of the remarkable galaxy group Nest200047

    • zenodo.org
    application/gzip
    Updated Jul 26, 2025
    Cite
    Anwesh Majumder; Aurora Simionescu; Tomáš Plšek; Marisa Brienza; Eugene Churazov; Ildar Khabibullin; Fabio Gastaldello; ANDREA BOTTEON; Huub Rottgering; Marcus Brüggen; Natalia Lyskova; Kamlesh Rajpurohit; Rashid Sunyaev; Michael Wise (2025). X-ray investigation of the remarkable galaxy group Nest200047 [Dataset]. http://doi.org/10.5281/zenodo.15650741
    Available download formats: application/gzip
    Dataset updated
    Jul 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anwesh Majumder; Aurora Simionescu; Tomáš Plšek; Marisa Brienza; Eugene Churazov; Ildar Khabibullin; Fabio Gastaldello; ANDREA BOTTEON; Huub Rottgering; Marcus Brüggen; Natalia Lyskova; Kamlesh Rajpurohit; Rashid Sunyaev; Michael Wise
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data reproduction package for the paper "X-ray investigation of the remarkable galaxy group Nest200047" by Anwesh Majumder, M.W. Wise, A. Simionescu, M.N. de Vries (accepted in MNRAS).

    Raw data: The Chandra and XMM data can be downloaded from https://cda.harvard.edu/chaser/ and http://nxsa.esac.esa.int/nxsa-web/#search using the observation IDs. See the 'Data' section of the paper for what to download. Additional data sources are given in the paper's footnotes.

    Software required:

    CIAO (https://cxc.cfa.harvard.edu/ciao/)

    XMM-SAS (https://www.cosmos.esa.int/web/xmm-newton/download-and-install-sas)

    Jupyter Notebook and Python-3.9 or higher (https://jupyter.org)

    SPEX (https://spex-xray.github.io/spex-help/index.html)

    PyProffit (https://pyproffit.readthedocs.io/en/latest/index.html)

    CXBups (https://zenodo.org/records/2575495)

    There are README files inside directories.

  17. Datasets of the DIMet manuscript

    • zenodo.org
    zip
    Updated Apr 30, 2024
    Cite
    Johanna Galvis; Joris Guyon; Benjamin Dartigues; Helge Hecht; Florian Specque; Hayssam Soueidan; Slim Karkar; Thomas Daubon; Macha Nikolski (2024). Datasets of the DIMet manuscript [Dataset]. http://doi.org/10.5281/zenodo.10579862
    Available download formats: zip
    Dataset updated
    Apr 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Johanna Galvis; Joris Guyon; Benjamin Dartigues; Helge Hecht; Florian Specque; Hayssam Soueidan; Slim Karkar; Thomas Daubon; Macha Nikolski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets for reproducing the results of the manuscript "DIMet : An open-source tool for Differential analysis of targeted Isotope-labeled Metabolomics data". DIMet tool is available here, and the tool documentation is accessible in the DIMet wiki page and in its Galaxy site.

    Users of the Galaxy version of DIMet:

    • download and decompress (unzip) the .zip file.
    • within 'datasets_manuscript_DIMet/' there is a sub-folder data/; keep it.
    • within 'datasets_manuscript_DIMet/' there is a sub-folder config/; it can be deleted, as it is not used in the Galaxy version.
    • use the .csv files that are provided in data/. The specific .csv files to be given as input are explained in each 'dimet_' module in Galaxy.
    • check the metadata_endo_ldh.csv and metadata_timeseries.csv files: if all the content uses quotes (") to delimit strings, edit the file in a plain text editor (e.g. pad, gedit, etc) and delete the quotes (replace every " with nothing). These quotes (") in the samples metadata are tolerated in the command-line version but are not allowed in the Galaxy version.
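The quote-removal step for the Galaxy version can also be scripted instead of done by hand in a text editor. The following Python sketch is an assumption about one way to do it; the helper names are hypothetical and are not part of DIMet. It removes every double-quote character, exactly as the manual instruction describes, so it should only be used when no field legitimately contains a quote or a delimiter inside quotes.

```python
from pathlib import Path


def strip_double_quotes(text):
    """Remove every double-quote character from the given text."""
    return text.replace('"', "")


def clean_metadata_file(path):
    """Rewrite a metadata file in place without double quotes."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    file_path.write_text(strip_double_quotes(original), encoding="utf-8")


# Example on an in-memory string with hypothetical column names.
sample = '"name","condition"\n"s1","control"\n'
cleaned = strip_double_quotes(sample)
```

To apply it to the two files named above, one would call `clean_metadata_file` on each path inside the data/ sub-folder after making a backup copy.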

    Users of the command-line version of DIMet:

    • download and decompress the .zip file, then follow the instructions in the documentation on the DIMet wiki page.

  18. Anti-Spoofing Dataset, 95,000 sets

    • kaggle.com
    Updated Jul 20, 2025
    + more versions
    Cite
    Axon Labs (2025). Anti-Spoofing Dataset, 95,000 sets [Dataset]. https://www.kaggle.com/datasets/axondata/face-anti-spoofing-dataset/code
    Dataset updated
    Jul 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Axon Labs
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Anti-Spoofing dataset: live, replay, cut, print, 3D masks - large-scale face anti spoofing

    This dataset delivers a single, end-to-end resource for training and benchmarking facial liveness-detection systems. By aggregating live sessions and eleven realistic presentation-attack classes into one collection, it accelerates development toward iBeta Level 1/2 compliance and strengthens model robustness against the full spectrum of spoofing tactics.


    Why Comprehensive Anti-Spoofing Data?

    Modern certification pipelines demand proof that a system resists all common attack vectors, not just prints or replays. This dataset delivers those vectors in one place, allowing you to:

    • Benchmark a model’s true generalisation
    • Fine-tune against rare but high-impact threats (e.g., silicone or textile masks)
    • Streamline audits by demonstrating coverage of every ISO 30107-3 attack category

    Dataset Features

    • Dataset Size: ≈ 95 000 videos / image sequences spanning live captures and eleven spoof classes
    • Attack Diversity: 3D paper mask, wrapped 3D mask, photo print, mobile replay, display replay, cut-out 2D mask, silicone mask, latex mask, textile mask
    • Active Liveness Cues: Natural blinks and head rotations included across live and mask sessions
    • Attribute Range: different combinations of hairstyles, eyewear, facial hair, and accessories.
    • Environmental Variability: Indoor/outdoor scenes under various lighting conditions
    • Multi-angle Capture: Mainly the selfie (front) camera, with some back-camera footage
    • Capture Devices: Footage from flagship and mid-range phones (iPhone 14 / 13 Pro, Galaxy S23, Pixel 7, Redmi Note 12 Pro+, Galaxy A54, Honor 70)
    • Additional Flexibility: Custom re-captures available on request

    The full version of the dataset is available for commercial usage; leave a request on our website Axonlabs to purchase the dataset 💰

    Technical Specifications

    • File Format: MP4 for video, JPEG/PNG for still sequences; all compatible with mainstream ML frameworks
    • Resolution & FPS: Up to 4K @ 60 fps; balanced presets included for rapid training

    Best Uses

    Ideal for companies pursuing or maintaining iBeta Level 1/2 certification, research groups exploring new PAD architectures, and vendors stress-testing production face-verification pipelines.

    Attack Classes

    • Live / Genuine: Natural faces with spontaneous movements across varied devices and lighting
    • 3D Paper Mask: Folded paper masks with protruding nose/forehead
    • Wrapped 3D Print: Rigid paper moulds reproducing head geometry
    • Photo Print: Glossy still photos at multiple angles; the classic 2D spoof
    • Cylinder 3D Paper Mask: A folded or cylindrical sheet of paper that simulates volume
    • Mobile Replay: Face videos played on phone screens; includes glare and auto-brightness shifts
    • Display Replay: Attacks via monitors and laptops
    • Cut-out 2D Mask: Flat printed masks with eye/mouth holes plus active head motion
    • On-actor Print / Cuts: Paper elements (photos, cutouts) glued directly onto the actor's face
    • Silicone and Latex Masks: High-detail silicone/latex overlays with blinking and subtle mimicry
    • Cloth 3D Mask: Elastic fabric masks hugging facial contours during movement
    • High-Fidelity Resin Mask: Hyperrealistic masks with detailed skin texture

    Conclusion

    This dataset’s scale, breadth of attack types, and real-world capture conditions make it valuable for anyone building or evaluating biometric anti-spoofing solutions. Deploy it to harden your systems against today’s and tomorrow’s most sophisticated presentation attacks.


Cite
Li Xin (2023). Galaxy, star, quasar dataset [Dataset]. http://doi.org/10.57760/sciencedb.07177

Galaxy, star, quasar dataset

288 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 3, 2023
Dataset provided by
Science Data Bank
Authors
Li Xin
License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

The data used in this paper come from the 16th data release of SDSS (DR16). SDSS-DR16 contains a total of 930,268 photometric images, with 1.2 billion observed sources and tens of millions of spectra. The data were downloaded from the official SDSS website; specifically, they were obtained through the SkyServer API by running SQL queries in the CasJobs sub-site. Because the current SDSS photometric table PhotoObj can only classify observed sources as point sources or surface sources, the target sources are better classified as galaxies, stars and quasars through spectra. Therefore, we obtain calibrated sources in CasJobs by cross-matching SpecPhoto with the PhotoObj star list, and obtain the target position information (right ascension and declination). Calibrated sources allow the classes to be told apart precisely and quickly. Each calibrated source is labeled with the parameter "Class" as "galaxy", "star", or "quasar". In this paper, observation day areas 3462, 3478, 3530 and 4 other areas in SDSS-DR16 are selected as experimental data, because a large number of sources can be obtained in these areas to provide rich sample data for the experiment. For example, there are 9891 sources in the 3462 day area, including 2790 galaxy sources, 2378 stellar sources and 4723 quasar sources. There are 3862 sources in the 3478 day area, including 1759 galaxy sources, 577 stellar sources and 1526 quasar sources. FITS is a commonly used data format in the astronomical community. By cross-matching the star list and the FITS files in the local celestial region, we obtained images in the 5 bands u, g, r, i and z of 12499 galaxy sources, 16914 quasar sources and 16908 star sources as training and testing data.

1.1 Image Synthesis

SDSS photometric data include photometric images in the five bands u, g, r, i and z, and these images are packaged as single-band FITS files. Images of different bands contain different information.
Since the three bands g, r and i contain more feature information and less noise, astronomical researchers typically map the g, r and i bands to the R, G and B channels of the image to synthesize photometric images. Generally, different bands cannot be synthesized directly: if three bands are combined directly, the images of the different bands may not be aligned. Therefore, this paper adopts the RGB multi-band image synthesis software written by He Zhendong et al. to synthesize images in the g, r and i bands. This method effectively avoids the misalignment of images in different bands. Each photometric image in this paper is 2048×1489 pixels.

1.2 Data tailoring

This paper first clips the target image. Image clipping can be done with image segmentation tools; this paper uses Python for this process. During clipping, we convert the right ascension and declination of each source in the star list into pixel coordinates on the photometric image through the coordinate conversion formula, and determine the specific position of the source from the pixel coordinates. These coordinates are taken as the center point, and clipping is carried out with a rectangular box. We found that the input image size affects the experimental results. Therefore, according to the target size of the source, we selected three different clipping sizes: 40×40, 60×60 and 80×80. Through experiment and analysis, we find that the convolutional neural network has better learning ability and higher accuracy for data with small image size. In the end, we chose to clip the surface-source galaxies, point-source quasars, and stars to 40×40.

1.3 Division of training and test data

For the algorithm to achieve more accurate recognition performance, we need enough image samples. The selection of the training set, verification set and test set is an important factor affecting the final recognition accuracy.
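The clipping step described under "Data tailoring" above can be sketched in Python as follows. The sketch assumes the right ascension and declination have already been converted to pixel coordinates (the conversion formula itself is not given in this description), and it represents an image as a list of pixel rows. The function names and the choice to shift the box inward at the image border are assumptions for illustration, not the authors' code.

```python
def crop_box(cx, cy, size, width=2048, height=1489):
    """Return (left, top, right, bottom) of a size x size box centered on (cx, cy).

    The box is shifted inward when the source lies near the image border
    (an assumption of this sketch; the description does not say how
    border sources are handled).
    """
    half = size // 2
    left = min(max(int(round(cx)) - half, 0), width - size)
    top = min(max(int(round(cy)) - half, 0), height - size)
    return left, top, left + size, top + size


def crop(image, box):
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# A blank image with the stated 2048x1489 size, and a 40x40 cutout.
image = [[0] * 2048 for _ in range(1489)]
cutout = crop(image, crop_box(100, 100, 40))
```

The same two functions cover the 60×60 and 80×80 sizes by changing the `size` argument.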
In this paper, the training set, verification set and test set are split in the ratio 8:1:1. The verification set is used to tune the algorithm, and the test set is used to evaluate the generalization ability of the final algorithm. Table 1 shows the specific data partitioning information. The total sample size is 34,000 source images, including 11543 galaxy sources, 11967 star sources, and 10490 quasar sources.

1.4 Data preprocessing

In this experiment, the training set and test set are used as the training and test input of the algorithm after data preprocessing. The quantity and quality of the data largely determine the recognition performance of the algorithm. The training set and the test set are preprocessed differently. For the training set, we first apply vertical flips, horizontal flips and scaling to the cropped images to enrich the data samples and enhance the generalization ability of the algorithm. Since the features of celestial sources are flip-invariant, the labels of galaxies, stars and quasars do not change after these transformations. For the test set, preprocessing is simpler: we only scale the input image and use the result as test input.
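The 8:1:1 partition and the flip augmentation described above can be illustrated with a short Python sketch. The function names, the fixed random seed, and the list-of-rows image representation are assumptions for illustration; the description does not say how the authors shuffled or stored the data.

```python
import random


def split_811(items, seed=0):
    """Shuffle items and split them into train/verification/test at 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


def flip_horizontal(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror an image (list of pixel rows) top to bottom."""
    return image[::-1]


# Sample indices standing in for the 34,000 source images.
train, val, test = split_811(range(34000))

# Flips change the pixel layout but not the class label of the source.
tiny = [[1, 2], [3, 4]]
augmented = [tiny, flip_horizontal(tiny), flip_vertical(tiny)]
```

With 34,000 samples this gives 27,200 training, 3,400 verification and 3,400 test images. The flips would be applied to the training images only, matching the description of the simpler test-set preprocessing.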
