5 datasets found
  1. Solar flare forecasting based on magnetogram sequences learning with MViT...

    • redu.unicamp.br
    • data.niaid.nih.gov
    • +1 more
    Updated Jul 15, 2024
    Cite
    Repositório de Dados de Pesquisa da Unicamp (2024). Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation [Dataset]. http://doi.org/10.25824/redu/IH0AH0
    Explore at:
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Repositório de Dados de Pesquisa da Unicamp
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Dataset funded by
    Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
    Description

    Source code and dataset of the research "Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation". Our work employed PyTorch, a framework for training deep learning models with GPU support and automatic back-propagation, to load MViTv2-S models with Kinetics-400 weights. To simplify the implementation, eliminating the explicit training loop and automating some hyperparameters, we use PyTorch Lightning. The inputs were batches of 10 samples, each a sequence of 16 three-channel images resized to 224 × 224 pixels and normalized to the range 0 to 1. Most of the papers in our literature survey split the original dataset chronologically, and some authors also apply k-fold cross-validation to emphasize the evaluation of model stability. We adopt a hybrid split instead, taking the first 50,000 samples for 5-fold cross-validation between the training and validation sets (known data), with 40,000 samples for training and 10,000 for validation. Thus, we can evaluate performance and stability by analyzing the mean and standard deviation of all trained models on the test set, composed of the last 9,834 samples in chronological order (simulating unknown data). We developed three distinct models to evaluate the impact of oversampling magnetogram sequences throughout the dataset. The first model, Solar Flare MViT (SF MViT), was trained only with the original data from our base dataset, without oversampling. In the second model, Solar Flare MViT over Train (SF MViT oT), we apply oversampling only to the training data, maintaining the original validation set. In the third model, Solar Flare MViT over Train and Validation (SF MViT oTV), we apply oversampling to both training and validation sets. We also trained a model oversampling the entire dataset, called "SF_MViT_oTV Test", to verify how resampling or adopting a test set with unreal data may positively bias the results.

    GitHub version: The .zip hosted here contains all files from the project, including the checkpoint and output files generated by the code. We have a clean version hosted on GitHub (https://github.com/lfgrim/SFF_MagSeq_MViTs), without the magnetogram_jpg folder (which can be downloaded directly from https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip) and without the output and checkpoint files. Most code files hosted here also contain comments in Portuguese, which are being updated to English in the GitHub version.

    Folder structure: In the root directory of the project, we have two folders. magnetogram_jpg holds the source images provided by the Space Environment Artificial Intelligence Early Warning Innovation Workshop through the link https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip. It comprises 73,810 samples of high-quality magnetograms captured by HMI/SDO from 2010 May 4 to 2019 January 26. The HMI instrument provides these data (stored in the hmi.sharp_720s dataset), making new samples available every 12 minutes; however, the images in this dataset were collected every 96 minutes. Each image has an associated magnetogram comprising a ready-made snippet of one or more solar active regions (ARs). Note that the magnetograms cropped by SHARP can contain one or more solar ARs classified by the National Oceanic and Atmospheric Administration (NOAA). Seq_Magnetogram contains the references to the source images with the corresponding labels for the next 24 h and 48 h, in the M24 and M48 sub-folders respectively. M24/M48 both present the following sub-folder structure: Seqs16; SF_MViT; SF_MViT_oT; SF_MViT_oTV; SF_MViT_oTV_Test. There are also two files in the root: inst_packages.sh installs the packages and dependencies to run the models, and download_MViTS.py downloads the pre-trained MViTv2_S from PyTorch and stores it in the cache. The M24 and M48 folders hold reference text files (flare_Mclass...) linking the images in the magnetogram_jpg folder, or the sequences (Seq16_flare_Mclass...) in the Seqs16 folders, with their respective labels. They also hold "cria_seqs.py", which creates the sequences, and "test_pandas.py", which verifies the head info and checks the number of samples per label in the text files. All text files with the prefix "Seq16" inside the Seqs16 folder were created by the "cria_seqs.py" code based on the corresponding "flare_Mclass"-prefixed text files. The Seqs16 folder holds reference text files in which each file lists a sequence of images pointing to the magnetogram_jpg folder. All SF_MViT... folders hold the model training code itself (SF_MViT...py) and the corresponding job submission (jobMViT...), temporary input (Seq16_flare...), output (saida_MViT... and MViT_S...), error (err_MViT...) and checkpoint files (sample-FLARE...ckpt). Executed model training codes generate the output, error, and checkpoint files. There is also a folder called "lightning_logs" that stores the logs of trained models.

    Naming patterns: magnetogram_jpg files follow the format "hmi.sharp_720s...magnetogram.fits.jpg" and Seqs16 files follow the format "hmi.sharp_720s...to.", where hmi is the instrument that captured the image; sharp_720s is the database source of SDO/HMI; the SHARP region identifier can contain one or more solar ARs classified by NOAA; and the capture date-time is in the format yyyymmdd_hhnnss_TAI (y: year, m: month, d: day, h: hours, n: minutes, s: seconds), with the sequence start and end date-times following the same format. Reference text files in M24 and M48 or inside the SF_MViT... folders follow the format "flare_Mclass_.txt", where the prefix is Seq16 if it refers to a sequence, or void if it refers directly to images; the horizon is "24h" or "48h"; the set is "TrainVal" or "Test" (TrainVal refers to the Train/Val split); and "_over" after the extension (...txt_over) means a temporary input reference that was over-sampled by a training model. In all SF_MViT... folders: model training codes follow "SF_MViT_M+_", where the suffix is void, "oT" (over Train), "oTV" (over Train and Val) or "oTV_Test" (over Train, Val and Test); the horizon is "24h" or "48h"; "oneSplit" runs a specific split and "allSplits" runs all splits; void (default) runs on 1 GPU and "2gpu" runs on 2-GPU systems. Job submission files follow "jobMViT_", where the suffix points to the queue in the Lovelace environment hosted at CENAPAD-SP (https://www.cenapad.unicamp.br/parque/jobsLovelace). Temporary inputs follow "Seq16_flare_Mclass_.txt", where the set is train or val, and "_over" after the extension (...txt_over) means a temporary input reference that was over-sampled by a training model. Outputs follow "saida_MViT_Adam_10-7", where k0 to k4 indicates the corresponding split, or void if the output is from all splits. Error files follow "err_MViT_Adam_10-7", with the same k0 to k4 convention. Checkpoint files follow "sample-FLARE_MViT_S_10-7-epoch=-valid_loss=-Wloss_k=.ckpt", where the fields are the epoch number of the checkpoint, the corresponding validation loss, and the split index k (0 to 4).
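    A minimal sketch (not the authors' code) of how such a model can be assembled: it loads torchvision's MViTv2-S backbone with Kinetics-400 weights, replaces the 400-class Kinetics head with a two-class flare/no-flare head, and wraps it in a PyTorch Lightning module. The class name, the head replacement, and the Adam learning rate of 1e-7 (inferred from the "saida_MViT_Adam_10-7" output-file naming) are illustrative assumptions.

      import torch
      import torch.nn as nn
      import pytorch_lightning as pl
      from torchvision.models.video import mvit_v2_s, MViT_V2_S_Weights

      class SFMViTSketch(pl.LightningModule):
          def __init__(self, lr=1e-7, num_classes=2):
              super().__init__()
              # Pre-trained MViTv2-S backbone with Kinetics-400 weights.
              self.backbone = mvit_v2_s(weights=MViT_V2_S_Weights.KINETICS400_V1)
              # Swap the 400-class Kinetics head for a flare / no-flare head.
              in_features = self.backbone.head[-1].in_features
              self.backbone.head[-1] = nn.Linear(in_features, num_classes)
              self.criterion = nn.CrossEntropyLoss()
              self.lr = lr

          def forward(self, x):
              # x: (batch, channels, frames, height, width) = (10, 3, 16, 224, 224),
              # pixel values already normalized to the range [0, 1].
              return self.backbone(x)

          def training_step(self, batch, batch_idx):
              sequences, labels = batch
              loss = self.criterion(self(sequences), labels)
              self.log("train_loss", loss)
              return loss

          def configure_optimizers(self):
              return torch.optim.Adam(self.parameters(), lr=self.lr)

    A pl.Trainer would then run the 5-fold training described above, one fold per run, without an explicit training loop.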

  2. Aluminum alloy industrial materials defect

    • figshare.com
    zip
    Updated Dec 3, 2024
    Cite
    Ying Han; Yugang Wang (2024). Aluminum alloy industrial materials defect [Dataset]. http://doi.org/10.6084/m9.figshare.27922929.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    figshare
    Authors
    Ying Han; Yugang Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset used in this study's experiments comes from the preliminary competition dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition organized by Tianchi Feiyue Cloud (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We curated the dataset, removing images that do not meet the requirements of our experiment, and split all data into training and test sets. The image resolution is 2560×1960 pixels. Before training, all defects are labeled with labelimg and saved as json files; the json files are then converted to txt files; finally, detection and classification are run on the organized defect dataset.

    Description of the data and file structure: This is a project based on an enhanced YOLOv8 algorithm for aluminum defect classification and detection tasks. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs. The following instructions allow users to run the code in this repository on a Windows + CUDA GPU system.

    Files and variables. File: defeat_dataset.zip.

    Setup. Please follow the steps below to set up the project:
    1. Download the project repository defeat_dataset.zip from the following location. Unzip it and navigate to the project folder; it should contain a subfolder: quexian_dataset.
    2. Download the data (defeat_dataset.zip), unzip it, and move the 'defeat_dataset' folder into the project's main folder.
    3. Make sure that your defeat_dataset folder now contains a subfolder: quexian_dataset.
    4. Within the folder you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.

    Software. Set up the Python environment:
    1. Download and install Anaconda.
    2. Once Anaconda is installed, open the Anaconda Prompt. On Windows, click Start, search for Anaconda Prompt, and open it.
    3. Create a new conda environment with Python 3.8. You can name it whatever you like, for example yolov8: conda create -n yolov8 python=3.8
    4. Activate the created environment, e.g.: conda activate yolov8
    5. Download and install Visual Studio Code.
    6. Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU: conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
    7. Install the remaining libraries: conda install -c anaconda scikit-learn=0.24.1; conda install astropy=4.2.1; conda install -c anaconda pandas=1.2.4; conda install -c conda-forge matplotlib=3.5.3; conda install scipy=1.10.1

    Repeatability. For PyTorch, it is a well-known fact that there is no guarantee of fully reproducible results between PyTorch versions, individual commits, or different platforms. In addition, results may not be reproducible between CPU and GPU executions, even if the same seed is used. All results in the Analysis Notebook that involve only model evaluation are fully reproducible. However, when the model is trained on a GPU, the results vary across machines.

    Access information. Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/. Data was derived from the following source: https://tianchi.aliyun.com/dataset/140666

    Data availability statement. The ten defect types used in this study come from the Guangdong Industrial Wisdom Big Data Innovation Competition - Intelligent Algorithm Competition rematch; the dataset download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official release provides 4,356 images, including single-defect, multiple-defect and defect-free images. We selected only the single-defect and multiple-defect images, 3,233 images in total. The ten defects are non-conductive, effacement, miss bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom and blotch. Each image contains one or more defects, and the resolution of the defect images is 2560×1920. By surveying the literature, we found that most experiments were done with these 10 defect types, so we chose three additional defect types that differ more from these ten and are more numerous, making them suitable for the experiments. The three newly added defect types come from the preliminary dataset of the Guangdong Industrial Wisdom Big Data Intelligent Algorithm Competition, which can be downloaded from https://tianchi.aliyun.com/dataset/140666. It contains 3,000 images in total, among which 109, 73 and 43 images show the defects bruise, camouflage and coating cracking, respectively. Finally, the 10 defect types from the rematch and the 3 defect types selected from the preliminary round are fused into a new dataset, which is the one examined in this study.

    In processing the dataset, we tried different division ratios, such as 8:2, 7:3 and 7:2:1. After testing, we found that the experimental results did not differ much across division ratios. We therefore divide the dataset at a ratio of 7:2:1: the training set accounts for 70%, the validation set for 20%, and the test set for 10%. The random seed is set to 0 to ensure that the results are consistent every time the model is trained. Finally, the mean Average Precision (mAP) obtained from the experiment was measured on the dataset three times. The results differed very little each time, but for accuracy we took the average of the highest and lowest results: the highest was 71.5% and the lowest 71.1%, giving an average detection accuracy of 71.3% for the final experiment. All data and images utilized in this research are from publicly available sources, and the original creators have given their consent for these materials to be published in open-access formats.

    The settings of the other parameters are as follows: epochs: 200, patience: 50, batch: 16, imgsz: 640, pretrained: true, optimizer: SGD, close_mosaic: 10, iou: 0.7, momentum: 0.937, weight_decay: 0.0005, box: 7.5, cls: 0.5, dfl: 1.5, pose: 12.0, kobj: 1.0, save_dir: runs/train.

    The defeat_dataset (ZIP) is mentioned in the Supporting information section of our manuscript. The underlying data are held at Figshare, DOI: 10.6084/m9.figshare.27922929. The results_images.zip in the system contains the experimental results graphs. The images_1.zip and images_2.zip in the system contain all the images needed to generate the manuscript.tex manuscript.
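    A minimal sketch (assumed, not the authors' script) of how a training run with the listed hyperparameters could look using the Ultralytics YOLOv8 API; the stock yolov8n.pt weights and the "defect.yaml" dataset description file are placeholders, and the enhanced YOLOv8 variant used in the study would replace the stock model.

      from ultralytics import YOLO

      model = YOLO("yolov8n.pt")  # placeholder for the enhanced YOLOv8 model used in the study
      model.train(
          data="defect.yaml",     # placeholder YAML pointing at the 7:2:1 train/val/test split
          epochs=200,
          patience=50,
          batch=16,
          imgsz=640,
          pretrained=True,
          optimizer="SGD",
          close_mosaic=10,
          iou=0.7,
          momentum=0.937,
          weight_decay=0.0005,
          box=7.5,
          cls=0.5,
          dfl=1.5,
          pose=12.0,   # kept for completeness; only used by pose models
          kobj=1.0,    # kept for completeness; only used by pose models
          seed=0,      # random seed fixed to 0, as described above
      )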

  3. Data from: Linear-Transformer

    • explore.openaire.eu
    Updated May 14, 2021
    Cite
    Juncai Guo (2021). Linear-Transformer [Dataset]. http://doi.org/10.5281/zenodo.4761073
    Explore at:
    Dataset updated
    May 14, 2021
    Authors
    Juncai Guo
    Description
    1. Runtime environment: 4 NVIDIA 2080 Ti GPUs, Ubuntu 16.04, CUDA 10.0 (with the corresponding cuDNN version), Python 3.7, PyTorch 1.2.0.
    2. Before the tests, parse the GloVe embeddings: 1) download the embedding data from http://downloads.cs.stanford.edu/nlp/data/glove.840B.300d.zip to the directory "./exp_linear_transformer/data/glove/"; 2) unzip the zip file to obtain "glove.840B.300d.txt"; 3) run the Python file "parse_glove.py" in the directory "./exp_linear_transformer/src_code/parse_glove/".
    3. Then install fastnlp 0.4.1 with pip. If other packages are missing, install them with pip.
    4. Details of the model tests are as follows:
    1) Ex1. Sentiment classification on the SST dataset. a. Download the dataset files from https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip and https://nlp.stanford.edu/~socherr/stanfordSentimentTreebank.zip, unzip both zip files and copy all the files to the directory "./exp_linear_transformer/data/SST/stanfordSentimentTreebank/". Then go to the directory "./exp_linear_transformer/src_code/test_SST", and b. run the file "make_raw_data.py"; c. run the file "preprocessor.py"; d. run the file "linear_transformer_model.py" to test the model. The result will be printed at the end. We have saved the test log in the directory "./exp_linear_transformer/data/SST/log/".
    2) Ex2. Semantic matching on the STS dataset. a. Download the dataset file from http://ixa2.si.ehu.es/stswiki/images/4/48/Stsbenchmark.tar.gz, unzip it, delete the first 4 columns in the files sts-train.csv, sts-dev.csv and sts-test.csv, rename the 3 files as sts-train.txt, sts-dev.txt and sts-test.txt, and copy them to the directory "./exp_linear_transformer/data/SEMEVAL2017T1/stsbenchmark/". Then go to the directory "./exp_linear_transformer/src_code/test_SEMEVAL2017T1/", and b. run the file "make_raw_data.py"; c. run the file "preprocessor.py"; d. run the files "linear_transformer_model.py", "star_transformer_model.py" and "transformer_model.py" to test the models. The results will be printed at the end. We have saved the test logs in the directory "./exp_linear_transformer/data/SEMEVAL2017T1/log/".
    3) Ex3. Language inference on the SNLI dataset. a. Download the dataset file from https://nlp.stanford.edu/projects/snli/snli_1.0.zip, unzip it and copy the unzipped files to the directory "./exp_linear_transformer/data/SNLI/snli_1.0/". Then go to the directory "./exp_linear_transformer/src_code/test_SNLI", and b. run the file "make_raw_data.py"; c. run the file "preprocessor.py"; d. run the file "linear_transformer_model.py" to test the model. The result will be printed at the end. We have saved the test log in the directory "./exp_linear_transformer/data/SNLI/log/".
    4) Ex4. Computational analysis on the STS dataset. a. Go to the directory "./exp_linear_transformer/src_code/test_SEMEVAL2017T1/" and run the file "time_plot.py" to get the graph of "Training time vs. sequence length". b. Reset "batch_size=16" in the file "config.py", then run the files "linear_transformer_model.py", "star_transformer_model.py" and "transformer_model.py" separately on a single GPU. According to the different "Max Seq Len" values printed, record the GPU memory in the variable "length_memory" in the file "time_plot.py", then run that file to get the graph of "GPU memory vs. sequence length".
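    A minimal sketch (assumed, not the repository's parse_glove.py) of reading the unzipped glove.840B.300d.txt into a word-to-vector dictionary, following the directory layout described above; tokens in this GloVe release can themselves contain spaces, so the last 300 fields of each line are taken as the vector.

      import numpy as np

      GLOVE_PATH = "./exp_linear_transformer/data/glove/glove.840B.300d.txt"

      def load_glove(path, dim=300):
          embeddings = {}
          with open(path, encoding="utf-8") as fh:
              for line in fh:
                  parts = line.rstrip().split(" ")
                  word = " ".join(parts[:-dim])          # token (may itself contain spaces)
                  embeddings[word] = np.asarray(parts[-dim:], dtype=np.float32)
          return embeddings

      # vectors = load_glove(GLOVE_PATH)
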
  4. Data of "Self-consistency Reinforced minimal Gated Recurrent Unit for...

    • zenodo.org
    bin, zip
    Updated Mar 25, 2024
    Cite
    Ling Wu; Ling Wu; Ludovic Noels; Ludovic Noels (2024). Data of "Self-consistency Reinforced minimal Gated Recurrent Unit for surrogate modeling of history-dependent non-linear problems: application to history-dependent homogenized response of heterogeneous materials" [Dataset]. http://doi.org/10.5281/zenodo.10551272
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Mar 25, 2024
    Dataset provided by
    Zenodo
    Authors
    Ling Wu; Ling Wu; Ludovic Noels; Ludovic Noels
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Development of the Self-Consistency reinforced Minimum Recurrent Unit (SC-MRU)

    This directory contains the data and algorithms generated in publication[^1]

    Table of Contents

    1. Dependencies and Prerequisites
    2. Structure of Repository
    3. Part 1: Data preparation
    4. Part 2: RNN training
    5. Part 3: Multiscale analysis
    6. Part 4: Reproduce paper[^1] figures

    Dependencies and Prerequisites

    • Python, pandas, matplotlib, texttable and latextable are prerequisites for visualizing and navigating the data.

    • For generating meshes and for visualization, gmsh (www.gmsh.info) is required.

    • For running simulations, cm3Libraries (http://www.ltas-cm3.ulg.ac.be/openSource.htm) is required.

    Instructions using apt & pip3 package manager

    Instructions for Debian/Ubuntu based workstations are as follows.

    python, pandas and dependencies

     sudo apt install python3 python3-scipy libpython3-dev python3-numpy python3-pandas

    matplotlib, texttable and latextable

     pip3 install matplotlib texttable latextable

    PyTorch (only for running with cm3Libraries)

    • Without GPU
     pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
    • With GPU
     pip3 install torch torchvision torchaudio

    Libtorch (for compiling the cells)

    • Without GPU: In a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch)
    • With GPU: In a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch)

    Structure of Repository

    Part 1: Data preparation

    Generate the loading paths

    • TrainingPaths/testGenerationData.py is used to generate random walk paths (a generic illustrative sketch of such a generator appears after these lists), with the options
      • Rmax = 0.11 # bound on the final Green Lagrange strain
      • TimeStep = 1. # in seconds
      • EvalStep = [1e-4,5e-3] # bounds on the Green Lagrange increments
      • Nmax = 2500 # maximum length of the sequence
      • k = 4000 # number of paths to generate
      • The paths are stored by default in ConstRVE/Paths/. The directory has to exist before launching the script. You can change the name in line 123: saveDir = '../ConstRVE'+'/Paths/'.
      • Examples of generated paths can be found in ConstRVE/PathsExamples/
      • The command to be run from the directory TrainingPaths is
    (mkdir ../ConstRVE/Paths) #if needed
    python3 testGenerationData.py
    • TrainingPaths/generationData_Cyclic.py is used to generate random cyclic paths, with the options
      • Rmax = [np.random.uniform(0.,0.04),np.random.uniform(0.,0.06),np.random.uniform(0.0,0.09),0.12] # the bound on the final Green Lagrange strain is random
      • TimeStep = 1. # in seconds
      • EvalStep = [1e-4,5e-3] # bounds on the Green Lagrange increments
      • Nmax = 2500 # maximum length of the sequence
      • k = 2000 # number of paths to generate
      • The paths are stored by default in ConstRVE/Paths/. You can change the name in line 123: saveDir = '../ConstRVE'+'/Paths/'.
      • The command to be run from the directory TrainingPaths is
    (mkdir ../ConstRVE/Paths) #if needed
    python3 generationData_Cyclic.py
    • TrainingPaths/countPathLength.py gives the average, minimum and maximum lengths of the generated paths and the distribution of the \Delta R. By default the paths are read from ConstRVE/Paths/, but the directory can be given as an argument. The file can be used to read either the generated loading paths or the stored simulation results, e.g.
     python3 countPathLength.py '../ConstRVE/PathsExamples'
     python3 countPathLength.py '../All_Path_Res/Path_Res9'
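    A minimal, generic sketch of such a random-walk path generator under the options listed above (the actual TrainingPaths/testGenerationData.py may differ); the use of 6 independent Green-Lagrange strain components, the norm-based stopping criterion, and the CSV output format are assumptions for illustration.

      import os
      import numpy as np

      Rmax = 0.11               # bound on the final Green-Lagrange strain
      TimeStep = 1.0            # in seconds
      EvalStep = [1e-4, 5e-3]   # bounds on the Green-Lagrange increments
      Nmax = 2500               # maximum length of a sequence
      k = 4000                  # number of paths to generate
      saveDir = '../ConstRVE/Paths/'

      os.makedirs(saveDir, exist_ok=True)
      rng = np.random.default_rng()

      for p in range(k):
          path = [np.zeros(6)]  # 6 independent strain components (assumption)
          while len(path) < Nmax:
              step = rng.uniform(EvalStep[0], EvalStep[1], size=6) * rng.choice([-1.0, 1.0], size=6)
              nxt = path[-1] + step
              if np.linalg.norm(nxt) >= Rmax:   # stop once the strain magnitude reaches the bound
                  break
              path.append(nxt)
          times = np.arange(len(path)) * TimeStep
          data = np.column_stack([times, np.vstack(path)])
          np.savetxt(os.path.join(saveDir, 'path_%05d.csv' % p), data, delimiter=',')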

    Generate the RVEs direct simulation results

  5. Data of "Stochastic Deep Material Networks as Efficient Surrogates for...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Feb 13, 2025
    Cite
    Ling Wu; Ling Wu; Ludovic Noels; Ludovic Noels (2025). Data of "Stochastic Deep Material Networks as Efficient Surrogates for Stochastic Homogenisation of Non-linear Heterogeneous Materials" [Dataset]. http://doi.org/10.5281/zenodo.14861537
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ling Wu; Ling Wu; Ludovic Noels; Ludovic Noels
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Stochastic Deep Material Networks as Efficient Surrogates for Stochastic Homogenisation of Non-linear Heterogeneous Materials

    This directory contains the data and algorithms generated in publication[^1]

    Table of Contents

    1. Dependencies and Prerequisites
    2. Structure of Repository
    3. Images/Geometries and IB-DMN training data of the 6 SVEs
    4. Stochastic analysis - Direct numerical simulations of SVEs
    5. Training of the reference IB-DMN
    6. Stochastic analysis - Stochastic IB-DMN
    7. Reproduce paper[^1] figures

    Dependencies and Prerequisites

    • Python, pandas, matplotlib, texttable and latextable are prerequisites for visualizing and navigating the data.

    • For generating meshes and for visualization, gmsh (www.gmsh.info) is required.

    • For running simulations, cm3Libraries (http://www.ltas-cm3.ulg.ac.be/openSource.htm) is required.

    Instructions using apt & pip3 package manager

    Instructions for Debian/Ubuntu based workstations are as follows.

    python, pandas and dependencies

     sudo apt install python3 python3-scipy libpython3-dev python3-numpy python3-pandas

    matplotlib, texttable and latextable

     pip3 install matplotlib texttable latextable

    Pytorch

    • Without GPU
     pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
    • With GPU
     pip3 install torch torchvision torchaudio

    Libtorch (only when using cm3Libraries)

    • Without GPU: In a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch)
     wget https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-2.3.0%2Bcpu.zip
     unzip libtorch-shared-with-deps-2.3.0+cpu.zip
    • With GPU: In a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch)

    Structure of Repository

    Images/Geometries and IB-DMN training data of the 6 SVEs: 6SVE_Example

    1. 6SVE_Example/6SVE_Data: Images/Geometries and IB-DMN training data of the 6 SVEs
    2. 6SVE_Example/6SVE_DNS:
    1. 6SVE_Example/6SVE_DMN:
