100+ datasets found

SELTO Dataset
zenodo.org
data.niaid.nih.gov
application/gzip
Updated May 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch; Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch (2023). SELTO Dataset [Dataset]. http://doi.org/10.5281/zenodo.7781392
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7781392
Dataset updated
May 23, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch; Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A Benchmark Dataset for Deep Learning for 3D Topology Optimization

This dataset represents voxelized 3D topology optimization problems and solutions. The solutions have been generated in cooperation with the Ariane Group and Synera using the Altair OptiStruct implementation of SIMP within the Synera software. The SELTO dataset consists of four different 3D datasets for topology optimization, called disc simple, disc complex, sphere simple and sphere complex. Each of these datasets is further split into a training and a validation subset.

The following paper provides full documentation and examples:

Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.

The Python library DL4TO (https://github.com/dl4to/dl4to) can be used to download and access all SELTO dataset subsets.
Each TAR.GZ file container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and contains an associated ground truth solution. Each problem-solution pair consists of two files, where one contains voxel-wise information and the other file contains scalar information. For example, the i-th sample is stored in the files i.csv and i_info.csv, where i.csv contains all voxel-wise information and i_info.csv contains all scalar information. We define all spatially varying quantities at the center of the voxels, rather than on the vertices or surfaces. This allows for a shape-consistent tensor representation.

For the i-th sample, the columns of i_info.csv correspond to the following scalar information:

E - Young's modulus [Pa]

ν - Poisson's ratio [-]

σ_ys - a yield stress [Pa]

h - discretization size of the voxel grid [m]

The columns of i.csv correspond to the following voxel-wise information:

x, y, z - the indices that state the location of the voxel within the voxel mesh

Ω_design - design space information for each voxel. This is a ternary variable that indicates the type of density constraint on the voxel. 0 and 1 indicate that the density is fixed at 0 or 1, respectively. -1 indicates the absence of constraints, i.e., the density in that voxel can be freely optimized

Ω_dirichlet_x, Ω_dirichlet_y, Ω_dirichlet_z - homogeneous Dirichlet boundary conditions for each voxel. These are binary variables that define whether the voxel is subject to homogeneous Dirichlet boundary constraints in the respective dimension

F_x, F_y, F_z - floating point variables that define the three spacial components of external forces applied to each voxel. All forces are body forces given in [N/m^3]

density - defines the binary voxel-wise density of the ground truth solution to the topology optimization problem

How to Import the Dataset

with DL4TO: With the Python library DL4TO (https://github.com/dl4to/dl4to) it is straightforward to download and access the dataset as a customized PyTorch torch.utils.data.Dataset object. As shown in the tutorial this can be done via:

from dl4to.datasets import SELTODataset dataset = SELTODataset(root=root, name=name, train=train)

Here, root is the path where the dataset should be saved. name is the name of the SELTO subset and can be one of "disc_simple", "disc_complex", "sphere_simple" and "sphere_complex". train is a boolean that indicates whether the corresponding training or validation subset should be loaded. See here for further documentation on the SELTODataset class.

without DL4TO: After downloading and unzipping, any of the i.csv files can be manually imported into Python as a Pandas dataframe object:

import pandas as pd root = ... file_path = f'{root}/{i}.csv' columns = ['x', 'y', 'z', 'Ω_design','Ω_dirichlet_x', 'Ω_dirichlet_y', 'Ω_dirichlet_z', 'F_x', 'F_y', 'F_z', 'density'] df = pd.read_csv(file_path, names=columns)

Similarly, we can import a i_info.csv file via:

file_path = f'{root}/{i}_info.csv' info_column_names = ['E', 'ν', 'σ_ys', 'h'] df_info = pd.read_csv(file_path, names=info_columns)

We can extract PyTorch tensors from the Pandas dataframe df using the following function:

import torch def get_torch_tensors_from_dataframe(df, dtype=torch.float32): shape = df[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1 voxels = [df['x'].values, df['y'].values, df['z'].values] Ω_design = torch.zeros(1, *shape, dtype=int) Ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(data['Ω_design'].values.astype(int)) Ω_Dirichlet = torch.zeros(3, *shape, dtype=dtype) Ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_x'].values, dtype=dtype) Ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_y'].values, dtype=dtype) Ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_z'].values, dtype=dtype) F = torch.zeros(3, *shape, dtype=dtype) F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_x'].values, dtype=dtype) F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_y'].values, dtype=dtype) F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_z'].values, dtype=dtype) density = torch.zeros(1, *shape, dtype=dtype) density[:, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['density'].values, dtype=dtype) return Ω_design, Ω_Dirichlet, F, density
Stage Two Experiments - Datasets
figshare.com
bin
Updated Jan 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luke Yerbury (2025). Stage Two Experiments - Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27427629.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.27427629.v1
Dataset updated
Jan 21, 2025
Dataset provided by
Figsharehttp://figshare.com/
Authors
Luke Yerbury
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data used in the various stage two experiments in: "Comparing Clustering Approaches for Smart Meter Time Series: Investigating the Influence of Dataset Properties on Performance". This includes datasets with varied characteristics.All datasets are stored in a dict with tuples of (time series array, class labels). To access data in python:import picklefilename = "dataset.txt"with open(filename, 'rb') as f: data = pickle.load(f)
A demo fluorescence dataset in different formats
zenodo.org
bin, json +1
Updated Jan 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christophe Pouzat; Christophe Pouzat; Andreas Pippow; Andreas Pippow; Peter kloppenburg; Peter kloppenburg (2024). A demo fluorescence dataset in different formats [Dataset]. http://doi.org/10.5281/zenodo.10518962
Explore at:
json, text/x-python, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10518962
Dataset updated
Jan 16, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Christophe Pouzat; Christophe Pouzat; Andreas Pippow; Andreas Pippow; Peter kloppenburg; Peter kloppenburg
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The 4 files contain the same dataset in 4 different formats:

Data_POMC.fits (FITS format).

Data_POMC;json (JSON format with data as nested arrays suitable for direct import in Python).

Data_POMC2.json (JSON format with "flattened" array).

Data_POMC.py (a Python module containing a description and a single variable 'stack' with the data as a 3D NumPy array).

The data are POMC neuron image stack. The CCD chip size (after binning) is 60x80 and 168 fluorescence images were taken. The fluorophore used was Fura-2. Fluorescence images were acquired at 340 nm every 150 ms (exposure time: 12 ms). The imaging setup consisted of an Imago SensiCam CCD camera with a 640x480 chip (Till Photonics, Graefelfing, Germany) and a Polychromator IV (Till Photonics) that was coupled via an optical fiber into the upright microscope. Emitted fluorescence was detected through a 440 nm long-pass filter (LP440). Data were acquired as 80x60 frames using 8x8 on-chip binning. Images were recorded in analog-to-digital units (ADUs) and stored as 12-bit grayscale images. A depolarizing currrent pulse was applied just before frame 13 provoking calcium entry. The data were acquired by Andreas Pippow.
Reference: JOUCLA ET AL. (2013) CELL CALCIUM. 54(2):71-85

To read Data_POMC.fits into a Python session do:

import fitsio
import numpy as np
fits = fitsio.FITS('Data_POMC.fits','r')
fits

To read Data_POMC.py into a Python session do:

import Data_POMC
help(Data_POMC)

To read Data_POMC.json into a Python session do:

import json
import numpy as np
with open("Data_POMC.json","r") as f:
pomc = json.load(f) # pomc is a dictionary
pomc_stack = np.array(pomc['stack'])
print(pomc['metadata'])

To read Data_POMC2.json into a Python session do:

import json
import numpy as np
with open("Data_POMC2.json","r") as f:
pomc = json.load(f) # pomc is a dictionary
pomc_stack = np.reshape(pomc['stack'],(60,80,168),order='f')
print(pomc['metadata'])
T
asset
tensorflow.org
opendatalab.com
Updated May 7, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2020). asset [Dataset]. https://www.tensorflow.org/datasets/catalog/asset
Explore at:
Dataset updated
May 7, 2020
Description
ASSET is a dataset for evaluating Sentence Simplification systems with multiple rewriting transformations, as described in "ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations." The corpus is composed of 2000 validation and 359 test original sentences that were each simplified 10 times by different annotators. The corpus also contains human judgments of meaning preservation, fluency and simplicity for the outputs of several automatic text simplification systems.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('asset', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.

Annotated 12 lead ECG dataset

zenodo.org

zip

Updated Jun 7, 2021

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Antonio H Ribeiro; Antonio H Ribeiro; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Gabriela M. Paixão; Derick M. Oliveira; Derick M. Oliveira; Paulo R. Gomes; Paulo R. Gomes; Jéssica A. Canazart; Jéssica A. Canazart; Milton P. Ferreira; Milton P. Ferreira; Carl R. Andersson; Carl R. Andersson; Peter W. Macfarlane; Peter W. Macfarlane; Wagner Meira Jr.; Wagner Meira Jr.; Thomas B. Schön; Thomas B. Schön; Antonio Luiz P. Ribeiro; Antonio Luiz P. Ribeiro (2021). Annotated 12 lead ECG dataset [Dataset]. http://doi.org/10.5281/zenodo.3625007

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.3625007

Dataset updated

Jun 7, 2021

Dataset provided by

Zenodohttp://zenodo.org/

Authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

# Annotated 12 lead ECG dataset

Contain 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students.
It is used as test set on the paper:
"Automatic Diagnosis of the Short-Duration12-Lead ECG using a Deep Neural Network".

It contain annotations about 6 different ECGs abnormalities:
- 1st degree AV block (1dAVb);
- right bundle branch block (RBBB);
- left bundle branch block (LBBB);
- sinus bradycardia (SB);
- atrial fibrillation (AF); and,
- sinus tachycardia (ST).

## Folder content:

- `ecg_tracings.hdf5`: HDF5 file containing a single dataset named `tracings`. This dataset is a 
`(827, 4096, 12)` tensor. The first dimension correspond to the 827 different exams from different 
patients; the second dimension correspond to the 4096 signal samples; the third dimension to the 12
different leads of the ECG exam. 

The signals are sampled at 400 Hz. Some signals originally have a duration of 
10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples).
In order to make them all have the same size (4096 samples) we fill them with zeros
on both sizes. For instance, for a 7 seconds ECG signal with 2800 samples we include 648
samples at the beginning and 648 samples at the end, yielding 4096 samples that are them saved
in the hdf5 dataset. All signal are represented as floating point numbers at the scale 1e-4V: so it should
be multiplied by 1000 in order to obtain the signals in V.

In python, one can read this file using the following sequence:
```python
import h5py
with h5py.File(args.tracings, "r") as f:
  x = np.array(f['tracings'])
```

- The file `attributes.csv` contain basic patient attributes: sex (M or F) and age. It
contain 827 lines (plus the header). The i-th tracing in `ecg_tracings.hdf5` correspond to the i-th line.
- `annotations/`: folder containing annotations csv format. Each csv file contain 827 lines (plus the header).
The i-th line correspond to the i-th tracing in `ecg_tracings.hdf5` correspond to the in all csv files.
The csv files all have 6 columns `1dAVb, RBBB, LBBB, SB, AF, ST`
corresponding to weather the annotator have detect the abnormality in the ECG (`=1`) or not (`=0`).
 1. `cardiologist[1,2].csv` contain annotations from two different cardiologist.
 2. `gold_standard.csv` gold standard annotation for this test dataset. When the cardiologist 1 and cardiologist 2
 agree, the common diagnosis was considered as gold standard. In cases where there was any disagreement, a 
 third senior specialist, aware of the annotations from the other two, decided the diagnosis. 
 3. `dnn.csv` prediction from the deep neural network described in 
 "Automatic Diagnosis of the Short-Duration 12-Lead ECG using a Deep Neural Network". The threshold is set in such way 
 it maximizes the F1 score.
 4. `cardiology_residents.csv` annotations from two 4th year cardiology residents (each annotated half of the dataset).
 5. `emergency_residents.csv` annotations from two 3rd year emergency residents (each annotated half of the dataset).
 6. `medical_students.csv` annotations from two 5th year medical students (each annotated half of the dataset).

T
unified_qa
tensorflow.org
opendatalab.com
Updated Dec 6, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). unified_qa [Dataset]. https://www.tensorflow.org/datasets/catalog/unified_qa
Explore at:
Dataset updated
Dec 6, 2022
Description
The UnifiedQA benchmark consists of 20 main question answering (QA) datasets (each may have multiple versions) that target different formats as well as various complex linguistic phenomena. These datasets are grouped into several formats/categories, including: extractive QA, abstractive QA, multiple-choice QA, and yes/no QA. Additionally, contrast sets are used for several datasets (denoted with "contrast_sets_"). These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset. For several datasets that do not come with evidence paragraphs, two variants are included: one where the datasets are used as-is and another that uses paragraphs fetched via an information retrieval system as additional evidence, indicated with "_ir" tags.

More information can be found at: https://github.com/allenai/unifiedqa.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('unified_qa', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
CIFAR-10 Python
kaggle.com
zip
Updated Jan 27, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kris (2018). CIFAR-10 Python [Dataset]. https://www.kaggle.com/pankrzysiu/cifar10-python
Explore at:
zip(340613496 bytes)Available download formats
Dataset updated
Jan 27, 2018
Authors
Kris
Description
Context

CIFAR-10 is the excellent Dataset for many Image processing experiments.

Content

Usage instructions

in Keras

from os import listdir, makedirs from os.path import join, exists, expanduser cache_dir = expanduser(join('~', '.keras')) if not exists(cache_dir): makedirs(cache_dir) datasets_dir = join(cache_dir, 'datasets') # /cifar-10-batches-py if not exists(datasets_dir): makedirs(datasets_dir) # If you have multiple input datasets, change the below cp command accordingly, typically: # !cp ../input/cifar10-python/cifar-10-python.tar.gz ~/.keras/datasets/ !cp ../input/cifar-10-python.tar.gz ~/.keras/datasets/ !ln -s ~/.keras/datasets/cifar-10-python.tar.gz ~/.keras/datasets/cifar-10-batches-py.tar.gz !tar xzvf ~/.keras/datasets/cifar-10-python.tar.gz -C ~/.keras/datasets/

general Python 3

def unpickle(file): import pickle with open(file, 'rb') as fo: dict = pickle.load(fo, encoding='bytes') return dict !tar xzvf ../input/cifar-10-python.tar.gz

then see section "Dataset layout" in https://www.cs.toronto.edu/~kriz/cifar.html for details

Acknowledgements

Downloaded directly from here:

https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

See description: https://www.cs.toronto.edu/~kriz/cifar.html

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
T
imagenet2012
tensorflow.org
Updated Jun 1, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). imagenet2012 [Dataset]. https://www.tensorflow.org/datasets/catalog/imagenet2012
Explore at:
Dataset updated
Jun 1, 2024
Description
ILSVRC 2012, commonly known as 'ImageNet' is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.

The test split contains 100K images but no labels because no labels have been publicly released. We provide support for the test split from 2012 with the minor patch released on October 10, 2019. In order to manually download this data, a user must perform the following operations:

Download the 2012 test split available here.

Download the October 10, 2019 patch. There is a Google Drive link to the patch provided on the same page.

Combine the two tar-balls, manually overwriting any images in the original archive with images from the patch. According to the instructions on image-net.org, this procedure overwrites just a few images.

The resulting tar-ball may then be processed by TFDS.

To assess the accuracy of a model on the ImageNet test split, one must run inference on all images in the split, export those results to a text file that must be uploaded to the ImageNet evaluation server. The maintainers of the ImageNet evaluation server permits a single user to submit up to 2 submissions per week in order to prevent overfitting.

To evaluate the accuracy on the test split, one must first create an account at image-net.org. This account must be approved by the site administrator. After the account is created, one can submit the results to the test server at https://image-net.org/challenges/LSVRC/eval_server.php The submission consists of several ASCII text files corresponding to multiple tasks. The task of interest is "Classification submission (top-5 cls error)". A sample of an exported text file looks like the following:

771 778 794 387 650 363 691 764 923 427 737 369 430 531 124 755 930 755 59 168

The export format is described in full in "readme.txt" within the 2013 development kit available here: https://image-net.org/data/ILSVRC/2013/ILSVRC2013_devkit.tgz Please see the section entitled "3.3 CLS-LOC submission format". Briefly, the format of the text file is 100,000 lines corresponding to each image in the test split. Each line of integers correspond to the rank-ordered, top 5 predictions for each test image. The integers are 1-indexed corresponding to the line number in the corresponding labels file. See labels.txt.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('imagenet2012', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.

https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012-5.1.0.png" alt="Visualization" width="500px">
H
Advancing Open and Reproducible Water Data Science by Integrating Data...
beta.hydroshare.org
hydroshare.org
zip
Updated Jan 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffery S. Horsburgh (2024). Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository [Dataset]. https://beta.hydroshare.org/resource/45d3427e794543cfbee129c604d7e865/
Explore at:
zip(50.9 MB)Available download formats
Dataset updated
Jan 9, 2024
Dataset provided by
HydroShare
Authors
Jeffery S. Horsburgh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types and that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple MacOS, or Linux from the Python Package Index using the PIP utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Collaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.

This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
R
Custom Yolov7 On Kaggle On Custom Dataset
universe.roboflow.com
zip
Updated Jan 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Owais Ahmad (2023). Custom Yolov7 On Kaggle On Custom Dataset [Dataset]. https://universe.roboflow.com/owais-ahmad/custom-yolov7-on-kaggle-on-custom-dataset-rakiq/dataset/2
Explore at:
zipAvailable download formats
Dataset updated
Jan 29, 2023
Dataset authored and provided by
Owais Ahmad
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Person Car Bounding Boxes
Description
Custom Training with YOLOv7 🔥

Some Important links

Model Inference🤖

🚀Training Yolov7 on Kaggle

Weight and Biases 🐝

HuggingFace 🤗 Model Repo

Contact Information

Name - Owais Ahmad

Phone - +91-9515884381

Email - owaiskhan9654@gmail.com

Portfolio - https://owaiskhan9654.github.io/

Objective

To Showcase custom Object Detection on the Given Dataset to train and Infer the Model using newly launched YoloV7.

Data Acquisition

The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.

Link to the Downloadable Dataset

from IPython.display import Markdown, display display(Markdown("../input/Car-Person-v2-Roboflow/README.roboflow.txt"))

Custom Training with YOLOv7 🔥

In this Notebook, I have processed the images with RoboFlow because in COCO formatted dataset was having different dimensions of image and Also data set was not splitted into different Format. To train a custom YOLOv7 model we need to recognize the objects in the dataset. To do so I have taken the following steps:

Export the dataset to YOLOv7

Train YOLOv7 to recognize the objects in our dataset

Evaluate our YOLOv7 model's performance

Run test inference to view performance of YOLOv7 model at work

📦 YOLOv7

https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/car-person-2.PNG" width=800>

Image Credit - jinfagang

Step 1: Install Requirements

!git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements %cd yolov7 !pip install -qr requirements.txt !pip install -q roboflow

Downloading YOLOV7 starting checkpoint

!wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"

import os import glob import wandb import torch from roboflow import Roboflow from kaggle_secrets import UserSecretsClient from IPython.display import Image, clear_output, display # to display images print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67">

I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!

YOLOv7-Car-Person-Custom

try: user_secrets = UserSecretsClient() wandb_api_key = user_secrets.get_secret("wandb_api") wandb.login(key=wandb_api_key) anonymous = None except: wandb.login(anonymous='must') print('To use your W&B account, Go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. Get your W&B access token from here: https://wandb.ai/authorize') wandb.init(project="YOLOvR",name=f"7. YOLOv7-Car-Person-Custom-Run-7")

Step 2: Assemble Our Dataset

https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png" alt="">

In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.

In Roboflow, We can choose between two paths:

Convert an existing Coco dataset to YOLOv7 format. In Roboflow it supports over 30 formats object detection formats for conversion.

Uploading only these raw images and annotate them in Roboflow with Roboflow Annotate.

Version v2 Aug 12, 2022 Looks like this.

https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG" alt="">

user_secrets = UserSecretsClient() roboflow_api_key = user_secrets.get_secret("roboflow_api")

rf = Roboflow(api_key=roboflow_api_key) project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq") dataset = project.version(2).download("yolov7")

Step 3: Training Custom pretrained YOLOv7 model

Here, I am able to pass a number of arguments: - img: define input image size - batch: determine
Munsell soil color chart: A hyperspectral dataset
zenodo.org
zip
Updated Aug 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Riestiya Zain Fadillah; Riestiya Zain Fadillah; Hilda Deborah; Hilda Deborah (2023). Munsell soil color chart: A hyperspectral dataset [Dataset]. http://doi.org/10.5281/zenodo.8143355
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8143355
Dataset updated
Aug 20, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Riestiya Zain Fadillah; Riestiya Zain Fadillah; Hilda Deborah; Hilda Deborah
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains hyperspectral images obtained using SPECIM IQ for the Munsell soil color chart (MSC).

The hyperspectral images are stored in ENVI format. For those who are only interested in the endmember spectra for the MSC, we also provided the spectral library .sli and .hdr inside the endmembers folder.

The acquisition details for each image can be found in the .hdr file and metadata folder inside the whole folder. For the whole image, the acquisition details are:

Table 1. Acquisition details

samples 512
lines 512
bands 204
default bands 70, 53,19
binning 1,1
tint (integration time) 10 (ms)
fps 100
wavelength range 397.32 - 1003.58 nm

The dataset is organized into several folders, each containing different types of datasets.

whole folder contains the entire scene hyperspectral image. This folder contains capture, metadata, and results subfolder.

.png inside the folder is natural color plotting (RGB from default bands in Table. 1) from captured hyperspectral image.

capture folder contains dark reference, white reference and radiance data

metadata folder contains the metadata of the acquisition and device settings.

results contains the reflectance calculated by the device (in .dat, .hdr and rendered natural plotting in .png ) from the hyperspectral camera and .png images of the scene, background, and viewfinder from the device's RGB camera.

chips folder contains only the cropped 20*20 voxels for each color chip reflectances. Each page has its own folder and each folder contains .hdr and .img for each color chip.

endmembers folder contains the spectral library (.sli and .hdr). Each page in MSC have their own .sli and .hdr.

Some of the code snippets that might help to read the dataset

using python spectral library to load the dataset

from spectral import * import matplotlib.pyplot as plt # load the hyperspectral image .hdr and store it to a variable hsi = open_image(PATH) # get the natural RGB plotting of the hyperspectral image using the SPECIM main band hsi_rgb = hsi[:,:,[70,53,19]] # read the spectral library .sli and store it to a variable sli = open_image(PATH) # plot the first endmember plt.plot(sli.spectra[0]) # get the endmembers name sli.names

if you have any question kindly reach me on riestiyf@stud.ntnu.no

Italian Coronavirus Cases by Age group and Sex

kaggle.com

Updated Jul 5, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

janluke (2025). Italian Coronavirus Cases by Age group and Sex [Dataset]. https://www.kaggle.com/giangip/iccas/code

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jul 5, 2025

Dataset provided by

Kagglehttp://kaggle.com/

Authors

janluke

Description

Italy Coronavirus Cases by Age group and Sex (ICCAS)

This repository contains datasets about the number of Italian Sars-CoV-2 confirmed cases and deaths disaggregated by age group and sex. The data is (automatically) extracted from pdf reports (like this) published by Istituto Superiore di Sanità (ISS) two times a week. A link to the most recent report can be found in this page under section "Documento esteso".

PDF reports are usually published on Tuesday and Friday and contains data updated to the 4 p.m. of the day day before their release.

I wrote a script that is runned periodically in order to automatically update this repository when a new report is published. The code is hosted in a separate repository.

For feedback and issues refers to the GitHub repository.

Data folder structure

The data folder is structured as follows: data ├── by-date │ └── iccas_{date}.csv Dataset with cases/deaths updated to 4 p.m. of {date} └── iccas_full.csv Dataset with data from all reports (by date) The full dataset is obtained by concatenating all datasets in by-date and has an additional date column. If you use pandas, I suggest you to read this dataset using a multi-index on the first two columns: python import pandas as pd df = pd.read_csv('iccas_full.csv', index_col=(0, 1)) # ('date', 'age_group')

NOTE: {date} is the date the data refers to, NOT the release date of the report it was extracted from: as written above, a report is usually released with a day of delay. For example, iccas_2020-03-19.csv contains data relative to 2020-03-19 which was extracted from the report published in 2020-03-20.

Dataset details

Each dataset in the by-date folder contains the same data you can find in "Table 1" of the corresponding ISS report. This table contains the number of confirmed cases, deaths and other derived information disaggregated by age group (0-9, 10-19, ..., 80-89, >=90) and sex.

WARNING: the sum of male and female cases is not equal to the total number of cases, since the sex of some cases is unknown. The same applies to deaths.

Below, {sex} can be male or female.

Column	Description
`date`	(Only in `iccas_full.csv`) Date the format `YYYY-MM-DD`; numbers are updated to 4 p.m of this date
`age_group`	Values: `"0-9", "10-19", ..., "80-89", ">=90"`
`cases`	Number of confirmed cases (both sexes + unknown-sex; active + closed)
`deaths`	Number of deaths (both sexes + unknown-sex)
`{sex}_cases`	Number of cases of sex {sex}
`{sex}_deaths`	Number of cases of sex {sex} ended up in death
`cases_percentage`	`100 * cases / cases_of_all_ages`
`deaths_percentage`	`100 * deaths / deaths_of_all_ages`
`fatality_rate`	`100 * deaths / cases`
`{sex}_cases_percentage`	`100 * {sex}_cases / (male_cases + female_cases)` (cases of unknown sex excluded)
`{sex}_deaths_percentage`	`100 * {sex}_deaths / (male_deaths + female_deaths)` (cases of unknown sex excluded)
`{sex}_fatality_rate`	`100 * {sex}_deaths / {sex}_cases`

All columns that can be computed from absolute counts of cases and deaths (bottom half of the table above) were all re-computed to increase precision.

h
census-income
huggingface.co
Updated Jul 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WC (2025). census-income [Dataset]. https://huggingface.co/datasets/cestwc/census-income
Explore at:
Dataset updated
Jul 21, 2025
Authors
WC
Description
Dataset Card for Census Income (Adult)

This dataset is a precise version of Adult or Census Income. This dataset from UCI somehow happens to occupy two links, but we checked and confirm that they are identical. We used the following python script to create this Hugging Face dataset. import pandas as pd from datasets import Dataset, DatasetDict, Features, Value, ClassLabel

URLs

url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data" url2 =… See the full description on the dataset page: https://huggingface.co/datasets/cestwc/census-income.
Li-ion Battery Aging Dataset
kaggle.com
Updated May 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GIRITHARAN MANI (2024). Li-ion Battery Aging Dataset [Dataset]. https://www.kaggle.com/datasets/mystifoe77/nasa-battery-data-cleaned/suggestions
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 12, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
GIRITHARAN MANI
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Overview

This dataset provides a comprehensive view of the aging process of lithium-ion batteries, facilitating the estimation of their Remaining Useful Life (RUL). Originally sourced from NASA's open repository, the dataset has undergone meticulous preprocessing to enhance its analytical utility. The data is presented in a user-friendly CSV format after extracting relevant features from the original .mat files.

Key Features of the Dataset

Battery Performance Metrics:

Capacity: Measured over time to assess degradation.

Internal Resistance (Re): Represents the electrical resistance of the battery.

Charge Transfer Resistance (Rct): Indicates charge movement efficiency.

Environmental Conditions:

Ambient Temperature: External temperature affecting battery performance.

Identification Attributes:

Battery ID: Unique identifier for each battery tested.

Test ID: Links specific test conditions to outcomes.

UID & Filename: Traceable dataset references.

Processed Data:

Missing values have been addressed.

Columns irrelevant to RUL estimation have been removed.

Skewness in the data has been corrected for statistical accuracy.

Labels:

Degradation States: Categorized into intervals for easier interpretation.

Ranges include operational and failure states.

Potential Applications

Battery Health Monitoring:

Predict battery failure timelines.

Enhance battery maintenance strategies.

Data Science and Machine Learning:

Model development for RUL prediction.

Feature engineering for predictive analysis.

Research and Development:

Improve battery design.

Study the impact of environmental and operational conditions on battery life.

Technical Details

File Format: CSV

Size: ~625.02 kB

Columns: 9

Data Points: Multiple observations across various tests.

Tags

Keywords: Lithium-ion batteries, RUL, Battery Aging, Machine Learning, Data Analysis, Predictive Maintenance.

License

Apache 2.0: Permits academic and commercial use.

Usage Instructions

Import the dataset into your data analysis tools (e.g., Python, R, MATLAB).

Explore features to understand correlations and dependencies.

Use machine learning models for RUL prediction.

Provenance

The dataset was retrieved from NASA's publicly available data repositories. It has been preprocessed to align with research and industrial standards for usability in analytical tasks.

Call to Action

Leverage this dataset to enhance your understanding of lithium-ion battery degradation and build models that could revolutionize energy storage solutions.
wmt22_african
huggingface.co
Updated May 15, 2007
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ai2 (2007). wmt22_african [Dataset]. https://huggingface.co/datasets/allenai/wmt22_african
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 15, 2007
Dataset provided by
Allen Institute for AIhttp://allenai.org/
Authors
Ai2
Description
Dataset Card for allenai/wmt22_african

Dataset Summary

This dataset was created based on metadata for mined bitext released by Meta AI. It contains bitext for 248 pairs for the African languages that are part of the 2022 WMT Shared Task on Large Scale Machine Translation Evaluation for African Languages.

How to use the data

There are two ways to access the data:

Via the Hugging Face Python datasets library

from datasets import load_dataset dataset =… See the full description on the dataset page: https://huggingface.co/datasets/allenai/wmt22_african.
T
quality
tensorflow.org
Updated Dec 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). quality [Dataset]. https://www.tensorflow.org/datasets/catalog/quality
Explore at:
Dataset updated
Dec 20, 2022
Description
QuALITY, a multiple-choice, long-reading comprehension dataset.

We provide only the raw version.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('quality', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
h
eurosat_dataset
huggingface.co
Updated Oct 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chris Honaker (2023). eurosat_dataset [Dataset]. https://huggingface.co/datasets/Honaker/eurosat_dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 13, 2023
Authors
Chris Honaker
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for EuroSat

How to Use

Install datasets:

pip install datasets

How to use in Python

from datasets import load_dataset train_data = load_dataset("Honaker/eurosat_dataset", split="train")

Dataset Summary

EuroSat is an image classification dataset with 10 different classes on satellite imagery. There is over 27,000 labeled images.

Dataset Structure

The dataset is structured as follows: DatasetDict({ train: Dataset({… See the full description on the dataset page: https://huggingface.co/datasets/Honaker/eurosat_dataset.
Z
[MedMNIST+] 18x Standardized Datasets for 2D and 3D Biomedical Image...
data.niaid.nih.gov
Updated Nov 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rui Shi (2024). [MedMNIST+] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification with Multiple Size Options: 28 (MNIST-Like), 64, 128, and 224 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5208229
Explore at:
Dataset updated
Nov 28, 2024
Dataset provided by
Bilian Ke
Jiancheng Yang
Bingbing Ni
Hanspeter Pfister
Zequan Liu
Rui Shi
Donglai Wei
Lin Zhao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code [GitHub] | Publication [Nature Scientific Data'23 / ISBI'21] | Preprint [arXiv]

Abstract

We introduce MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. We benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools. The data and code are publicly available at https://medmnist.com/.

Disclaimer: The only official distribution link for the MedMNIST dataset is Zenodo. We kindly request users to refer to this original dataset link for accurate and up-to-date data.

Update: We are thrilled to release MedMNIST+ with larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D. As a complement to the previous 28-size MedMNIST, the large-size version could serve as a standardized benchmark for medical foundation models. Install the latest API to try it out!

Python Usage

We recommend our official code to download, parse and use the MedMNIST dataset:

% pip install medmnist% python

To use the standard 28-size (MNIST-like) version utilizing the downloaded files:

from medmnist import PathMNIST

train_dataset = PathMNIST(split="train")

To enable automatic downloading by setting download=True:

from medmnist import NoduleMNIST3D

val_dataset = NoduleMNIST3D(split="val", download=True)

Alternatively, you can access MedMNIST+ with larger image sizes by specifying the size parameter:

from medmnist import ChestMNIST

test_dataset = ChestMNIST(split="test", download=True, size=224)

Citation

If you find this project useful, please cite both v1 and v2 paper as:

Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. Yang, Jiancheng, et al. "MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification." Scientific Data, 2023.

Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis". IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.

or using bibtex:

@article{medmnistv2, title={MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification}, author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing}, journal={Scientific Data}, volume={10}, number={1}, pages={41}, year={2023}, publisher={Nature Publishing Group UK London} }

@inproceedings{medmnistv1, title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis}, author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing}, booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)}, pages={191--195}, year={2021} }

Please also cite the corresponding paper(s) of source data if you use any subset of MedMNIST as per the description on the project website.

License

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0), except DermaMNIST under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

The code is under Apache-2.0 License.

Changelog

v3.0 (this repository): Released MedMNIST+ featuring larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D.

v2.2: Removed a small number of mistakenly included blank samples in OrganAMNIST, OrganCMNIST, OrganSMNIST, OrganMNIST3D, and VesselMNIST3D.

v2.1: Addressed an issue in the NoduleMNIST3D file (i.e., nodulemnist3d.npz). Further details can be found in this issue.

v2.0: Launched the initial repository of MedMNIST v2, adding 6 datasets for 3D and 2 for 2D.

v1.0: Established the initial repository (in a separate repository) of MedMNIST v1, featuring 10 datasets for 2D.

Note: This dataset is NOT intended for clinical use.
STEAD subsample 4 CDiffSD
zenodo.org
bin
Updated Apr 30, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniele Trappolini; Daniele Trappolini (2024). STEAD subsample 4 CDiffSD [Dataset]. http://doi.org/10.5281/zenodo.11094536
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.11094536
Dataset updated
Apr 30, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Daniele Trappolini; Daniele Trappolini
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 15, 2024
Description
STEAD Subsample Dataset for CDiffSD Training

Overview

This dataset is a subsampled version of the STEAD dataset, specifically tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four HDF5 files, each saved in a format that requires Python's `h5py` method for opening.

Dataset Files

The dataset includes the following files:

train: Used for both training and validation phases (with validation train split). Contains earthquake ground truth traces.

noise_train: Used for both training and validation phases. Contains noise used to contaminate the traces.

test: Used for the testing phase, structured similarly to train.

noise_test: Used for the testing phase, contains noise data for testing.

Each file is structured to support the training and evaluation of seismic denoising models.

Data

The HDF5 files named noise contain two main datasets:

traces: This dataset includes N number of events, with each event being 6000 in size, representing the length of the traces. Each trace is organized into three channels in the following order: E (East-West), N (North-South), Z (Vertical).

metadata: This dataset contains the names of the traces for each event.

Similarly, the train and test files, which contain earthquake data, include the same traces and metadata datasets, but also feature two additional datasets:

p_arrival: Contains the arrival indices of P-waves, expressed in counts.

s_arrival: Contains the arrival indices of S-waves, also expressed in counts.

Usage

To load these files in a Python environment, use the following approach:

```python

import h5py
import numpy as np

# Open the HDF5 file in read mode
with h5py.File('train_noise.hdf5', 'r') as file:
# Print all the main keys in the file
print("Keys in the HDF5 file:", list(file.keys()))

if 'traces' in file:
# Access the dataset
data = file['traces'][:10] # Load the first 10 traces

if 'metadata' in file:
# Access the dataset
trace_name = file['metadata'][:10] # Load the first 10 metadata entries```

Ensure that the path to the file is correctly specified relative to your Python script.

Requirements

To use this dataset, ensure you have Python installed along with the Pandas library, which can be installed via pip if not already available:

```bash
pip install numpy
pip install h5py
```
d
MD17 data for graph2mat
data.dtu.dk
txt
Updated Aug 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Arghya Bhowmik (2024). MD17 data for graph2mat [Dataset]. http://doi.org/10.11583/DTU.26195285.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.11583/DTU.26195285.v1
Dataset updated
Aug 6, 2024
Dataset provided by
Technical University of Denmark
Authors
Arghya Bhowmik
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Creators

Pol Febrer (pol.febrer@icn2.cat, ORCID 0000-0003-0904-2234) Peter Bjorn Jorgensen (peterbjorgensen@gmail.com, ORCID 0000-0003-4404-7276) Arghya Bhowmik (arbh@dtu.dk, ORCID 0000-0003-3198-5116)

Related publication

The dataset is published as part of the paper: "GRAPH2MAT: UNIVERSAL GRAPH TO MATRIX CONVERSION FOR ELECTRON DENSITY PREDICTION" (https://doi.org/10.26434/chemrxiv-2024-j4g21) https://github.com/BIG-MAP/graph2mat

Short description

This dataset contains the Hamiltonian, Overlap, Density and Energy Density matrices from SIESTA calculations of a subset of the MD17 aspirin dataset. The subset is taken from the third split in (https://doi.org/10.6084/m9.figshare.12672038.v3).

SIESTA 5.0.0 was used to compute the dataset.

Contents

The dataset has two directories:

pseudos: Contains the pseudopotentials used for the calculation (obtained from http://www.pseudo-dojo.org/, type NC SR (ONCVPSP v0.5), PBE, standard accuracy)

splits: The data splits used in the published paper. Each file "splits_X.json" contains the splits for training size X.

And then, three directories containing the calculations with different basis sets: - matrix_dataset_defsplit: Uses the default split-valence DZP basis in SIESTA. - matrix_dataset_optimsplit: Uses a split-valence DZP basis optimized for aspirin. - matrix_dataset_defnodes: Uses the default nodes DZP basis in SIESTA.

Each of the basis directories has two subdirectories: - basis: Contains the files specifying the basis used for each atom. - runs: The results of running the SIESTA simulations. Contents are discussed next.

The "runs" directory contains one directory for each run, named with the index of the run. Each directory contains: - RUN.fdf, geom.fdf: The input files used for the SIESTA calculation. - RUN.out: The log of the SIESTA run, which apar - siesta.TSDE: Contains the Density and Energy Density matrices. - siesta.TSHS: Contains the Hamiltonian and Overlap matrices.

Each matrix can be read using the sisl python package (https://github.com/zerothi/sisl) like:

import sisl matrix = sisl.get_sile("RUN.fdf").read_X()

where X is hamiltonian, overlap, density_matrix or energy_density_matrix.

To reproduce the results presented in the paper, follow the documentation of the graph2mat package (https://github.com/BIG-MAP/graph2mat).

Cite this data

https://doi.org/10.11583/DTU.c.7310005 © 2024 Technical University of Denmark

License

This dataset is published under the CC BY 4.0 license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.

Facebook

Twitter

Click to copy link

Link copied

Cite

Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch; Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch (2023). SELTO Dataset [Dataset]. http://doi.org/10.5281/zenodo.7781392

SELTO Dataset

Explore at:

application/gzipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.7781392

Dataset updated

May 23, 2023

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch; Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A Benchmark Dataset for Deep Learning for 3D Topology Optimization

This dataset represents voxelized 3D topology optimization problems and solutions. The solutions have been generated in cooperation with the Ariane Group and Synera using the Altair OptiStruct implementation of SIMP within the Synera software. The SELTO dataset consists of four different 3D datasets for topology optimization, called disc simple, disc complex, sphere simple and sphere complex. Each of these datasets is further split into a training and a validation subset.

The following paper provides full documentation and examples:

Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.

The Python library DL4TO (https://github.com/dl4to/dl4to) can be used to download and access all SELTO dataset subsets.
Each TAR.GZ file container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and contains an associated ground truth solution. Each problem-solution pair consists of two files, where one contains voxel-wise information and the other file contains scalar information. For example, the i-th sample is stored in the files i.csv and i_info.csv, where i.csv contains all voxel-wise information and i_info.csv contains all scalar information. We define all spatially varying quantities at the center of the voxels, rather than on the vertices or surfaces. This allows for a shape-consistent tensor representation.

For the i-th sample, the columns of i_info.csv correspond to the following scalar information:

E - Young's modulus [Pa]
ν - Poisson's ratio [-]
σ_ys - a yield stress [Pa]
h - discretization size of the voxel grid [m]

The columns of i.csv correspond to the following voxel-wise information:

x, y, z - the indices that state the location of the voxel within the voxel mesh
Ω_design - design space information for each voxel. This is a ternary variable that indicates the type of density constraint on the voxel. 0 and 1 indicate that the density is fixed at 0 or 1, respectively. -1 indicates the absence of constraints, i.e., the density in that voxel can be freely optimized
Ω_dirichlet_x, Ω_dirichlet_y, Ω_dirichlet_z - homogeneous Dirichlet boundary conditions for each voxel. These are binary variables that define whether the voxel is subject to homogeneous Dirichlet boundary constraints in the respective dimension
F_x, F_y, F_z - floating point variables that define the three spacial components of external forces applied to each voxel. All forces are body forces given in [N/m^3]
density - defines the binary voxel-wise density of the ground truth solution to the topology optimization problem

How to Import the Dataset

with DL4TO: With the Python library DL4TO (https://github.com/dl4to/dl4to) it is straightforward to download and access the dataset as a customized PyTorch torch.utils.data.Dataset object. As shown in the tutorial this can be done via:

from dl4to.datasets import SELTODataset

dataset = SELTODataset(root=root, name=name, train=train)

Here, root is the path where the dataset should be saved. name is the name of the SELTO subset and can be one of "disc_simple", "disc_complex", "sphere_simple" and "sphere_complex". train is a boolean that indicates whether the corresponding training or validation subset should be loaded. See here for further documentation on the SELTODataset class.

without DL4TO: After downloading and unzipping, any of the i.csv files can be manually imported into Python as a Pandas dataframe object:

import pandas as pd

root = ...
file_path = f'{root}/{i}.csv'
columns = ['x', 'y', 'z', 'Ω_design','Ω_dirichlet_x', 'Ω_dirichlet_y', 'Ω_dirichlet_z', 'F_x', 'F_y', 'F_z', 'density']
df = pd.read_csv(file_path, names=columns)

Similarly, we can import a i_info.csv file via:

file_path = f'{root}/{i}_info.csv'
info_column_names = ['E', 'ν', 'σ_ys', 'h']
df_info = pd.read_csv(file_path, names=info_columns)

We can extract PyTorch tensors from the Pandas dataframe df using the following function:

import torch

def get_torch_tensors_from_dataframe(df, dtype=torch.float32):
  shape = df[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1
  voxels = [df['x'].values, df['y'].values, df['z'].values]

  Ω_design = torch.zeros(1, *shape, dtype=int)
  Ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(data['Ω_design'].values.astype(int))

  Ω_Dirichlet = torch.zeros(3, *shape, dtype=dtype)
  Ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_x'].values, dtype=dtype)
  Ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_y'].values, dtype=dtype)
  Ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_z'].values, dtype=dtype)

  F = torch.zeros(3, *shape, dtype=dtype)
  F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_x'].values, dtype=dtype)
  F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_y'].values, dtype=dtype)
  F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_z'].values, dtype=dtype)

  density = torch.zeros(1, *shape, dtype=dtype)
  density[:, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['density'].values, dtype=dtype)

  return Ω_design, Ω_Dirichlet, F, density

Clear search

Close search

Google apps

Main menu

samples	512
lines	512
bands	204
default bands	70, 53,19
binning	1,1
tint (integration time)	10 (ms)
fps	100
wavelength range	397.32 - 1003.58 nm

SELTO Dataset

Stage Two Experiments - Datasets

A demo fluorescence dataset in different formats

asset

Annotated 12 lead ECG dataset

unified_qa

CIFAR-10 Python

Context

Content

in Keras

general Python 3

Acknowledgements

Inspiration

imagenet2012

Advancing Open and Reproducible Water Data Science by Integrating Data...

Custom Yolov7 On Kaggle On Custom Dataset

Custom Training with YOLOv7 🔥

Some Important links

Contact Information

Objective

To Showcase custom Object Detection on the Given Dataset to train and Infer the Model using newly launched YoloV7.

Data Acquisition

Custom Training with YOLOv7 🔥

📦 YOLOv7

Step 1: Install Requirements

Downloading YOLOV7 starting checkpoint

Step 2: Assemble Our Dataset

Version v2 Aug 12, 2022 Looks like this.

Step 3: Training Custom pretrained YOLOv7 model

Munsell soil color chart: A hyperspectral dataset

Italian Coronavirus Cases by Age group and Sex

Italy Coronavirus Cases by Age group and Sex (ICCAS)

Data folder structure

Dataset details

census-income

URLs

Li-ion Battery Aging Dataset

Dataset Overview

Key Features of the Dataset

Potential Applications

Technical Details

Tags

License

Usage Instructions

Provenance

Call to Action

wmt22_african

quality

eurosat_dataset

[MedMNIST+] 18x Standardized Datasets for 2D and 3D Biomedical Image...

STEAD subsample 4 CDiffSD

STEAD Subsample Dataset for CDiffSD Training

Overview

Dataset Files

Data

Usage

Requirements

MD17 data for graph2mat

Creators

Related publication

Short description

Contents

Cite this data

License

SELTO Dataset