Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is the dataset to support the paper: Fernando Pérez-García et al., 2021, "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures". The paper has been accepted for publication at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). A preprint is available on arXiv: https://arxiv.org/abs/2106.12014

Contents:

1) A CSV file "seizures.csv" with the following fields:
   - Subject: subject number
   - Seizure: seizure number
   - OnsetClonic: annotation marking the onset of the clonic phase
   - GTCS: whether the seizure generalises
   - Discard: whether one (Large, Small), none (No) or both (Yes) views were discarded for training.

2) A folder "features_fpc_8_fps_15" containing two folders per seizure. The folders contain features extracted from all possible snippets from the small (S) and large (L) views. The snippets were 8 frames long and downsampled to 15 frames per second. The features are in ".pth" format and can be loaded using PyTorch: https://pytorch.org/docs/stable/generated/torch.load.html The last number of the file name indicates the frame index. For example, the file "006_01_L_000015.pth" corresponds to the features extracted from a snippet starting one second into the seizure video. Each file contains 512 numbers representing the deep features extracted from the corresponding snippet.

3) A description file, "README.txt".
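For instance, a feature file can be loaded with torch.load and the snippet's start time recovered from the frame index in its name. A minimal sketch (the containing folder name is an assumption; only the file-name pattern is documented):

import torch

# Illustrative path: the folder under "features_fpc_8_fps_15" is hypothetical,
# but the file name follows the documented pattern <subject>_<seizure>_<view>_<frame>.pth.
path = 'features_fpc_8_fps_15/006_01_L/006_01_L_000015.pth'

features = torch.load(path)                           # deep features for one snippet
frame_index = int(path.split('_')[-1].split('.')[0])  # 15
start_seconds = frame_index / 15                      # videos run at 15 fps, so frame 15 = 1 s
print(len(features), start_seconds)                   # expected: 512 features, 1.0 second offset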
The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.
The preprocessing script performs several key steps:
Text Cleaning:
- fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.
- replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).

Audio Processing:
- The raw waveform is converted to a log-scaled Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format); see the sketch after this list.

Video Processing:
- A face image is extracted from each video as a 224x224 RGB image; a default black image is used when no face is detected.

Saving Processed Samples:
- Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).
- File names mirror the source videos (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).
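The original preprocessing script is not reproduced here; the following is a minimal sketch of the text cleaning helpers and the Mel-spectrogram step described above. The function bodies and the 16 kHz sample rate are illustrative assumptions, not the dataset's actual implementation:

import torch
import torchaudio

def fix_encoding_with_bytes(text):
    """Decode bytes using UTF-8, cp1252, or Latin-1 (illustrative sketch)."""
    if isinstance(text, str):
        return text
    for enc in ('utf-8', 'cp1252'):
        try:
            return text.decode(enc)
        except UnicodeDecodeError:
            continue
    return text.decode('latin-1')  # latin-1 decodes any byte sequence, so it is the fallback

def replace_double_encoding(text):
    """Fix common double-encoded characters, e.g. a stray "Â’" standing in for an apostrophe."""
    return text.replace('Â’', "'").replace('’', "'")

# Log-scaled Mel-spectrogram with 64 mel bins (VGGish-style); the sample rate is an assumption.
mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

def compute_log_mel(waveform):
    mel = mel_transform(waveform)  # [channels, n_mels, time]
    return torch.log(mel + 1e-6)   # log scaling with a small epsilon for numerical stability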
Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:
- utterance (str): The cleaned textual utterance.
- emotion (str/int): The corresponding emotion label.
- video_path (str): Original path to the video file from which the sample was extracted.
- audio (Tensor): Raw audio waveform tensor of shape [channels, time].
- audio_sample_rate (int): The sampling rate of the audio waveform.
- audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].
- face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.

The preprocessed files are organized into splits:
preprocessed_data/
├── train/
│   ├── dia0_utt0.pt
│   ├── dia1_utt1.pt
│   └── ...
├── dev/
│   ├── dia0_utt0.pt
│   ├── dia1_utt1.pt
│   └── ...
└── test/
    ├── dia0_utt0.pt
    ├── dia1_utt1.pt
    └── ...
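As a quick check, a single sample can be loaded directly with torch.load and its fields inspected; a minimal sketch assuming the layout above:

import torch

sample = torch.load('preprocessed_data/train/dia0_utt0.pt')  # one preprocessed utterance
print(sample['utterance'])        # cleaned text
print(sample['emotion'])          # emotion label
print(sample['audio'].shape)      # [channels, time]
print(sample['audio_mel'].shape)  # [channels, n_mels, time]
print(sample['face'].shape)       # (224, 224, 3)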
A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:
from torch.utils.data import Dataset
import os
import torch


class PreprocessedMELDDataset(Dataset):
    def __init__(self, data_dir):
        """
        Args:
            data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')]

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample_path = self.files[idx]
        sample = torch.load(sample_path)
        return sample


def preprocessed_collate_fn(batch):
    """
    Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
    Modify this function to pad or stack tensor data if needed.
    """
    collated = {}
    collated['utterance'] = [sample['utterance'] for sample in batch]
    collated['emotion'] = [sample['emotion'] for sample in batch]
    collated['video_path'] = [sample['video_path'] for sample in batch]
    collated['audio'] = [sample['audio'] for sample in batch]
    collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
    collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
    collated['face'] = [sample['face'] for sample in batch]
    return collated
from torch.utils.data import DataLoader
# Define paths for each split
train_data_dir = "preprocessed_data/train"
dev_data_dir = "preproces...
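The snippet above is truncated in the source. A minimal sketch of how the remaining setup might look, using the dataset class and collate function defined above (the batch size is an arbitrary choice):

from torch.utils.data import DataLoader

# Illustrative continuation: build one dataset and loader per split.
train_dataset = PreprocessedMELDDataset("preprocessed_data/train")
dev_dataset = PreprocessedMELDDataset("preprocessed_data/dev")
test_dataset = PreprocessedMELDDataset("preprocessed_data/test")

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, collate_fn=preprocessed_collate_fn)
dev_loader = DataLoader(dev_dataset, batch_size=16, shuffle=False, collate_fn=preprocessed_collate_fn)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False, collate_fn=preprocessed_collate_fn)

for batch in train_loader:
    print(len(batch['utterance']), batch['audio_mel'][0].shape)
    break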
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
In the Deepfake Detection Challenge, many public kernels use the faces extracted from each frame. This dataset contains all the faces from the training sample, split into a training set and a validation set. Each set has faces from both real and fake videos.

Feel free to use this dataset to train your models for the Deepfake Detection Challenge. If you find this dataset useful, please consider upvoting it.

The dataset was created with the facenet_pytorch library. I used the following code to create the images from a video clip. Part of the code was taken from https://www.kaggle.com/timesler/facial-recognition-model-in-pytorch
`
import cv2
import numpy as np
import torch
from PIL import Image
from facenet_pytorch import MTCNN


class DetectionPipeline:
    """Pipeline class for detecting faces in the frames of a video file."""

    def __init__(self, detector=None, n_frames=None, batch_size=60, resize=None):
        """Constructor for DetectionPipeline class.

        Keyword Arguments:
            detector {MTCNN} -- Face detector to use. If None, a default MTCNN detector
                is created. (default: {None})
            n_frames {int} -- Total number of frames to load. These will be evenly spaced
                throughout the video. If not specified (i.e., None), all frames will be loaded.
                (default: {None})
            batch_size {int} -- Batch size to use with MTCNN face detector. (default: {60})
            resize {float} -- Fraction by which to resize frames from original prior to face
                detection. A value less than 1 results in downsampling and a value greater than
                1 results in upsampling. (default: {None})
        """
        self.detector = detector
        if detector is None:
            device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
            self.detector = MTCNN(image_size=256, margin=40, keep_all=True, factor=0.5,
                                  post_process=False, device=device).eval()
        self.n_frames = n_frames
        self.batch_size = batch_size
        self.resize = resize

    def __call__(self, filename):
        """Load frames from an MP4 video and detect faces.

        Arguments:
            filename {str} -- Path to video.
        """
        torch.cuda.empty_cache()
        faces = []
        conf = []

        # Create video reader and find length
        v_cap = cv2.VideoCapture(filename)
        v_len = int(v_cap.get(cv2.CAP_PROP_FRAME_COUNT))

        # Pick 'n_frames' evenly spaced frames to sample
        if self.n_frames is None:
            sample = np.arange(0, v_len)
        else:
            sample = np.linspace(0, v_len - 1, self.n_frames).astype(int)

        # Loop through frames
        frames = []
        for j in range(v_len):
            success = v_cap.grab()
            if j in sample:
                # Load frame
                success, frame = v_cap.retrieve()
                if not success:
                    continue
                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                frame = Image.fromarray(frame)

                # Resize frame to desired size
                if self.resize is not None:
                    frame = frame.resize([int(d * self.resize) for d in frame.size])
                frames.append(frame)

                # When batch is full, detect faces and reset frame list
                if len(frames) % self.batch_size == 0 or j == sample[-1]:
                    curr_faces, curr_conf = self.detector(frames, return_prob=True)
                    faces.extend(curr_faces)
                    conf.extend(curr_conf)
                    frames = []

        v_cap.release()
        return faces, conf
`
`
def save_images(detection_pipeline, filename, out_path):
    # Output files are named <video name>_<frame index>_<face index>.png
    base_name = out_path + '/' + filename.split('/')[-1].split('.')[0]
    faces, conf = detection_pipeline(filename)
    for i in range(len(faces)):
        if faces[i] is None:
            continue
        for j in range(len(faces[i])):
            # Skip low-confidence detections
            if conf[i][j] < 0.9:
                continue
            out_filename = base_name + '_' + str(i) + '_' + str(j) + '.png'
            out_img = faces[i][j].cpu().numpy()
            out_img = np.transpose(out_img, axes=[1, 2, 0])
            out_img = cv2.cvtColor(out_img, cv2.COLOR_RGB2BGR)
            cv2.imwrite(out_filename, out_img)
`
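A usage sketch for the two snippets above; the input video path and output directory are hypothetical:

`
import os

# Hypothetical paths for illustration only.
video_path = 'train_sample_videos/example.mp4'
out_path = 'faces/train/real'
os.makedirs(out_path, exist_ok=True)

pipeline = DetectionPipeline(n_frames=30, resize=0.5)  # sample 30 frames, downscale by half
save_images(pipeline, video_path, out_path)
`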