Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is the dataset to support the paper: Fernando Pérez-García et al., 2021, "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures". The paper has been accepted for publication at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). A preprint is available on arXiv: https://arxiv.org/abs/2106.12014

Contents:

1) A CSV file "seizures.csv" with the following fields:
   - Subject: subject number
   - Seizure: seizure number
   - OnsetClonic: annotation marking the onset of the clonic phase
   - GTCS: whether the seizure generalises
   - Discard: whether one (Large, Small), none (No) or both (Yes) views were discarded for training.

2) A folder "features_fpc_8_fps_15" containing two folders per seizure. The folders contain features extracted from all possible snippets from the small (S) and large (L) views. The snippets were 8 frames long and downsampled to 15 frames per second. The features are in ".pth" format and can be loaded using PyTorch: https://pytorch.org/docs/stable/generated/torch.load.html The last number of the file name indicates the frame index. For example, the file "006_01_L_000015.pth" corresponds to the features extracted from a snippet starting one second into the seizure video. Each file contains 512 numbers representing the deep features extracted from the corresponding snippet.

3) A description file, "README.txt".
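For instance, a feature file can be loaded with torch.load and the snippet's start time recovered from the frame index in its name. A minimal sketch (the containing folder name is an assumption; only the file-name pattern is documented):

import torch

# Illustrative path: the folder under "features_fpc_8_fps_15" is hypothetical,
# but the file name follows the documented pattern <subject>_<seizure>_<view>_<frame>.pth.
path = 'features_fpc_8_fps_15/006_01_L/006_01_L_000015.pth'

features = torch.load(path)                           # deep features for one snippet
frame_index = int(path.split('_')[-1].split('.')[0])  # 15
start_seconds = frame_index / 15                      # videos run at 15 fps, so frame 15 = 1 s
print(len(features), start_seconds)                   # expected: 512 features, 1.0 second offset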
The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.
The preprocessing script performs several key steps:
Text Cleaning:
- fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.
- replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).

Audio Processing:
- The raw waveform is converted to a log-scaled Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format); see the sketch after this list.

Video Processing:
- A face image is extracted from each video as a 224x224 RGB image; a default black image is used when no face is detected.

Saving Processed Samples:
- Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).
- File names mirror the source videos (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).
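The original preprocessing script is not reproduced here; the following is a minimal sketch of the text cleaning helpers and the Mel-spectrogram step described above. The function bodies and the 16 kHz sample rate are illustrative assumptions, not the dataset's actual implementation:

import torch
import torchaudio

def fix_encoding_with_bytes(text):
    """Decode bytes using UTF-8, cp1252, or Latin-1 (illustrative sketch)."""
    if isinstance(text, str):
        return text
    for enc in ('utf-8', 'cp1252'):
        try:
            return text.decode(enc)
        except UnicodeDecodeError:
            continue
    return text.decode('latin-1')  # latin-1 decodes any byte sequence, so it is the fallback

def replace_double_encoding(text):
    """Fix common double-encoded characters, e.g. a stray "Â’" standing in for an apostrophe."""
    return text.replace('Â’', "'").replace('’', "'")

# Log-scaled Mel-spectrogram with 64 mel bins (VGGish-style); the sample rate is an assumption.
mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

def compute_log_mel(waveform):
    mel = mel_transform(waveform)  # [channels, n_mels, time]
    return torch.log(mel + 1e-6)   # log scaling with a small epsilon for numerical stability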
Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:
- utterance (str): The cleaned textual utterance.
- emotion (str/int): The corresponding emotion label.
- video_path (str): Original path to the video file from which the sample was extracted.
- audio (Tensor): Raw audio waveform tensor of shape [channels, time].
- audio_sample_rate (int): The sampling rate of the audio waveform.
- audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].
- face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.

The preprocessed files are organized into splits:
preprocessed_data/
├── train/
│   ├── dia0_utt0.pt
│   ├── dia1_utt1.pt
│   └── ...
├── dev/
│   ├── dia0_utt0.pt
│   ├── dia1_utt1.pt
│   └── ...
└── test/
    ├── dia0_utt0.pt
    ├── dia1_utt1.pt
    └── ...
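As a quick check, a single sample can be loaded directly with torch.load and its fields inspected; a minimal sketch assuming the layout above:

import torch

sample = torch.load('preprocessed_data/train/dia0_utt0.pt')  # one preprocessed utterance
print(sample['utterance'])        # cleaned text
print(sample['emotion'])          # emotion label
print(sample['audio'].shape)      # [channels, time]
print(sample['audio_mel'].shape)  # [channels, n_mels, time]
print(sample['face'].shape)       # (224, 224, 3)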
A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:
from torch.utils.data import Dataset
import os
import torch


class PreprocessedMELDDataset(Dataset):
    def __init__(self, data_dir):
        """
        Args:
            data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')]

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample_path = self.files[idx]
        sample = torch.load(sample_path)
        return sample


def preprocessed_collate_fn(batch):
    """
    Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
    Modify this function to pad or stack tensor data if needed.
    """
    collated = {}
    collated['utterance'] = [sample['utterance'] for sample in batch]
    collated['emotion'] = [sample['emotion'] for sample in batch]
    collated['video_path'] = [sample['video_path'] for sample in batch]
    collated['audio'] = [sample['audio'] for sample in batch]
    collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
    collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
    collated['face'] = [sample['face'] for sample in batch]
    return collated
from torch.utils.data import DataLoader
# Define paths for each split
train_data_dir = "preprocessed_data/train"
dev_data_dir = "preproces...
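The snippet above is truncated in the source. A minimal sketch of how the remaining setup might look, using the dataset class and collate function defined above (the batch size is an arbitrary choice):

from torch.utils.data import DataLoader

# Illustrative continuation: build one dataset and loader per split.
train_dataset = PreprocessedMELDDataset("preprocessed_data/train")
dev_dataset = PreprocessedMELDDataset("preprocessed_data/dev")
test_dataset = PreprocessedMELDDataset("preprocessed_data/test")

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, collate_fn=preprocessed_collate_fn)
dev_loader = DataLoader(dev_dataset, batch_size=16, shuffle=False, collate_fn=preprocessed_collate_fn)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False, collate_fn=preprocessed_collate_fn)

for batch in train_loader:
    print(len(batch['utterance']), batch['audio_mel'][0].shape)
    break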
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
In the Deepfake Detection Challenge, many public kernels use the faces extracted from each frame. This dataset contains all the faces from the training sample, split into a training set and a validation set. Each set has faces from both real and fake videos.

Feel free to use this dataset to train your models for the Deepfake Detection Challenge. If you find this dataset useful, please consider upvoting it.

The dataset was created with the facenet_pytorch library. I used the following code to create the images from a video clip. Part of the code was taken from https://www.kaggle.com/timesler/facial-recognition-model-in-pytorch
`
import cv2
import numpy as np
import torch
from PIL import Image
from facenet_pytorch import MTCNN


class DetectionPipeline:
    """Pipeline class for detecting faces in the frames of a video file."""

    def __init__(self, detector=None, n_frames=None, batch_size=60, resize=None):
        """Constructor for DetectionPipeline class.

        Keyword Arguments:
            detector {MTCNN} -- Face detector to use. If None, a default MTCNN detector
                is created. (default: {None})
            n_frames {int} -- Total number of frames to load. These will be evenly spaced
                throughout the video. If not specified (i.e., None), all frames will be loaded.
                (default: {None})
            batch_size {int} -- Batch size to use with MTCNN face detector. (default: {60})
            resize {float} -- Fraction by which to resize frames from original prior to face
                detection. A value less than 1 results in downsampling and a value greater than
                1 results in upsampling. (default: {None})
        """
        self.detector = detector
        if detector is None:
            device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
            self.detector = MTCNN(image_size=256, margin=40, keep_all=True, factor=0.5,
                                  post_process=False, device=device).eval()
        self.n_frames = n_frames
        self.batch_size = batch_size
        self.resize = resize

    def __call__(self, filename):
        """Load frames from an MP4 video and detect faces.

        Arguments:
            filename {str} -- Path to video.
        """
        torch.cuda.empty_cache()
        faces = []
        conf = []

        # Create video reader and find length
        v_cap = cv2.VideoCapture(filename)
        v_len = int(v_cap.get(cv2.CAP_PROP_FRAME_COUNT))

        # Pick 'n_frames' evenly spaced frames to sample
        if self.n_frames is None:
            sample = np.arange(0, v_len)
        else:
            sample = np.linspace(0, v_len - 1, self.n_frames).astype(int)

        # Loop through frames
        frames = []
        for j in range(v_len):
            success = v_cap.grab()
            if j in sample:
                # Load frame
                success, frame = v_cap.retrieve()
                if not success:
                    continue
                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                frame = Image.fromarray(frame)

                # Resize frame to desired size
                if self.resize is not None:
                    frame = frame.resize([int(d * self.resize) for d in frame.size])
                frames.append(frame)

                # When batch is full, detect faces and reset frame list
                if len(frames) % self.batch_size == 0 or j == sample[-1]:
                    curr_faces, curr_conf = self.detector(frames, return_prob=True)
                    faces.extend(curr_faces)
                    conf.extend(curr_conf)
                    frames = []

        v_cap.release()
        return faces, conf
`
`
def save_images(detection_pipeline, filename, out_path):
    # Output files are named <video name>_<frame index>_<face index>.png
    base_name = out_path + '/' + filename.split('/')[-1].split('.')[0]
    faces, conf = detection_pipeline(filename)
    for i in range(len(faces)):
        if faces[i] is None:
            continue
        for j in range(len(faces[i])):
            # Skip low-confidence detections
            if conf[i][j] < 0.9:
                continue
            out_filename = base_name + '_' + str(i) + '_' + str(j) + '.png'
            out_img = faces[i][j].cpu().numpy()
            out_img = np.transpose(out_img, axes=[1, 2, 0])
            out_img = cv2.cvtColor(out_img, cv2.COLOR_RGB2BGR)
            cv2.imwrite(out_filename, out_img)
`
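A usage sketch for the two snippets above; the input video path and output directory are hypothetical:

`
import os

# Hypothetical paths for illustration only.
video_path = 'train_sample_videos/example.mp4'
out_path = 'faces/train/real'
os.makedirs(out_path, exist_ok=True)

pipeline = DetectionPipeline(n_frames=30, resize=0.5)  # sample 30 frames, downscale by half
save_images(pipeline, video_path, out_path)
`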