3 datasets found
  1. Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures"

    • rdr.ucl.ac.uk
    zip
    Updated May 31, 2023
    Cite
    Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin (2023). Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures" [Dataset]. http://doi.org/10.5522/04/14781771.v1
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    University College London
    Authors
    Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is the dataset to support the paper: Fernando Pérez-García et al., 2021, "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures". The paper has been accepted for publication at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). A preprint is available on arXiv: https://arxiv.org/abs/2106.12014

    Contents:

    1. A CSV file "seizures.csv" with the following fields:

      • Subject: subject number
      • Seizure: seizure number
      • OnsetClonic: annotation marking the onset of the clonic phase
      • GTCS: whether the seizure generalises
      • Discard: whether one (Large, Small), none (No) or both (Yes) views were discarded for training
    2. A folder "features_fpc_8_fps_15" containing two folders per seizure. The folders contain features extracted from all possible snippets from the small (S) and large (L) views. The snippets were 8 frames long and downsampled to 15 frames per second. The features are in ".pth" format and can be loaded using PyTorch: https://pytorch.org/docs/stable/generated/torch.load.html. The last number of the file name indicates the frame index. For example, the file "006_01_L_000015.pth" corresponds to the features extracted from a snippet starting one second into the seizure video. Each file contains 512 numbers representing the deep features extracted from the corresponding snippet.
    3. A description file, "README.txt".
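
    As a minimal sketch of how one of these feature files could be read, only the file name comes from the example above; the per-seizure subfolder name is an assumption:

    import torch

    # "006_01_L_000015.pth" is the example file from the description: 006 and 01 presumably
    # identify the subject and seizure, L is the large view, and 000015 is the frame index
    # (one second into the video at 15 fps). The containing folder name is hypothetical.
    features = torch.load("features_fpc_8_fps_15/006_01_L/006_01_L_000015.pth")

    # Each file is described as holding 512 numbers: the deep features of one 8-frame snippet.
    print(type(features), getattr(features, "shape", None))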

  2. MELD Preprocessed

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Cite
    Argish Abhangi (2025). MELD Preprocessed [Dataset]. https://www.kaggle.com/datasets/argish/meld-preprocessed
    Available download formats: zip (3527202381 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Argish Abhangi
    Description

    The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.

    Data Sources

    • Audio: Waveforms extracted from the original video files.
    • Video: Video files are processed to sample frames at a target frame rate (default: 2 fps) and to detect faces using a Haar Cascade classifier.
    • Text: Utterances from the dialogue, which are cleaned using custom encoding functions to fix potential byte encoding issues.
    • Emotion Labels: Each sample is associated with an emotion label.

    Preprocessing Pipeline

    The preprocessing script performs several key steps (a minimal sketch of the audio and face steps follows the list):

    1. Text Cleaning:

      • fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.
      • replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).
    2. Audio Processing:

      • Extracts raw audio waveform from each sample.
      • Computes a Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format).
      • Converts the spectrogram to a logarithmic scale for numerical stability.
    3. Video Processing:

      • Reads video frames at a specified target FPS (default: 2 fps) using OpenCV.
      • For each video, samples frames evenly based on the original video's FPS.
      • Applies Haar Cascade face detection on the frames to extract the first detected face.
      • Resizes the detected face to 224x224 and converts it to RGB. If no face is detected, a default black image (224x224x3) is returned.
    4. Saving Processed Samples:

      • Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).
      • The filename is derived from the original video filename (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).
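
    The following is a minimal sketch of the audio and face steps described above (steps 2 and 3); the helper names, thresholds, and sample-rate handling are assumptions, not the dataset's exact preprocessing script:

    import cv2
    import numpy as np
    import torch
    import torchaudio

    def log_mel(waveform, sample_rate):
      # Step 2: 64-bin Mel-spectrogram (VGGish-style), then log scale for numerical stability.
      mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=64)(waveform)
      return torch.log(mel + 1e-6)

    def first_face(frame_bgr):
      # Step 3: Haar Cascade face detection; keep the first detected face,
      # resized to 224x224 and converted to RGB. Fall back to a black image otherwise.
      cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
      gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
      boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
      if len(boxes) == 0:
        return np.zeros((224, 224, 3), dtype=np.uint8)
      x, y, w, h = boxes[0]
      face = cv2.resize(frame_bgr[y:y + h, x:x + w], (224, 224))
      return cv2.cvtColor(face, cv2.COLOR_BGR2RGB)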

    Data Format

    Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:

    • utterance (str): The cleaned textual utterance.
    • emotion (str/int): The corresponding emotion label.
    • video_path (str): Original path to the video file from which the sample was extracted.
    • audio (Tensor): Raw audio waveform tensor of shape [channels, time].
    • audio_sample_rate (int): The sampling rate of the audio waveform.
    • audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].
    • face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.
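
    As a quick check of this format (assuming each .pt file holds the dictionary described above), a single sample can be loaded and inspected; the path follows the directory structure shown in the next section:

    import torch

    sample = torch.load("preprocessed_data/train/dia0_utt0.pt")

    print(sample.keys())               # utterance, emotion, video_path, audio, ...
    print(sample["audio"].shape)       # [channels, time]
    print(sample["audio_mel"].shape)   # [channels, n_mels, time]
    print(sample["face"].shape)        # (224, 224, 3)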

    Directory Structure

    The preprocessed files are organized into splits:

    preprocessed_data/
    ├── train/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    ├── dev/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    └── test/
        ├── dia0_utt0.pt
        ├── dia1_utt1.pt
        └── ...

    Loading and Using the Dataset

    A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:

    Dataset Class

    from torch.utils.data import Dataset
    import os
    import torch
    
    class PreprocessedMELDDataset(Dataset):
      def __init__(self, data_dir):
        """
        Args:
          data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')]

      def __len__(self):
        return len(self.files)

      def __getitem__(self, idx):
        # Each .pt file holds the dictionary of preprocessed features for one utterance.
        sample_path = self.files[idx]
        sample = torch.load(sample_path)
        return sample
    

    Custom Collate Function

    def preprocessed_collate_fn(batch):
      """
      Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
      Modify this function to pad or stack tensor data if needed.
      """
      collated = {}
      collated['utterance'] = [sample['utterance'] for sample in batch]
      collated['emotion'] = [sample['emotion'] for sample in batch]
      collated['video_path'] = [sample['video_path'] for sample in batch]
      collated['audio'] = [sample['audio'] for sample in batch]
      collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
      collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
      collated['face'] = [sample['face'] for sample in batch]
      return collated
    

    Creating DataLoaders

    from torch.utils.data import DataLoader
    
    # Define paths for each split
    train_data_dir = "preprocessed_data/train"
    dev_data_dir = "preproces...
    
  3. dfdc faces of the train sample

    • kaggle.com
    zip
    Updated Mar 1, 2020
    Cite
    Itamar Gilad (2020). dfdc faces of the train sample [Dataset]. https://www.kaggle.com/datasets/itamargr/dfdc-faces-of-the-train-sample
    Available download formats: zip (3908256789 bytes)
    Dataset updated
    Mar 1, 2020
    Authors
    Itamar Gilad
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Overview

    In the Deepfake Detection Challenge, many public kernels use the faces at each frame. Here are all the faces of the train sample, split into a training set and a validation set. Each set has faces from both real and fake videos.

    Feel free to use this dataset to train your models for the Deepfake Detection Challenge. If you find this dataset useful, please consider upvoting it.

    Creation method

    The dataset was created with the facenet_pytorch library. I used the following code to create the images from a video clip. Part of the code was taken from https://www.kaggle.com/timesler/facial-recognition-model-in-pytorch

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from facenet_pytorch import MTCNN

    class DetectionPipeline:
      """Pipeline class for detecting faces in the frames of a video file."""

      def __init__(self, detector=None, n_frames=None, batch_size=60, resize=None):
        """Constructor for DetectionPipeline class.

        Keyword Arguments:
          n_frames {int} -- Total number of frames to load. These will be evenly spaced
            throughout the video. If not specified (i.e., None), all frames will be loaded.
            (default: {None})
          batch_size {int} -- Batch size to use with MTCNN face detector. (default: {60})
          resize {float} -- Fraction by which to resize frames from original prior to face
            detection. A value less than 1 results in downsampling and a value greater than
            1 results in upsampling. (default: {None})
        """
        self.detector = detector
        if detector is None:
          device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
          self.detector = MTCNN(image_size=256, margin=40, keep_all=True, factor=0.5, post_process=False, device=device).eval()
        self.n_frames = n_frames
        self.batch_size = batch_size
        self.resize = resize

      def __call__(self, filename):
        """Load frames from an MP4 video and detect faces.

        Arguments:
          filename {str} -- Path to video.
        """
        torch.cuda.empty_cache()
        faces = []
        conf = []

        # Create video reader and find length
        v_cap = cv2.VideoCapture(filename)
        v_len = int(v_cap.get(cv2.CAP_PROP_FRAME_COUNT))

        # Pick 'n_frames' evenly spaced frames to sample
        if self.n_frames is None:
          sample = np.arange(0, v_len)
        else:
          sample = np.linspace(0, v_len - 1, self.n_frames).astype(int)

        # Loop through frames
        frames = []
        for j in range(v_len):
          success = v_cap.grab()
          if j in sample:
            # Load frame
            success, frame = v_cap.retrieve()
            if not success:
              continue
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frame = Image.fromarray(frame)

            # Resize frame to desired size
            if self.resize is not None:
              frame = frame.resize([int(d * self.resize) for d in frame.size])
            frames.append(frame)

            # When batch is full, detect faces and reset frame list
            if len(frames) % self.batch_size == 0 or j == sample[-1]:
              curr_faces, curr_conf = self.detector(frames, return_prob=True)
              faces.extend(curr_faces)
              conf.extend(curr_conf)
              frames = []

        v_cap.release()

        return faces, conf

    def save_images(detection_pipeline, filename, out_path):
      # Derive the output base name from the input video file name.
      base_name = out_path + '/' + filename.split('/')[-1].split('.')[0]
      faces, conf = detection_pipeline(filename)
      for i in range(len(faces)):
        if faces[i] is None:
          continue
        for j in range(len(faces[i])):
          # Skip low-confidence detections.
          if conf[i][j] < 0.9:
            continue
          out_filename = base_name + '_' + str(i) + '_' + str(j) + '.png'
          out_img = faces[i][j].cpu().numpy()
          out_img = np.transpose(out_img, axes=[1, 2, 0])
          out_img = cv2.cvtColor(out_img, cv2.COLOR_RGB2BGR)
          cv2.imwrite(out_filename, out_img)
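
    As a hedged usage sketch (the video path and output directory below are hypothetical, not part of the dataset), the two pieces above could be combined like this:

    # Hypothetical paths; point these at a real DFDC video and an existing output folder.
    pipeline = DetectionPipeline(n_frames=30, resize=0.5)
    save_images(pipeline, 'train_sample_videos/example.mp4', 'dfdc_faces')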

