2 datasets found
  1. Data from: LSE-Health-UVigo

    • zenodo.org
    • portalcientifico.uvigo.gal
    Updated Sep 5, 2024
    Cite
    Jose L. Alba-Castro; Manuel Vázquez-Enríquez; Ania Pérez-Pérez; Flora Mariño-Pérez; Manuel L Lema-Álvarez; Carmen Cabeza-Pereiro; Eduardo Rodríguez-Banga; Laura Docío-Fernández; Soledad Torres-Guijarro; Alba Caderno-Fernández; Sol Cid-Álvarez (2024). LSE-Health-UVigo [Dataset]. http://doi.org/10.5281/zenodo.10234465
    Dataset updated: Sep 5, 2024
    Dataset provided by: Zenodo
    Authors: Jose L. Alba-Castro; Manuel Vázquez-Enríquez; Ania Pérez-Pérez; Flora Mariño-Pérez; Manuel L Lema-Álvarez; Carmen Cabeza-Pereiro; Eduardo Rodríguez-Banga; Laura Docío-Fernández; Soledad Torres-Guijarro; Alba Caderno-Fernández; Sol Cid-Álvarez
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    LSE-Health-UVigo Dataset

    The LSE-Health-UVigo dataset is a collection of 273 videos focused on health-related topics, presented in Spanish Sign Language (Lengua de Signos Española, LSE). The dataset offers comprehensive annotations and alignments for various linguistic elements within the videos.

    Overview

    • Total Videos: 273
    • Total Duration: ~11 hours

    The dataset was acquired in studio conditions with a blue chroma key, no shadow effects and uniform illumination, at 25 fps and FHD resolution. The added value of the dataset is its rich and rigorous hand-made annotations. Expert interpreters and deaf people were in charge of annotating the dataset, following the strict criteria explained below. A previous version of this dataset, with fewer videos and annotations, was distributed for the 2022 Sign Spotting Challenge at ECCV. The description of the former dataset, LSE_eSaude_UVIGO (ECCV'22), can be found here, including the downloadable train/val/test splits for the two organized tracks (MSSL, multiple-shot supervised learning, and OSLWL, one-shot learning and weak labels). The challenge results, together with the dataset description, protocols and baseline models, and a discussion of the top-winning solutions and future directions on the topic, can be found in the ECCV 2022 paper.

    Annotations

    1. Translation from LSE to Spanish

    • Description: Each video is translated into Spanish, segmented and aligned into sentences and smaller segments.
    • Total Segments: 7,738

    2. Sign Annotations

    • Description: 105 distinct signs annotated across all 273 videos.
    • Total Instances: 15,098

    3. Fingerspelling Annotations

    • Description: Accurate location and annotation of all fingerspelled words.
    • Total Instances: 1,029

    Signers

    • Total Signers: 10 (7 women, 3 men)
      • Deaf Signers: 7 (4 women, 3 men)
      • Hearing Signers: 3 (all women)

    Usage

    Researchers and practitioners in machine translation, linguistics, healthcare, and sign language interpretation may find this dataset valuable for:

    • Training/testing machine learning models for sign language translation, sign spotting, isolated sign language recognition and fingerspelling detection/recognition.
    • Studying health-related sign language communication
    • Analyzing linguistic patterns and structures within sign language

    Distribution

    • Excel file: includes the links to download the videos from YouTube, the metadata, and all the annotations (segments, glosses and fingerspelled words); a loading sketch follows this list.
    • 273 annotation files created with the ELAN program: these files contain the 3 Tiers explained below.
    • 273 video files: spanning ~11 hours of topics related to diseases, symptoms, treatment, care, etc.
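
    As a rough illustration, the Excel file can be inspected with pandas before any downstream processing; the filename and column layout here are assumptions, so check the real headers first.

    import pandas as pd

    # Hypothetical filename; use the actual Excel file from the distribution.
    meta = pd.read_excel("LSE-Health-UVigo.xlsx")

    # Inspect the real column names (video links, metadata, annotations)
    # before writing any code that depends on specific columns.
    print(meta.columns.tolist())
    print(meta.head())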

    Annotation tiers:

    LSE-Health-UVigo has been annotated with the ELAN program. Annotators used three Tiers:

    • Tier 'M_Glosa' for the location of the 105 selected glosses.
    • Associated Tier 'Var' for annotating variants of the signed gloss. 3-letter codes have been defined to note 8 types of sign variations. These variations are:
      • linguistic, such as slight modifications of the sign due to relaxed execution (coded LAX), a slight change of location in space (LOC), abnormal use of the non-dominant hand (MAN), and morphology changes as in distributed plurals (MPH); or
      • non-linguistic, such as a very short sign due to speed and large coarticulation (SHO), partial occlusion of the sign by other body parts (OCC) or by the sign leaving the frame (OUT), and another gloss with a similar signed appearance to the selected gloss (SIM). For the sake of completeness, these variations are not filtered out of the distribution, just flagged.
    • Tier 'Trad' for the translation into Spanish and the annotation of the start and end of each sentence or partial sentence. Translating a visual language into a written one is not a trivial task. Two-letter codes have been defined to note 8 types of special events regarding visual-to-written translation. The subsection below explains the strict annotation criteria, and a reading sketch for these tiers follows this list.
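
    The .eaf annotation files can be read programmatically. The following is a minimal sketch using the pympi-ling package, assuming a local file; the filename is illustrative, and annotation tuples are taken as (start_ms, end_ms, value).

    from pympi.Elan import Eaf

    eaf = Eaf("video_001.eaf")     # hypothetical filename
    print(eaf.get_tier_names())    # expect 'M_Glosa', 'Var' and 'Trad'

    # Each annotation carries its interval (in ms) and its value.
    for ann in eaf.get_annotation_data_for_tier("M_Glosa"):
        start_ms, end_ms, gloss = ann[0], ann[1], ann[2]
        print(start_ms, end_ms, gloss)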

    Annotation criteria:

    The annotation criteria shared by all four annotators were as follows:

    For Glosses and fingerspelled words:

    • The begin_timestamp for a sign is set as soon as the parameters (hand configuration, palm orientation, and movement and/or location) correspond to that sign more than to the transition from the previous one.
    • The end_timestamp for a sign is set as soon as those parameters start to change toward the transition to the next sign.
    • As far as possible, transitions are not included in the annotated intervals.
    • A star '*' prefix indicates a slight drift from the normal realization, or an out-of-vocabulary sign that is visually very similar. The reasons can be linguistic (MAN, LOC, MPH, LAX) or non-linguistic (SHO, OCC, OUT, SIM).
    • The annotation of plurals deserves highlighting. In Spanish Sign Language, plurals are usually performed by sign repetition. Plurals are coded as a single interval comprising all the concatenated repetitions, provided no other parameter is modified. The only special case corresponds to the gloss PERSON and its plural PERSON(M-RE), which are coded as different gloss classes because the second repetition tends to be smaller and more relaxed, like a rebound without changing the hand configuration. A small helper for the variation codes follows this list.
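
    A small helper, purely for convenience, that maps the 3-letter codes of the 'Var' tier to the linguistic/non-linguistic categories described above:

    # Variation codes from the 'Var' tier, grouped as in the description.
    LINGUISTIC = {"LAX", "LOC", "MAN", "MPH"}
    NON_LINGUISTIC = {"SHO", "OCC", "OUT", "SIM"}

    def variation_category(code: str) -> str:
        """Classify a 'Var' code; unknown codes are reported as such."""
        if code in LINGUISTIC:
            return "linguistic"
        if code in NON_LINGUISTIC:
            return "non-linguistic"
        return "unknown"

    # A starred gloss (illustrative: '*HOSPITAL') signals a deviating
    # realization; the reason is given by the associated 'Var' code.
    print(variation_category("LAX"))   # -> linguistic
    print(variation_category("OCC"))   # -> non-linguistic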

    For Translation: The general criterion for segmentation (performed by a professional interpreter who was also involved in the recordings) was to adapt the text in OL (oral language) to resemble the signed LSE (Spanish Sign Language) as closely as possible, and to segment complete phrases, or smaller units when that results in more semantically coherent segments. Additionally, due to discursive and grammatical differences between OL and LSE, 8 specific types of annotations were defined and marked in brackets (an extraction sketch follows this list):

    • If the text in OL has a visual but not literal correspondence in LSE, it is marked with the code "V" (visual) in brackets: [V:original text].
    • When discursive markers are used in LSE that do not usually appear in the text, [MD:type-MD] is inserted in the corresponding text: [MD:one], [MD:two], [MD:three], etc.; [MD:theme], [MD:alternative], [MD:section], [MD:next], etc.
    • When the phrase in LSE is affected by bimodal signing (restricted by the grammatical order of the text), it is marked as [B: followed by the text signed bimodally]. This phenomenon is very typical in sign language when the person is reading and translating in real time, as in broadcasting.
    • When fingerspelling is used and/or the corresponding sign is given, it is translated literally. For example, in tier 'Trad': "The disease called [DL:GALACTORREA], its sign is [GL:GALACTORREA]" is signed in LSE as (gloss-type notation): DISEASE NAME G-A-L-A-C-T-O-RR-E-A (fingerspelled), SIGN cl:fluid-chest (visually explained).
    • If a semantic confusion occurs when signing a specific sign, the substitution code [S: "Spanish text - GLOSS of the sign actually performed"] is used. For example, the text says "There are many species:" and the signed video uses the sign for culinary spices, so [S: species - SPICES] is inserted.
    • When a deictic replaces part of the text because the signer has previously placed what they are mentioning at that location, and it is necessary for the sentence to make sense: [VD:"text replaced by the deictic"].
    • When a sign is temporarily defined for that discourse context, [T: signed word] is used.
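
    A sketch for pulling these bracketed codes out of a 'Trad' segment with a regular expression; the example string is taken from the fingerspelling bullet above.

    import re

    # Matches the bracketed translation codes defined in the list above.
    CODE_RE = re.compile(r"\[(V|MD|B|DL|GL|S|VD|T):([^\]]*)\]")

    segment = "The disease called [DL:GALACTORREA], its sign is [GL:GALACTORREA]"

    for code, payload in CODE_RE.findall(segment):
        print(code, "->", payload)   # DL -> GALACTORREA, GL -> GALACTORREA

    # Plain reading: drop the codes but keep their payload text.
    plain = CODE_RE.sub(lambda m: m.group(2), segment)
    print(plain)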

    Acknowledgments

    This dataset is a collaborative effort of the following research groups and entities:

    Gratitude is extended to them for their contributions and support.

  2. Snapture - A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition: TERAIS Data

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Feb 22, 2024
    Cite
    Hassan Ali; Doreen Jirak; Stefan Wermter (2024). Snapture - A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition: TERAIS Data [Dataset]. http://doi.org/10.5281/zenodo.10693817
    Available download formats: csv
    Dataset updated: Feb 22, 2024
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Hassan Ali; Doreen Jirak; Stefan Wermter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository lists the datasets used for developing the experiments of the paper titled: Snapture - A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition (Open access) by Hassan Ali, Doreen Jirak and Stefan Wermter. The study was conducted in the Knowledge Technology (WTM) group at the University of Hamburg.

    GRIT Robot Commands Dataset

    This dataset was recorded at the Knowledge Technology (WTM) group at the University of Hamburg and can be requested here. The dataset is public and was not collected as part of this study.

    Montalbano Co-Speech Dataset

    This dataset was recorded as part of the ChaLearn Looking at People Challenge and can be downloaded from here. The dataset is public and was not collected as part of this study. The attached montalbano_segments.csv file can be used to create gesture segmentations of the test subset of this dataset; a small inspection sketch follows.
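
    A quick way to inspect the attached segmentation file, assuming only that it is a regular CSV; its exact column layout is not documented here, so check it before relying on specific fields.

    import pandas as pd

    segments = pd.read_csv("montalbano_segments.csv")
    print(segments.columns.tolist())  # check the actual column names
    print(segments.head())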

    Acknowledgements

    This work was partially supported by the DFG under project CML (TRR 169), by the BMWK under project KI-SIGS, and by the EU under project TERAIS.



    Citation

    To cite our paper, you can copy the following into your .bib file:

    @Article{Ali2023,
      author  = {Ali, Hassan and Jirak, Doreen and Wermter, Stefan},
      title   = {Snapture---a Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition},
      journal = {Cognitive Computation},
      year    = {2023},
      month   = {Jul},
      day     = {17},
      issn    = {1866-9964},
      doi     = {10.1007/s12559-023-10174-z},
      url     = {https://doi.org/10.1007/s12559-023-10174-z}
    }