4 datasets found
  1. JL-Corpus

    • huggingface.co
    Updated May 8, 2025
    Cite
    CD (2025). JL-Corpus [Dataset]. https://huggingface.co/datasets/CLAPv2/JL-Corpus
    Dataset authored and provided by
    CD
    Description

    CLAPv2/JL-Corpus dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. JL corpus

    • kaggle.com
    Updated Oct 29, 2018
    Cite
    The citation is currently not available for this dataset.
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Li Tian
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    To further the understanding of the wide array of emotions embedded in human speech, we introduce an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed to maintain an equal distribution of 4 long vowels in New Zealand English. This balance facilitates comparison studies of emotion-related formant and glottal source features. The corpus also has 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations between humans and robots. However, very few existing speech resources support the study of these emotions, and this work adds a speech corpus containing some secondary emotions.

    Please use the corpus for studies related to emotional speech. When you use it, please include the following citation:

    Jesin James, Li Tian, Catherine Watson, "An Open Source Emotional Speech Corpus for Human Robot Interaction Applications", in Proc. Interspeech, 2018.

    To access the whole corpus, including the supporting files for the recordings, use the following link: https://www.kaggle.com/tli725/jl-corpus. If you have already installed the Kaggle API, you can download it with the following command: kaggle datasets download -d tli725/jl-corpus

    Or if you simply want the raw audio+txt files, click the following link: https://www.kaggle.com/tli725/jl-corpus/downloads/Raw%20JL%20corpus%20(unchecked%20and%20unannotated).zip/1
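
    The Kaggle API command quoted above can also be driven from a script. The following is a minimal sketch, not part of the corpus documentation: the function names and the default jl-corpus destination folder are hypothetical, and the -p (download path) flag is an assumption about the installed Kaggle CLI. Only the -d form appears in the description.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(dataset: str, dest: Path) -> list[str]:
    """Return the argv for the Kaggle CLI download quoted in the description."""
    # `-d <owner>/<dataset>` is taken from the description; `-p <path>` is an
    # assumption about the installed Kaggle CLI and may need adjusting.
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", str(dest)]


def download(dataset: str = "tli725/jl-corpus", dest: Path = Path("jl-corpus")) -> Path:
    """Run the download; requires the Kaggle CLI and configured API credentials."""
    if shutil.which("kaggle") is None:
        raise RuntimeError("Kaggle CLI not found on PATH")
    dest.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_download_command(dataset, dest), check=True)
    return dest
```

    Calling download() with no arguments would fetch tli725/jl-corpus into the hypothetical jl-corpus folder. Building the argument list in a separate function keeps the command easy to inspect without touching the network.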

    The corpus was evaluated in a large-scale human perception test with 120 participants. The surveys are linked here:

    For the primary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_8ewmOCgOFCHpAj3

    For the secondary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_eVDINp8WkKpsPsh

    These surveys will give an overall idea about the type of recordings in the corpus.

    The perceptually verified and annotated JL corpus will be made publicly available soon.

  3. EMO

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Lyu You (2025). EMO [Dataset]. https://huggingface.co/datasets/HelloBug1/EMO
    Authors
    Lyu You
    Description

    Emotion Datasets

    This dataset is a collection of several emotion datasets: JL Corpus, RAVDESS, eNTERFACE, MEAD, ESD, and CREMA-D. Example:

      {
        "speaker_id": "crema-d-speaker-1067",  # Speaker ID
        "emotion": "angry",                    # Emotion label
        "emotion_intensity": "medium",         # Emotion intensity
        "transcript": "It's eleven o'clock.",  # Transcript
        "repetition": "null",                  # Repetition
        "language": "English",                 #…

    See the full description on the dataset page: https://huggingface.co/datasets/HelloBug1/EMO.
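
    Records shaped like the example above can be tallied or filtered with plain dictionaries. This is a minimal sketch under stated assumptions: only the first sample record comes from the description, the other two are invented for illustration, and the helper names are hypothetical. The listing truncates the example, so the real records may carry more fields.

```python
from collections import Counter
from typing import Iterable

# First record: fields as shown in the dataset description.
# Second and third records: made up for illustration only.
SAMPLE = [
    {"speaker_id": "crema-d-speaker-1067", "emotion": "angry",
     "emotion_intensity": "medium", "transcript": "It's eleven o'clock.",
     "repetition": "null", "language": "English"},
    {"speaker_id": "hypothetical-speaker-1", "emotion": "happy",
     "emotion_intensity": "medium", "transcript": "placeholder text",
     "repetition": "null", "language": "English"},
    {"speaker_id": "hypothetical-speaker-2", "emotion": "angry",
     "emotion_intensity": "high", "transcript": "placeholder text",
     "repetition": "null", "language": "English"},
]


def count_by_emotion(records: Iterable[dict]) -> Counter:
    """Count records per emotion label; missing labels count as 'unknown'."""
    return Counter(record.get("emotion", "unknown") for record in records)


def select(records: Iterable[dict], **criteria: str) -> list[dict]:
    """Keep records whose fields equal every given key=value criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]
```

    For instance, select(SAMPLE, emotion="angry", emotion_intensity="medium") keeps only the CREMA-D record from the description.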

  4. Odia News & Wiki

    • kaggle.com
    Updated May 14, 2024
    Cite
    Arnav Samal (2024). Odia News & Wiki [Dataset]. https://www.kaggle.com/datasets/arnavs19/odia-news-and-wiki
    Dataset provided by
    Kaggle
    Authors
    Arnav Samal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Odia Language Dataset

    This dataset consists of text samples in the Odia language. It includes articles from various sources such as news websites, Wikipedia, and a news corpus.

    Description

    1. Odia News Articles: A collection of recent news articles written in the Odia language. These articles cover a wide range of topics including politics, sports, entertainment, and more.
      • 19000 Odia News Articles which have been cleaned
      • Source: Github Repository
    2. Odia Wikipedia Articles: Text samples extracted from articles on the Odia Wikipedia. These articles cover a variety of topics such as history, culture, science, and geography.
      • 17k Odia Wiki Articles which have been cleaned
      • Source: Github Repository
    3. Odia News Corpus: A corpus of text data gathered from various news sources in Odisha. This corpus includes both formal news articles as well as informal blog posts and opinion pieces.
      • 550,000 Odia News Articles which have been cleaned
      • Source: Blog Post

    Use Cases

    • Language Modeling: The dataset can be used to train language models for the Odia language, enabling tasks such as text generation, summarization, and translation.
    • Sentiment Analysis: Analyzing the sentiment of news articles and other text samples can provide insights into public opinion and reactions to events in Odisha.
    • Topic Modeling: Identifying and categorizing topics within the dataset can help in understanding the most prevalent themes in Odia language content.

    Format

      dataset/
      ├── odia-news-classification/
      │  ├── train.csv
      │  └── valid.csv
      ├── odia-news-corpus/
      │  ├── dharitri_dataset.jl
      │  ├── pragativadi_dataset.jl
      │  ├── prameya_dataset.jl
      │  ├── samaja_dataset.jl
      │  ├── samaya_dataset.jl
      │  └── sambad_dataset.jl
      └── odia-wiki-articles/
        ├── train/
        │  └── train/
        │    ├── article1.txt
        │    ├── article2.txt
        │    └── ...
        └── valid/
          └── valid/
            ├── article1.txt
            ├── article2.txt
            └── ...
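
    The .jl files under odia-news-corpus/ are presumably JSON Lines, with one JSON object per line. The listing does not confirm this, and the field names inside each object are unknown. Under that assumption, a minimal reader might look like the sketch below, where the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line (assumes JSON Lines input)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc


def read_jl(path: Path) -> list[dict]:
    """Read a whole .jl file; UTF-8 is assumed because the text is Odia."""
    with path.open(encoding="utf-8") as handle:
        return list(parse_json_lines(handle))
```

    With the layout above, read_jl(Path("dataset/odia-news-corpus/sambad_dataset.jl")) would return the parsed articles if the files follow this convention. For the larger files, iterating over parse_json_lines directly avoids loading everything into memory.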
    

    License

    Please refer to the respective sources for more information on permitted uses.

    Citation

    If you use this dataset in your research or applications, please consider citing the original sources to acknowledge the contributors and support future work in the field of Odia language processing.


JL-Corpus (CLAPv2/JL-Corpus): 214 scholarly articles cite this dataset.
