4 datasets found

JL-Corpus
huggingface.co
Updated May 8, 2025
Cite
CD (2025). JL-Corpus [Dataset]. https://huggingface.co/datasets/CLAPv2/JL-Corpus
Dataset updated
May 8, 2025
Dataset authored and provided by
CD
Description
CLAPv2/JL-Corpus dataset hosted on Hugging Face and contributed by the HF Datasets community
JL corpus
kaggle.com
Updated Oct 29, 2018
Cite
The citation is currently not available for this dataset.
Dataset updated
Oct 29, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Li Tian
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
To support further study of the wide array of emotions embedded in human speech, we introduce an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed to maintain an equal distribution of 4 long vowels in New Zealand English. This balance facilitates comparison studies of emotion-related formant and glottal source features. The corpus also covers 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations between humans and robots. Very few existing speech resources support the study of these emotions, and this work adds a speech corpus containing some secondary emotions.

Please use the corpus for studies related to emotional speech. When you use it, please cite it as:

Jesin James, Li Tian, Catherine Watson, "An Open Source Emotional Speech Corpus for Human Robot Interaction Applications", in Proc. Interspeech, 2018.

To access the whole corpus, including the supporting files for the recordings, use the following link: https://www.kaggle.com/tli725/jl-corpus. If you have already installed the Kaggle API, you can download it with the following command: kaggle datasets download -d tli725/jl-corpus
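
The download command quoted above can also be issued from a script. The following is a minimal sketch, assuming the Kaggle command-line tool is installed and API credentials are already configured. The helper names `build_download_command` and `download` are hypothetical and are not part of the Kaggle API; only the command itself comes from the description.

```python
import subprocess

# Dataset slug taken from the command quoted in the description.
DATASET_SLUG = "tli725/jl-corpus"


def build_download_command(slug: str = DATASET_SLUG) -> list[str]:
    """Return the CLI invocation from the description as an argument list."""
    return ["kaggle", "datasets", "download", "-d", slug]


def download(slug: str = DATASET_SLUG) -> None:
    """Run the command; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_download_command(slug), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(build_download_command()))
```

Passing the command as a list avoids shell quoting issues; call `download()` only on a machine where the `kaggle` executable is on the PATH.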

Or if you simply want the raw audio+txt files, click the following link: https://www.kaggle.com/tli725/jl-corpus/downloads/Raw%20JL%20corpus%20(unchecked%20and%20unannotated).zip/1

The corpus was evaluated in a large-scale human perception test with 120 participants. The links to the surveys are given below.

For the primary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_8ewmOCgOFCHpAj3

For the secondary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_eVDINp8WkKpsPsh

These surveys will give an overall idea about the type of recordings in the corpus.

The perceptually verified and annotated JL corpus will be made publicly available soon.
EMO
huggingface.co
Updated Jun 1, 2025
Cite
Lyu You (2025). EMO [Dataset]. https://huggingface.co/datasets/HelloBug1/EMO
Dataset updated
Jun 1, 2025
Authors
Lyu You
Description
Emotion Datasets

This dataset is a collection of several emotion datasets: JL Corpus, RAVDESS, eNTERFACE, MEAD, ESD, and CREMA-D.

Example:

{
  "speaker_id": "crema-d-speaker-1067",  # Speaker ID
  "emotion": "angry",  # Emotion label
  "emotion_intensity": "medium",  # Emotion intensity
  "transcript": "It's eleven o'clock.",  # Transcript
  "repetition": "null",  # Repetition
  "language": "English",  #…

See the full description on the dataset page: https://huggingface.co/datasets/HelloBug1/EMO.
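
Records of the shape shown in the example can be filtered and tallied with plain dictionaries. This is a minimal sketch, assuming each record is a Python dict carrying only the fields visible in the truncated excerpt above; the real records contain further fields that are not shown here. The helpers `filter_by_emotion` and `count_emotions` and the second record are hypothetical, added purely for illustration.

```python
from collections import Counter

# The first record copies the fields visible in the truncated excerpt.
# The second record is invented for illustration and is not from the dataset.
records = [
    {
        "speaker_id": "crema-d-speaker-1067",
        "emotion": "angry",
        "emotion_intensity": "medium",
        "transcript": "It's eleven o'clock.",
        "repetition": "null",
        "language": "English",
    },
    {
        "speaker_id": "example-speaker-0001",  # hypothetical speaker ID
        "emotion": "happy",
        "emotion_intensity": "medium",
        "transcript": "Example sentence.",
        "repetition": "null",
        "language": "English",
    },
]


def filter_by_emotion(rows, emotion):
    """Return the records whose 'emotion' field equals the given label."""
    return [row for row in rows if row.get("emotion") == emotion]


def count_emotions(rows):
    """Count how many records carry each emotion label."""
    return Counter(row.get("emotion") for row in rows)


angry_rows = filter_by_emotion(records, "angry")
label_counts = count_emotions(records)
```

Using `row.get` rather than indexing keeps the helpers tolerant of records from source corpora that might omit a field.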
Odia News & Wiki
kaggle.com
Updated May 14, 2024
Cite
Arnav Samal (2024). Odia News & Wiki [Dataset]. https://www.kaggle.com/datasets/arnavs19/odia-news-and-wiki
Dataset updated
May 14, 2024
Dataset provided by
Kaggle
Authors
Arnav Samal
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Odia Language Dataset

This dataset consists of text samples in the Odia language. It includes articles from various sources such as news websites, Wikipedia, and a news corpus.

Description

Odia News Articles: A collection of recent news articles written in the Odia language. These articles cover a wide range of topics including politics, sports, entertainment, and more.

19,000 Odia News Articles which have been cleaned

Source: Github Repository

Odia Wikipedia Articles: Text samples extracted from articles on the Odia Wikipedia. These articles cover a variety of topics such as history, culture, science, and geography.

17k Odia Wiki Articles which have been cleaned

Source: Github Repository

Odia News Corpus: A corpus of text data gathered from various news sources in Odisha. This corpus includes both formal news articles as well as informal blog posts and opinion pieces.

550,000 Odia News Articles which have been cleaned

Source: Blog Post

Use Cases

Language Modeling: The dataset can be used to train language models for the Odia language, enabling tasks such as text generation, summarization, and translation.

Sentiment Analysis: Analyzing the sentiment of news articles and other text samples can provide insights into public opinion and reactions to events in Odisha.

Topic Modeling: Identifying and categorizing topics within the dataset can help in understanding the most prevalent themes in Odia language content.

Format

dataset/
├── odia-news-classification/
│   ├── train.csv
│   └── valid.csv
├── odia-news-corpus/
│   ├── dharitri_dataset.jl
│   ├── pragativadi_dataset.jl
│   ├── prameya_dataset.jl
│   ├── samaja_dataset.jl
│   ├── samaya_dataset.jl
│   └── sambad_dataset.jl
└── odia-wiki-articles/
    ├── train/
    │   └── train/
    │       ├── article1.txt
    │       ├── article2.txt
    │       └── ...
    └── valid/
        └── valid/
            ├── article1.txt
            ├── article2.txt
            └── ...
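
The listing does not state what the `.jl` extension means. A common convention is JSON Lines (one JSON object per line), and the sketch below assumes that convention; it should be checked against the actual files. The helper `iter_json_lines` and the field names `title` and `text` are hypothetical, and the sample data is invented so the example runs without the dataset.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_lines(handle: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Invented sample standing in for a file such as sambad_dataset.jl;
# the real field names may differ.
sample = io.StringIO(
    '{"title": "example headline", "text": "example body"}\n'
    "\n"
    '{"title": "second headline", "text": "second body"}\n'
)
articles = list(iter_json_lines(sample))
```

To read a real file, pass an open handle instead of the sample, for example `open("dataset/odia-news-corpus/sambad_dataset.jl", encoding="utf-8")`. Reading line by line keeps memory use low for a large corpus.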

License

Please refer to the respective sources for more information on permitted uses.

Citation

If you use this dataset in your research or applications, please consider citing the original sources to acknowledge the contributors and support future work in the field of Odia language processing.