Dataset for "Capturing Formality in Speech Across Domains and Languages"

    • zenodo.org
    Updated Aug 12, 2024
    Cite
    Debasmita Bhattacharya; Jie Chi; Peter Bell; Julia Hirschberg (2024). Dataset for "Capturing Formality in Speech Across Domains and Languages" [Dataset]. http://doi.org/10.5281/zenodo.13298510
    Dataset updated
    Aug 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Debasmita Bhattacharya; Jie Chi; Peter Bell; Julia Hirschberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We share the data used in our paper "Capturing Formality in Speech Across Domains and Languages", previously hosted on Google Drive. Corpora available in this dataset release include:

    1. All-India Radio (Hindi) [all_india_radio-20240812T152206Z-001.zip]
    2. Bangor Miami (Spanish-English) [bangor_miami_clean-20240812T152204Z-001.zip]
    3. CallHome (English; Spanish) [callhome-20240812T152201Z-001.zip; callhome-20240812T152201Z-002.zip]
    4. CALLFriend (Hindi) [cf_hindi-20240812T152159Z-001.zip]
    5. HUB4-SE (Spanish) [hub4_se-20240812T152156Z-001.zip]
    6. HUB5 (Mandarin) [hub5_transcript-20240812T152154Z-001.zip]
    7. Multilingual TEDx (English; Spanish) [mtedx_es-en-20240812T152152Z-001.zip; mtedx_es-en-20240812T152152Z-002.zip; mtedx_es-en-20240812T152152Z-003.zip; mtedx_es-en-20240812T152152Z-004.zip; mtedx_es-en-20240812T152152Z-005.zip; mtedx_es-en-20240812T152152Z-006.zip; mtedx_es-en-20240812T152152Z-007.zip; mtedx_es-en-20240812T152152Z-008.zip; mtedx_es-en-20240812T152152Z-009.zip; mtedx_es-en-20240812T152152Z-010.zip; mtedx_es-en-20240812T152152Z-011.zip; mtedx_es-en-20240812T152152Z-012.zip; mtedx_es-en-20240812T152152Z-013.zip; mtedx_es-en-20240812T152152Z-014.zip; mtedx_es-en-20240812T152152Z-015.zip; mtedx_es-en-20240812T152152Z-016.zip; mtedx_es-en-20240812T152152Z-017.zip; mtedx_es-en-20240812T152152Z-018.zip; mtedx_es-en-20240812T152152Z-019.zip]
    8. Multitarget TED (English; Mandarin) [multitarget-ted-20240812T152149Z-001.zip]
    9. IIT-B (Hindi) [parallel-n-20240812T042826Z-001.zip]
    10. TDT4 (English; Mandarin) [tdt4_multilingual_news-20240812T152131Z-001.zip]
    11. TED Talks India (Hindi) [ted_talks_hindi-20240812T042346Z-001.zip]
    12. UN (Mandarin) [UNv1.0.en-zh-002.en.zip; UN-20240812T040956Z-002.zip; UN-20240812T040956Z-003.zip]
    13. YouTube (English; Spanish; Hindi; Mandarin) [youtube-20240812T040730Z-001.zip; youtube-20240812T040730Z-002.zip; youtube-20240812T040730Z-003.zip]
    14. All-CS (Hindi-English) [All-CS.json]
    15. Europarl v7 (Spanish) [europarl-v7.es-en.es]

    If using our YouTube and/or TED Talks India corpora, please cite our paper:

    Bhattacharya, D., Chi, J., Hirschberg, J., Bell, P. (2023) Capturing Formality in Speech Across Domains and Languages. Proc. INTERSPEECH 2023, 1030-1034, doi: 10.21437/Interspeech.2023-1852
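
Several corpora in the list above are split across numbered archives (for example `callhome-...-001.zip` and `callhome-...-002.zip`). The following is a minimal Python sketch, not part of the dataset release, for grouping such parts by corpus prefix and extracting them into one directory. It assumes each numbered part is an independent zip archive (as is usual for Google Drive folder exports) and that the files have already been downloaded locally; the function names `group_archives` and `extract_corpus` are hypothetical helpers introduced here for illustration.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import Path

# Matches names such as "callhome-20240812T152201Z-002.zip":
# <corpus prefix>-<UTC timestamp>-<three-digit part number>.zip
PART_PATTERN = re.compile(
    r"^(?P<prefix>.+)-(?P<stamp>\d{8}T\d{6}Z)-(?P<part>\d{3})\.zip$"
)


def group_archives(filenames):
    """Group split archive names by corpus prefix, ordered by part number.

    Names that do not follow the timestamped pattern (e.g. "All-CS.json")
    are skipped and must be handled separately.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = PART_PATTERN.match(name)
        if match:
            groups[match.group("prefix")].append((int(match.group("part")), name))
    return {prefix: [n for _, n in sorted(parts)] for prefix, parts in groups.items()}


def extract_corpus(archive_dir, parts, out_dir):
    """Extract every part of one corpus into the same output directory.

    Assumption: each part is a self-contained zip file, so the parts can be
    extracted one after another without being concatenated first.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name in parts:
        with zipfile.ZipFile(Path(archive_dir) / name) as zf:
            zf.extractall(out_dir)
```

A possible usage, with `downloads/` as a placeholder for wherever the files were saved: call `group_archives(p.name for p in Path("downloads").iterdir())`, then pass each resulting list of parts to `extract_corpus("downloads", parts, "corpora/" + prefix)`. Files without the timestamp suffix, such as `All-CS.json`, `europarl-v7.es-en.es`, and `UNv1.0.en-zh-002.en.zip`, are deliberately left out of the grouping.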


