25 datasets found
  1. Spanish Spontaneous Dialogue speech dataset

    • kaggle.com
    zip
    Updated Jun 7, 2024
    Cite
    Frank Wong (2024). Spanish Spontaneous Dialogue speech dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/spanish-spontaneous-dialogue-speech-dataset
    Available download formats: zip (93236 bytes)
    Dataset updated
    Jun 7, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Spanish(Spain) Spontaneous Dialogue Telephony speech dataset

    Description

    Spanish (Spain) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (600 native speakers), which is intended to improve model performance on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, maintaining user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1234?source=Kaggle

    Format

    8kHz 8bit, a-law/u-law pcm, mono channel
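
Audio stored as 8-bit u-law generally has to be expanded to 16-bit linear PCM before most speech toolkits can use it. The following is a minimal sketch of standard G.711 u-law expansion, assuming the files hold raw, headerless u-law bytes (the listing does not say how the audio is packaged). The function names are illustrative and not part of the dataset; a-law uses a different table and is not covered here.

```python
def ulaw_byte_to_pcm16(u: int) -> int:
    """Expand one G.711 u-law byte (0-255) to a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # u-law bytes are stored bit-inverted
    sign = u & 0x80                    # top bit carries the sign
    exponent = (u >> 4) & 0x07         # 3-bit segment number
    mantissa = u & 0x0F                # 4-bit step within the segment
    # 0x84 (132) is the bias added during encoding; remove it after scaling.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a buffer of u-law bytes into a list of 16-bit linear samples."""
    return [ulaw_byte_to_pcm16(b) for b in data]
```

For example, `decode_ulaw(bytes([0xFF, 0x00, 0x80]))` yields silence followed by the most negative and most positive values the code can represent.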

    Content category

    Dialogue based on given topics

    Recording condition

    Low background noise (indoor)

    Recording device

    Telephony

    Country

    Spain(ESP)

    Language(Region) Code

    es-ES

    Language

    Spanish

    Speaker

    600 people in total, 49% male and 51% female

    Features of annotation

    Transcription text, timestamp, speaker ID, gender

    Accuracy rate

    Word accuracy rate(WAR) 98%

    Licensing Information

    Commercial License

  2. In-Car Speech Dataset: Spanish (Mexico)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). In-Car Speech Dataset: Spanish (Mexico) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.

    Speech Data

    This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.

    Participant Diversity:

    - Speakers: 50+ native Spanish speakers from the FutureBeeAI Community.

    - Regions: Balanced representation of accents, dialects, and demographics across Mexico.

    - Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Recording Nature: Scripted wake-word and command-type audio recordings.

    - Duration: Average duration of 5 to 20 seconds per audio recording.

    - Formats: WAV format, mono channel, 16-bit depth. The dataset includes recordings at both 16 kHz and 48 kHz.
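
Because the listing mixes 16 kHz and 48 kHz files, it can help to check each file's header before training. Below is a small sketch using Python's standard `wave` module. The helper names `describe_wav` and `matches_listing` are hypothetical, and the expected values (mono, 16-bit, 16 kHz or 48 kHz) are taken from the format line above rather than from any official loader.

```python
import wave


def describe_wav(source) -> dict:
    """Return basic format facts for a WAV file (a path or a binary file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,   # sample width is in bytes
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_listing(info: dict) -> bool:
    """Check a file against the listed format: mono, 16-bit, 16 kHz or 48 kHz."""
    return (
        info["channels"] == 1
        and info["bit_depth"] == 16
        and info["sample_rate_hz"] in (16000, 48000)
    )
```

Grouping files by `sample_rate_hz` first makes it easy to resample only the 48 kHz subset when a model expects 16 kHz input.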

    Dataset Diversity

    Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.

    Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.

    Different Cars: Data collection was carried out in different types and models of cars.

    Different Types of Voice Commands:

    - Navigational Voice Commands

    - Mobile Control Voice Commands

    - Car Control Voice Commands

    - Multimedia & Entertainment Commands

    - General, Question Answer, Search Commands

    Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.

    - Morning

    - Afternoon

    - Evening

    Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:

    - Noise Level: Silent, Low Noise, Moderate Noise, High Noise

    - Parking Location: Indoor, Outdoor

    - Car Windows: Open, Closed

    - Car AC: On, Off

    - Car Engine: On, Off

    - Car Movement: Stationary, Moving

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

    Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Spanish voice assistant speech recognition models.

    License

    This Mexican Spanish In-car audio dataset is created by FutureBeeAI and is available for commercial use.

  3. Spanish Language Datasets | 1.8M+ Sentences | Translation Data | TTS |...

    • datarade.ai
    Updated Jul 11, 2025
    Cite
    Oxford Languages (2025). Spanish Language Datasets | 1.8M+ Sentences | Translation Data | TTS | Dictionary Display | Translations | EU & LATAM Coverage [Dataset]. https://datarade.ai/data-products/spanish-language-datasets-1-8m-sentences-nlp-tts-dic-oxford-languages
    Available download formats: .json, .xml, .csv, .xls, .txt, .mp3, .wav
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Oxford Languages (https://lexico.com/es)
    Area covered
    Bolivia (Plurinational State of), Chile, Colombia, Nicaragua, Ecuador, Paraguay, Panama, Costa Rica, Cuba, Honduras
    Description

    Linguistically annotated Spanish language datasets with headwords, definitions, senses, examples, POS tags, semantic metadata, and usage info. Ideal for dictionary tools, NLP, and TTS model training or fine-tuning.

    Our Spanish language datasets are carefully compiled and annotated by language and linguistic experts; you can find them available for licensing:

    1. Spanish Monolingual Dictionary Data
    2. Spanish Bilingual Dictionary Data
    3. Spanish Sentences Data
    4. Synonyms and Antonyms Data
    5. Audio Data
    6. Spanish Word List Data

    Key Features (approximate numbers):

    1. Spanish Monolingual Dictionary Data

    Our Spanish monolingual dictionary data offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.

    • Words: 73,000
    • Senses: 123,000
    • Example sentences: 104,000
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    2. Spanish Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is reviewed and updated annually by our in-house team of language experts. It offers significant coverage of the language, with a large volume of high-quality translated words.

    • Translations: 221,300
    • Senses: 103,500
    • Example sentences: 74,500
    • Example translations: 83,800
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    3. Spanish Sentences Data

    Spanish sentences retrieved from the corpus are ideal for NLP model training, comprising approximately 20 million words. The sentences provide broad coverage of Spanish-speaking countries and are tagged by country or dialect.

    • Sentences volume: 1,840,000
    • Format: XML and JSON format
    • Delivery: Email (link-based file sharing) and REST API
    4. Spanish Synonyms and Antonyms Data

    This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.

    • Synonyms: 127,700
    • Antonyms: 9,500
    • Format: XML format
    • Delivery: Email (link-based file sharing)
    • Update frequency: annually
    5. Spanish Audio Data (word-level)

    Curated word-level audio data for the Spanish language, which covers all varieties of world Spanish, providing rich dialectal diversity in the Spanish language.

    • Audio files: 20,900
    • Format: XLSX (for index), MP3 and WAV (audio files)
    6. Spanish Word List Data

    This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.

    • Wordforms: 450,000
    • Format: CSV and TXT formats
    • Delivery: Email (link-based file sharing)

    Use Cases:

    We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).

    If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Oxford.Languages@oup.com to start the conversation.

    Pricing:

    Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

    Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.

    About the sample:

    The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our dataset, we provide a sample in CSV format for preview purposes only.

    If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.

  4. Spanish Spontaneous Dialogue Telephony speech

    • kaggle.com
    zip
    Updated Jun 11, 2024
    Cite
    Frank Wong (2024). Spanish Spontaneous Dialogue Telephony speech [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/spanish-spontaneous-dialogue-telephony-speech/code
    Available download formats: zip (215338 bytes)
    Dataset updated
    Jun 11, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    88-Hours-Mexican-Spanish-Conversational-Speech-Data-by-Telephone

    Description

    Spanish (Mexico) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics. Transcribed with text content, timestamp, speaker ID, gender, and other attributes. The dataset was collected from a geographically diverse pool of speakers (122 native speakers), which is intended to improve model performance on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, maintaining user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1352?source=Kaggle

    Format

    8kHz 8bit, a-law/u-law pcm, mono channel

    Content category

    Dialogue based on given topics

    Recording condition

    Low background noise (indoor)

    Recording device

    Telephony

    Country

    Mexico(MEX)

    Language(Region) Code

    es-MX

    Language

    Spanish

    Speaker

    122 people in total, 53% male and 47% female

    Features of annotation

    Transcription text, timestamp, speaker ID, gender, noise

    Accuracy rate

    Word accuracy rate(WAR) 98%

    Licensing Information

    Commercial License

  5. In-Car Speech Dataset: Bulgarian (Bulgaria)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). In-Car Speech Dataset: Bulgarian (Bulgaria) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-bulgarian-bulgaria
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Bulgaria
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US Spanish Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.

    Speech Data

    This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.

    Participant Diversity:

    - Speakers: 50+ native Spanish speakers from the FutureBeeAI Community.

    - Regions: Balanced representation of accents, dialects, and demographics across the USA.

    - Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Recording Nature: Scripted wake-word and command-type audio recordings.

    - Duration: Average duration of 5 to 20 seconds per audio recording.

    - Formats: WAV format, mono channel, 16-bit depth. The dataset includes recordings at both 16 kHz and 48 kHz.

    Dataset Diversity

    Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.

    Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.

    Different Cars: Data collection was carried out in different types and models of cars.

    Different Types of Voice Commands:

    - Navigational Voice Commands

    - Mobile Control Voice Commands

    - Car Control Voice Commands

    - Multimedia & Entertainment Commands

    - General, Question Answer, Search Commands

    Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.

    - Morning

    - Afternoon

    - Evening

    Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:

    - Noise Level: Silent, Low Noise, Moderate Noise, High Noise

    - Parking Location: Indoor, Outdoor

    - Car Windows: Open, Closed

    - Car AC: On, Off

    - Car Engine: On, Off

    - Car Movement: Stationary, Moving

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

    Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Spanish voice assistant speech recognition models.

    License

    This US Spanish In-car audio dataset is created by FutureBeeAI and is available for commercial use.

  6. In-Car Speech Dataset: Spanish (Argentina)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). In-Car Speech Dataset: Spanish (Argentina) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-spanish-argentina
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Argentina
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Argentinian Spanish Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.

    Speech Data

    This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.

    Participant Diversity:

    - Speakers: 50+ native Spanish speakers from the FutureBeeAI Community.

    - Regions: Balanced representation of accents, dialects, and demographics across Argentina.

    - Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Recording Nature: Scripted wake-word and command-type audio recordings.

    - Duration: Average duration of 5 to 20 seconds per audio recording.

    - Formats: WAV format, mono channel, 16-bit depth. The dataset includes recordings at both 16 kHz and 48 kHz.

    Dataset Diversity

    Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.

    Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.

    Different Cars: Data collection was carried out in different types and models of cars.

    Different Types of Voice Commands:

    - Navigational Voice Commands

    - Mobile Control Voice Commands

    - Car Control Voice Commands

    - Multimedia & Entertainment Commands

    - General, Question Answer, Search Commands

    Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.

    - Morning

    - Afternoon

    - Evening

    Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:

    - Noise Level: Silent, Low Noise, Moderate Noise, High Noise

    - Parking Location: Indoor, Outdoor

    - Car Windows: Open, Closed

    - Car AC: On, Off

    - Car Engine: On, Off

    - Car Movement: Stationary, Moving

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

    Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Spanish voice assistant speech recognition models.

    License

    This Argentinian Spanish In-car audio dataset is created by FutureBeeAI and is available for commercial use.

  7. Spanish Housing Dataset: Location, Size, Price,

    • kaggle.com
    zip
    Updated Nov 26, 2022
    Cite
    The Devastator (2022). Spanish Housing Dataset: Location, Size, Price, [Dataset]. https://www.kaggle.com/datasets/thedevastator/spanish-housing-dataset-location-size-price-and/code
    Available download formats: zip (45386344 bytes)
    Dataset updated
    Nov 26, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Spanish Housing Dataset: Location, Size, Price, and More!

    Now with 100% More Fun!

    By [source]

    About this dataset

    Looking for a place to live in Spain? This dataset contains information about houses in various Spanish provinces that will help you with your search! The data includes information about the houses such as location, size, price, amenities, and more. With this dataset, you can study the housing market in Spain, compare prices and styles of houses across different provinces, or learn more about the features of houses in different parts of the country. So whether you're looking for your dream home or just curious about Spanish real estate, this dataset is a great place to start!

    How to use the dataset

    The Spanish Housing Dataset contains information about houses in various Spanish provinces. The data includes information about the houses such as location, size, price, amenities, and so on. This dataset can be used to study the housing market in Spain, to compare prices and styles of houses in different provinces, or to find out more about the features of houses in different parts of

    Research Ideas

    • To study the housing market in Spain and compare prices and styles of houses in different provinces
    • To find out more about the features of houses in different parts of the country
    • To compare prices and styles of houses in different parts of the province

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: addinfo.csv

    | Column name | Description |
    |:------------|:------------|
    | poblacion | The population of the city where the house is located. (Numeric) |
    | source | The source of the data. (Categorical) |

    File: links.csv

    | Column name | Description |
    |:------------|:------------|
    | link | The URL of the listing. (String) |
    | num_link | The listing's unique identifier. (String) |
    | obtention_date | The date on which the listing was collected. (String) |

    File: rentas_PV.csv

    File: rentas_espanya.csv

    | Column name | Description |
    |:------------|:------------|
    | Número de declaraciones | The number of tax declarations. (Numeric) |

    File: zones.csv

    | Column name | Description |
    |:------------|:------------|
    | type | The type of the house. (Categorical) |

    File: houses_alava.csv

    | Column name | Description |
    |:------------|:------------|
    | obtention_date | The date on which the listing was collected. (String) |
    | ad_description | A description of the house. (String) |
    | ad_last_update | The date of the last update to the listing. (String) |
    | air_conditioner | An indicator of whether or not the house has air conditioning. (Boolean) |
    | balcony | An indicator of whether or not the house has a balcony. (Boolean) |
    | bath_num | The number of bathrooms in the house. (Integer) |
    | built_in_wardrobe | An indicator of whether or not the house has a built-in wardrobe. (Boolean) |
    | chimney | An indicator of whether or not the house has a chimney. (Boolean) |
    | construct_date | The date the house was constructed. (String) |
    | energetic_certif | The energetic certification of the house. (String) |
    | **fl...

  8. Table_1_Parental Burnout Assessment (PBA) in Different Hispanic Countries:...

    • figshare.com
    • frontiersin.figshare.com
    docx
    Updated Jun 14, 2023
    Cite
    Denisse Manrique-Millones; Georgy M. Vasin; Sergio Dominguez-Lara; Rosa Millones-Rivalles; Ricardo T. Ricci; Milagros Abregu Rey; María Josefina Escobar; Daniela Oyarce; Pablo Pérez-Díaz; María Pía Santelices; Claudia Pineda-Marín; Javier Tapia; Mariana Artavia; Maday Valdés Pacheco; María Isabel Miranda; Raquel Sánchez Rodríguez; Clara Isabel Morgades-Bamba; Ainize Peña-Sarrionandia; Fernando Salinas-Quiroz; Paola Silva Cabrera; Moïra Mikolajczak; Isabelle Roskam (2023). Table_1_Parental Burnout Assessment (PBA) in Different Hispanic Countries: An Exploratory Structural Equation Modeling Approach.DOCX [Dataset]. http://doi.org/10.3389/fpsyg.2022.827014.s001
    Available download formats: docx
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Frontiers
    Authors
    Denisse Manrique-Millones; Georgy M. Vasin; Sergio Dominguez-Lara; Rosa Millones-Rivalles; Ricardo T. Ricci; Milagros Abregu Rey; María Josefina Escobar; Daniela Oyarce; Pablo Pérez-Díaz; María Pía Santelices; Claudia Pineda-Marín; Javier Tapia; Mariana Artavia; Maday Valdés Pacheco; María Isabel Miranda; Raquel Sánchez Rodríguez; Clara Isabel Morgades-Bamba; Ainize Peña-Sarrionandia; Fernando Salinas-Quiroz; Paola Silva Cabrera; Moïra Mikolajczak; Isabelle Roskam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parental burnout is a unique and context-specific syndrome resulting from a chronic imbalance of risks over resources in the parenting domain. The current research aims to evaluate the psychometric properties of the Spanish version of the Parental Burnout Assessment (PBA) across Spanish-speaking countries with two consecutive studies. In Study 1, we analyzed the data through a bifactor model within an Exploratory Structural Equation Modeling (ESEM) framework on the pooled sample of participants (N = 1,979), obtaining good fit indices. We then attained measurement invariance across both gender and countries in a set of nested models with gradually increasing parameter constraints. Latent means comparisons across countries showed that, among the participants’ countries, Chile had the highest parental burnout score. Likewise, comparisons across gender showed that mothers displayed higher scores than fathers, as in previous studies. Reliability coefficients were high. In Study 2 (N = 1,171), we tested the relations between parental burnout and three specific consequences, i.e., escape and suicidal ideations, parental neglect, and parental violence toward one’s children. The medium to large associations found provided support for the PBA’s predictive validity. Overall, we concluded that the Spanish version of the PBA has good psychometric properties. The results support its relevance for the assessment of parental burnout among Spanish-speaking parents, offering new opportunities for cross-cultural research in the parenting domain.

  9. 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM)...

    • data.nexdata.ai
    Updated Aug 3, 2024
    Cite
    Nexdata (2024). 16kHz Conversational Speech Data | 35,000 Hours | Large Language Model(LLM) Data | Speech AI Datasets |Multilingual Language Data [Dataset]. https://data.nexdata.ai/products/nexdata-multilingual-conversational-speech-data-16khz-mob-nexdata
    Dataset updated
    Aug 3, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Brazil, Syrian Arab Republic, Hong Kong, Ukraine, Pakistan, Egypt, Malaysia, Italy, Switzerland, Bulgaria
    Description

    Nexdata offers 35,000 hours of off-the-shelf multilingual 16 kHz conversational speech data, covering 100+ countries and languages including English, German, French, Spanish, Italian, Portuguese, Korean, Japanese, Hindi, and Russian.

  10. COVID-19 Tweets : A dataset contaning more than 600k tweets on the novel...

    • data.niaid.nih.gov
    Updated Jan 23, 2021
    Cite
    Yassine Drias; Habiba Drias (2021). COVID-19 Tweets : A dataset contaning more than 600k tweets on the novel CoronaVirus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024176
    Dataset updated
    Jan 23, 2021
    Dataset provided by
    LRIA - USTHB
    LRIA - University of Algiers
    Authors
    Yassine Drias; Habiba Drias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 653 996 tweets related to the Coronavirus topic and highlighted by hashtags such as: #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets' crawling period started on the 27th of February and ended on the 25th of March 2020, which is spread over four weeks.

    The tweets were generated by 390 458 users from 133 different countries and were written in 61 languages. English was the most used language with almost 400k tweets, followed by Spanish with around 80k tweets.

    The data is stored as a CSV file, where each line represents a tweet. The CSV file provides information on the following fields:

    Author: the user who posted the tweet

    Recipient: contains the name of the user in case of a reply, otherwise it would have the same value as the previous field

    Tweet: the full content of the tweet

    Hashtags: the list of hashtags present in the tweet

    Language: the language of the tweet

    Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.

    Location: the country of the author of the tweet, which is unfortunately not always available

    Date: the publication date of the tweet

    Source: the device or platform used to send the tweet

    The dataset can also be used to construct a social graph, since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
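    The social-graph use described above can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the CSV header uses the field names exactly as listed (Author, Recipient, Relationship), and the sample rows, user names, and the "Tweet" relationship label are made up for illustration.

```python
import csv
import io
from collections import Counter

# Relationship labels named in the dataset description.
GRAPH_RELATIONS = {"Replies to", "Retweet", "MentionsInRetweet", "Mentions"}


def build_edges(rows):
    """Count directed (author, recipient, relationship) edges.

    Rows whose Recipient equals the Author are skipped, because the
    description says that is how a tweet with no recipient is stored.
    """
    edges = Counter()
    for row in rows:
        author = row["Author"]
        recipient = row["Recipient"]
        relation = row["Relationship"]
        if relation in GRAPH_RELATIONS and recipient != author:
            edges[(author, recipient, relation)] += 1
    return edges


# Tiny in-memory sample standing in for the real file (hypothetical data;
# the column names are an assumption based on the field list).
SAMPLE_CSV = """Author,Recipient,Tweet,Relationship
alice,bob,hello,Replies to
alice,alice,standalone,Tweet
carol,bob,rt,Retweet
alice,bob,again,Replies to
"""

sample_edges = build_edges(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(sample_edges)
```

    To run it on the real data, pass a `csv.DictReader` opened on the downloaded file instead of the in-memory sample; the edge counts can then be loaded into any graph library.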

  11.

    Global English Speech with Accent Conversational Dataset — Multi-Region...

    • datarade.ai
    .wav
    Updated Jul 21, 2025
    + more versions
    Cite
    FileMarket (2025). Global English Speech with Accent Conversational Dataset — Multi-Region Validated Speech with Gender, Age & Metadata for AI & NLP Training [Dataset]. https://datarade.ai/data-products/global-english-speech-with-accent-conversational-dataset-mu-filemarket
    Available download formats: .wav
    Dataset updated
    Jul 21, 2025
    Dataset authored and provided by
    FileMarket
    Area covered
    Tonga, Montenegro, United States Minor Outlying Islands, Nicaragua, Haiti, Iceland, Cook Islands, Comoros, Bangladesh, Yemen
    Description

    The Global English Accent Conversational NLP Dataset is a comprehensive collection of validated English speech recordings sourced from native and non-native English speakers across key global regions. This dataset is designed for training Natural Language Processing models, conversational AI, Automatic Speech Recognition (ASR), and linguistic research, with a focus on regional accent variation.

    Regions and Covered Countries with Primary Spoken Languages:

    Africa: South Africa (English, Zulu, Afrikaans, Xhosa); Nigeria (English, Yoruba, Igbo, Hausa); Kenya (English, Swahili); Ghana (English, Twi, Ewe, Ga); Uganda (English, Luganda); Ethiopia (English, Amharic, Oromo)

    Central & South America: Mexico (Spanish, English as a second language); Guatemala (Spanish, K'iche', English); El Salvador (Spanish, English); Costa Rica (Spanish, English in Caribbean regions); Colombia (Spanish, English in urban centers); Dominican Republic (Spanish, English in tourist zones); Brazil (Portuguese, English in urban areas); Argentina (Spanish, English among educated speakers)

    Southeast Asia & South Asia: Philippines (Filipino, English); Vietnam (Vietnamese, English); Malaysia (Malay, English, Mandarin); Indonesia (Indonesian, Javanese, English); Singapore (English, Mandarin, Malay, Tamil); India (Hindi, English, Bengali, Tamil); Pakistan (Urdu, English, Punjabi)

    Europe: United Kingdom (English); Ireland (English, Irish); Germany (German, English); France (French, English); Spain (Spanish, Catalan, English); Italy (Italian, English); Portugal (Portuguese, English)

    Oceania: Australia (English); New Zealand (English, Māori); Fiji (English, Fijian)

    North America: United States (English, Spanish); Canada (English, French)

    Dataset Attributes:

    • Conversational English with natural accent variation
    • Global coverage with balanced male/female speakers
    • Rich speaker metadata: age, gender, country, city
    • Average audio length of ~30 minutes per participant
    • All samples manually validated for accuracy
    • Structured format suitable for machine learning and AI applications

    Best suited for:

    • NLP model training and evaluation
    • Multilingual ASR system development
    • Voice assistant and chatbot design
    • Accent recognition research
    • Voice synthesis and TTS modeling

    This dataset ensures global linguistic diversity and delivers high-quality audio for AI developers, researchers, and enterprises working on voice-based applications.

  12. 👨‍👩‍👧 US Country Demographics

    • kaggle.com
    zip
    Updated Aug 14, 2023
    Cite
    mexwell (2023). 👨‍👩‍👧 US Country Demographics [Dataset]. https://www.kaggle.com/datasets/mexwell/us-country-demographics
    Available download formats: zip (343499 bytes)
    Dataset updated
    Aug 14, 2023
    Authors
    mexwell
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    United States
    Description

    This data set contains information about counties in the United States from 2010 through 2019, obtained from the United States Census Bureau. It includes age distributions, education levels, employment statistics, ethnicity percentages, household information, income, and other miscellaneous statistics. (Values are denoted as -1 if the data is not available.)

    Data Dictionary

    <...

    Key | List of... | Comment | Example Value
    County | String | County name | "Abbeville County"
    State | String | State name | "SC"
    Age.Percent 65 and Older | Float | Estimated percentage of the population aged 65 years or older. Estimates are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 22.4
    Age.Percent Under 18 Years | Float | Estimated percentage of the population under 18 years old. Estimates are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 19.8
    Age.Percent Under 5 Years | Float | Estimated percentage of the population under 5 years old. Estimates are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 4.7
    Education.Bachelor's Degree or Higher | Float | Percentage for the people who attended college but did not receive a degree and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data is collected from 2015 to 2019. | 15.6
    Education.High School or Higher | Float | Percentage of people whose highest degree was a high school diploma or its equivalent, people who attended college but did not receive a degree, and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data is collected from 2015 to 2019. | 81.7
    Employment.Nonemployer Establishments | Integer | An establishment is a single physical location at which business is conducted or where services or industrial operations are performed. It is not necessarily identical with a company or enterprise, which may consist of one establishment or more. The data was collected from 2018. | 1416
    Ethnicities.American Indian and Alaska Native Alone | Float | Estimated percentage of population having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment. This category includes people who indicate their race as "American Indian or Alaska Native" or report entries such as Navajo, Blackfeet, Inupiat, Yup'ik, or Central American Indian groups or South American Indian groups. | 0.3
    Ethnicities.Asian Alone | Float | Estimated percentage of population having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent, including for example Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam. This includes people who reported detailed Asian responses such as "Asian Indian", "Chinese", "Filipino", "Korean", "Japanese", "Vietnamese", and "Other Asian" or provide other detailed Asian responses. | 0.4
    Ethnicities.Black Alone | Float | Estimated percentage of population having origins in any of the Black racial groups of Africa. It includes people who indicate their race as "Black or African American" or report entries such as African American, Kenyan, Nigerian, or Haitian. | 27.6
    Ethnicities.Hispanic or Latino | Float
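
    Because missing values are encoded as -1, they should be filtered out before any averaging. The following is a minimal sketch of that step, assuming the CSV header uses the key names from the data dictionary; the second sample row ("Example County", "XX") is made up for illustration.

```python
import csv
import io

MISSING = -1  # sentinel for unavailable data, as stated in the description


def parse_value(text):
    """Convert a numeric cell to float, mapping the -1 sentinel to None."""
    value = float(text)
    return None if value == MISSING else value


def load_column(rows, column):
    """Return {(county, state): value_or_None} for one numeric column."""
    return {
        (row["County"], row["State"]): parse_value(row[column])
        for row in rows
    }


# In-memory sample standing in for the real file. The first row reuses the
# example values from the data dictionary; the second row is hypothetical.
SAMPLE_CSV = """County,State,Age.Percent 65 and Older
Abbeville County,SC,22.4
Example County,XX,-1
"""

older = load_column(
    csv.DictReader(io.StringIO(SAMPLE_CSV)), "Age.Percent 65 and Older"
)
known = [value for value in older.values() if value is not None]
print(sum(known) / len(known))  # mean over counties with available data
```

    Mapping the sentinel to None rather than 0 keeps unavailable counties from silently dragging averages down.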
  13. LATAM Data Suite | 1.8M+ Sentences | Natural Language Processing (NLP) Data...

    • datarade.ai
    Updated Jul 22, 2025
    Cite
    Oxford Languages (2025). LATAM Data Suite | 1.8M+ Sentences | Natural Language Processing (NLP) Data | TTS | Dictionary Display | Translation Data | LATAM Coverage [Dataset]. https://datarade.ai/data-products/latam-data-suite-1-8m-sentences-nlp-tts-dictionary-d-oxford-languages
    Available download formats: .json, .xml, .csv, .xls, .mp3, .wav
    Dataset updated
    Jul 22, 2025
    Dataset authored and provided by
    Oxford Languages (https://lexico.com/es)
    Area covered
    Panama, Dominican Republic, Bolivia (Plurinational State of), Colombia, Puerto Rico, Mexico, Ecuador, Uruguay, Spain, Peru
    Description

    LATAM Data Suite provides high-quality datasets in Spanish, Portuguese, and American English. Ideal for NLP, AI, LLMs, translation, and education, it combines linguistic depth and regional authenticity to power scalable, multilingual language technologies.

    Discover our expertly curated language datasets in the LATAM Data Suite. Compiled and annotated by language and linguistic experts, this suite offers high-quality resources tailored to your needs. This suite includes:

    • Monolingual and Bilingual Dictionary Data Featuring headwords, definitions, word senses, part-of-speech (POS) tags, and semantic metadata.

    • Sentences Curated examples of real-world usage with contextual annotations.

    • Synonyms & Antonyms Lexical relations to support semantic search, paraphrasing, and language understanding.

    • Audio Data Native speaker recordings for TTS and pronunciation modeling.

    • Word Lists Frequency-ranked and thematically grouped lists.

    Learn more about the datasets included in the data suite:

    1. Portuguese Monolingual Dictionary Data
    2. Portuguese Bilingual Dictionary Data
    3. Spanish Monolingual Dictionary Data
    4. Spanish Bilingual Dictionary Data
    5. Spanish Sentences Data
    6. Spanish Synonyms and Antonyms Data
    7. Spanish Audio Data
    8. Spanish Word List Data
    9. American English Monolingual Dictionary Data
    10. American English Synonyms and Antonyms Data
    11. American English Pronunciations with Audio

    Key Features (approximate numbers):

    1. Portuguese Monolingual Dictionary Data

    Our Portuguese monolingual covers both European and Latin American varieties, featuring clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Portuguese language.

    • Words: 143,600
    • Senses: 285,500
    • Example sentences: 69,300
    • Format: XML format
    • Delivery: Email (link-based file sharing)
    2. Portuguese Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Portuguese and from Portuguese to English. It is annually reviewed and updated by our in-house team of language experts. It offers comprehensive coverage of the language, providing a substantial volume of translated words of excellent quality that span both European and Latin American Portuguese varieties.

    • Translations: 300,000
    • Senses: 158,000
    • Example translations: 117,800
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    3. Spanish Monolingual Dictionary Data

    Our Spanish monolingual reliably offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.

    • Words: 73,000
    • Senses: 123,000
    • Example sentences: 104,000
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    4. Spanish Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is annually reviewed and updated by our in-house team of language experts. It offers significant coverage of the language, providing a large volume of translated words of excellent quality.

    • Translations: 221,300
    • Senses: 103,500
    • Example sentences: 74,500
    • Example translations: 83,800
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    5. Spanish Sentences Data

    Spanish sentences retrieved from a corpus are ideal for NLP model training, comprising approximately 20 million words. The sentences provide broad coverage of Spanish-speaking countries and are tagged accordingly with a particular country or dialect.

    • Sentences volume: 1,840,000
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    6. Spanish Synonyms and Antonyms Data

    This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.

    • Synonyms: 127,700
    • Antonyms: 9,500
    • Format: XML format
    • Delivery: Email (link-based file sharing)
    • Update frequency: annually
    7. Spanish Audio Data (word-level)

    Curated word-level audio data for the Spanish language, which covers all varieties of world Spanish, providing rich dialectal diversity in the Spanish language.

    • Audio files: 20,900
    • Format: XLSX (for index), MP3 and WAV (audio files)
    8. Spanish Word List Data

    This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.

    • Wordforms: 450,000
    • Format: CSV and TXT formats
    • Delivery: Email (link-based file sharing)
    9. American English Monolingual Dictionary Data

    Our American English Monolingual Dictionary Data is the foremost au...

  14. Data_Sheet_1_Spanish Version of the Teachers’ Sense of Efficacy Scale: An...

    • frontiersin.figshare.com
    • figshare.com
    docx
    Updated Jun 6, 2023
    Cite
    Fátima Salas-Rodríguez; Sonia Lara; Martín Martínez (2023). Data_Sheet_1_Spanish Version of the Teachers’ Sense of Efficacy Scale: An Adaptation and Validation Study.docx [Dataset]. http://doi.org/10.3389/fpsyg.2021.714145.s001
    Available download formats: docx
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Fátima Salas-Rodríguez; Sonia Lara; Martín Martínez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Teachers’ Sense of Efficacy Scale (TSES) has been the most widely used instrument to assess teacher efficacy beliefs. However, no study has been carried out concerning the TSES psychometric properties with teachers in Mexico, the country with the highest number of Spanish-speakers worldwide. The purpose of the present study is to examine the reliability, internal and external validity evidence of the TSES (short form) adapted into Spanish with a sample of 190 primary and secondary Mexican teachers from 25 private schools. Results of construct analysis confirm the three-factor-correlated structure of the original scale. Criterion validity evidence was established between self-efficacy and job satisfaction. Differences in self-efficacy were related to teachers’ gender, years of experience and grade level taught. Some limitations are discussed, and future research directions are recommended.

  15. Spanish conversation smart phone

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    Appen Limited (2025). Spanish conversation smart phone [Dataset]. https://www.kaggle.com/datasets/appenlimited/spanish-conversation-smart-phone/code
    Available download formats: zip (285111724 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    Appen Limited
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    For the complete dataset or more information, please email commercialproduct@appen.com

    *** THIS IS A SAMPLE DATABASE ONLY. THE INFORMATION CONTAINED IN THE REST OF THIS README APPLIES TO THE FULL DATABASE ***

    This is a Spanish (EU) conversational database, produced by Appen Butler Hill Pty. Ltd. in 2021.

    Appen Butler Hill Pty. Ltd. owns copyright of the database.

    The database contains transcription and speech data recorded during 207 sessions. Each of the 207 unique speaker pairs recorded approximately 60 minutes of conversation on average.

    Each pair of speakers recorded 4 to 12 conversations of approximately 5 to 15 minutes each, on different topics. Speakers were provided with a topic for each conversation.

    The recordings have been made using the Appen Mobile smartphone app with the phone positioned between two speakers who are in the same room.

    The database contains approximately 223.48 hours of audio data in total.

    The directory structure is designed to group data from each pair of speakers in a single folder. Each pair of speakers has been identified with a unique ID (Users_ID). This ID is also the name of each session folder. Within each session folder are the conversations made by the pair of speakers. The file naming structure indicates the language and country, date of recording, session ID (which is different from the speaker pair ID), and the conversational topic.

    Directory Structure:

    /
    +-COPYRIGHT.TXT
    +-README.TXT
    +-DEMOGRAPHICS.CSV
    |
    +-AUDIO----------+-1027536----------------+-SPAESP_20210123-224609-0006_Topic_Clothing.WAV
    |                |                        +-SPAESP_20210123-224609-0007_Topic_Insurance.WAV
    |                |                        .
    |                |                        .
    |                |                        +-SPAESP_20210123-224609-0017_Topic_Social.WAV
    |                |
    |                +-1089511----------------+-SPAESP_20210220-140351-0001_Topic_Information.WAV
    |                |                        +-SPAESP_20210220-140351-0002_Topic_Insurance.WAV
    |                |                        .
    |                |                        .
    |                |                        +-SPAESP_20210220-140351-0013_Topic_Media.WAV
    |                .
    |                .
    |                .
    |                +-980733-----------------+-SPAESP_20210313-013101-0002_Topic_Travel.WAV
    |                                         +-SPAESP_20210313-013101-0003_Topic_Insurance.WAV
    |                                         .
    |                                         .
    |                                         +-SPAESP_20210313-013101-0011_Topic_Health.WAV
    |
    +-TRANSCRIPTION--+-1027536----------------+-SPAESP_20210123-224609-0006_Topic_Clothing.TXT
                     |                        +-SPAESP_20210123-224609-0007_Topic_Insurance.TXT
                     |                        .
                     |                        .
                     |                        +-SPAESP_20210123-224609-0017_Topic_Social.TXT
                     |
                     +-1089511----------------+-SPAESP_20210220-140351-0001_Topic_Information.TXT
                     |                        +-SPAESP_20210220-140351-0002_Topic_Insurance.TXT
                     |                        .
                     |                        .
                     |                        +-SPAESP_20210220-140351-0013_Topic_Media.TXT
                     .
                     .
                     .
                     +-980733-----------------+-SPAESP_20210313-013101-0002_Topic_Travel.TXT
                                              +-SPAESP_20210313-013101-0003_Topic_Insurance.TXT
                                              .
                                              .
                                              +-SPAESP_20210313-013101-0011_Topic_Health.TXT
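
    Since each recording under AUDIO has a transcription with the same folder and file name under TRANSCRIPTION, the two trees can be paired by path. The following is a minimal sketch, not part of the Appen README: the function name `pair_recordings` is hypothetical, and it assumes the upper-case .WAV and .TXT extensions shown in the directory listing.

```python
import tempfile
from pathlib import Path


def pair_recordings(root):
    """Map each session folder name to a list of (wav_path, txt_path) pairs.

    A recording is included only when a transcription with the same
    folder name and file stem exists under TRANSCRIPTION.
    """
    root = Path(root)
    pairs = {}
    for wav in sorted((root / "AUDIO").glob("*/*.WAV")):
        txt = root / "TRANSCRIPTION" / wav.parent.name / (wav.stem + ".TXT")
        if txt.is_file():
            pairs.setdefault(wav.parent.name, []).append((wav, txt))
    return pairs


# Demo on a throwaway tree that mimics one session from the listing.
with tempfile.TemporaryDirectory() as tmp:
    stem = "SPAESP_20210123-224609-0006_Topic_Clothing"
    for subdir, ext in (("AUDIO", ".WAV"), ("TRANSCRIPTION", ".TXT")):
        folder = Path(tmp) / subdir / "1027536"
        folder.mkdir(parents=True)
        (folder / (stem + ext)).touch()
    demo_summary = {
        session: [(wav.name, txt.name) for wav, txt in files]
        for session, files in pair_recordings(tmp).items()
    }

print(demo_summary)
```

    Recordings without a matching transcription are skipped rather than raising an error, which suits a sample download that may be incomplete.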

    COPYRIGHT.TXT is a copyright document in ASCII format.

    README.TXT is this file. It is an ASCII text file that describes the database.

    DEMOGRAPHICS.CSV is a comma-separated file, readable in Excel, that contains the following fields:

    • Users_ID
    • Device_Model
    • Device_OS
    • Participant_1_Gender
    • Participant_1_Age
    • Participant_1_Dialect
    • Participant_2_Gender
    • Participant_2_Age
    • Participant_2_Dialect
    • Topics
    • Environment

    CONVERSATION TOPICS: The conversations were spontaneous and were on a variety of generic topics (e.g. news, travel, study etc.). The topics can be found in the Demographics file.

    Participants were provided with 12 topics to choose from. They needed to pick at least 4 topics and could skip up to 8 topics.

    /AUDIO contains all audio data from the 207 sessions. Audio format is WAVE audio, Microsoft PCM, 16 bit, mono 48000 Hz

    /TRANSCRIPTION contains all transcription data for the 207 sessions.
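
    The stated audio format (16 bit, mono, 48000 Hz PCM) can be checked per file with Python's standard `wave` module. This is a minimal sketch; the function name `matches_stated_format` is hypothetical, and the demo writes its own short silent files rather than using dataset audio.

```python
import os
import tempfile
import wave


def matches_stated_format(path):
    """Return True if the file is 16-bit, mono, 48000 Hz PCM WAVE audio."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getsampwidth() == 2          # 2 bytes per sample = 16 bit
            and wav.getnchannels() == 1      # mono
            and wav.getframerate() == 48000  # sample rate in Hz
        )


def write_silence(path, rate):
    """Write 10 ms of 16-bit mono silence at the given sample rate."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * (rate // 100))


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.wav")
    bad_path = os.path.join(tmp, "bad.wav")
    write_silence(good_path, 48000)
    write_silence(bad_path, 8000)  # wrong sample rate on purpose
    good_ok = matches_stated_format(good_path)
    bad_ok = matches_stated_format(bad_path)

print(good_ok, bad_ok)
```

    Note that `wave.open` raises `wave.Error` for files that are not uncompressed PCM, so a caller scanning a whole directory may want to catch that exception.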

    The Audio and Transcription filenames use the following template:

  16. Data from: Does educational level predict hearing aid self-efficacy in...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Dec 19, 2019
    Cite
    Fuente, Adrian; Fuentes-López, Eduardo; Luna-Monsalve, Manuel; Valdivia, Gonzalo (2019). Does educational level predict hearing aid self-efficacy in experienced older adult hearing aid users from Latin America? Validation process of the Spanish version of the MARS-HA questionnaire [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000131807
    Dataset updated
    Dec 19, 2019
    Authors
    Fuente, Adrian; Fuentes-López, Eduardo; Luna-Monsalve, Manuel; Valdivia, Gonzalo
    Area covered
    Latin America
    Description

    Hearing aids are the most common rehabilitation strategy for age-related hearing loss. However, 25% to 50% of older adults fitted with hearing aids do not wear them post-fitting. Hearing aid self-efficacy has been suggested as one of the key factors that may explain adherence to hearing aids in older adults. The primary aim of this study was to determine a possible association between educational level and hearing aid self-efficacy in older adult hearing aid users from a Latin American country (i.e., Chile). The secondary aim was to determine if in this sample of older adults, hearing aid self-efficacy predicted hearing aid adherence as previously suggested by other studies. The MARS-HA (Measure of Audiologic Rehabilitation Self-Efficacy for Hearing Aids) questionnaire was used to measure hearing aid self-efficacy. This questionnaire was initially adapted into Spanish (S-MARS-HA) using forward and backward translations by bilingual English-Spanish speakers. A sample of 252 older adults fitted with hearing aids at a public hospital in Santiago, Chile, was investigated. Educational level was measured as the number of years of formal education. Participants responded to the S-MARS-HA along with questions exploring social support, attitudes in using hearing aids, participation in social events, and vision and joint problems. Hearing aid adherence was investigated with the use of a question from the International Outcome Inventory for Hearing Aids. All these procedures were conducted at the participants’ homes. Pure-tone average (PTA; 500–4000 Hz) in the fitted ear was obtained from the participants’ medical records. Univariate and multivariate regression models were constructed to investigate the association between educational level and hearing aid self-efficacy controlling for the covariates of interest (e.g., social support, attitudes in using hearing aids, PTA). The S-MARS-HA showed an adequate construct validity along with a good reliability. 
    Results of the multivariate regression analyses showed that educational level significantly predicted hearing aid self-efficacy. Covariates significantly associated with this outcome included attitudes in using hearing aids and PTA in the fitted ear. Finally, a significant association between hearing aid self-efficacy and adherence to hearing aid use was observed. In conclusion, this study showed a significant association between educational level and hearing aid self-efficacy in older adults from a developing Latin American country. Thus, this variable should be considered when designing and delivering aural rehabilitation programs such as hearing aids to older adults, especially those from developing countries.

  17. Data from: DIHANA corpus

    • ekoizpen-zientifikoa.ehu.eus
    Updated 2021
    Cite
    Benedí, José Miguel; Lleida, Eduardo; Varona, Amparo (2021). DIHANA corpus [Dataset]. https://ekoizpen-zientifikoa.ehu.eus/documentos/668fc45bb9e7c03b01bdaf01
    Dataset updated
    2021
    Authors
    Benedí, José Miguel; Lleida, Eduardo; Varona, Amparo
    Description

    DIHANA is composed of 900 human-computer dialogues in Spanish. The corpus was acquired with an initial prototype using the Wizard of Oz technique: the operator simulates speech recognition and understanding errors, and the answers are synthesized according to a predefined set of templates. The acquisition was restricted only at the semantic level (i.e., the acquired dialogues are related to a specific task domain) and was not restricted at the lexical and syntactic levels (spontaneous speech). In the acquisition process, this semantic control was provided by the definition of scenarios that the user must accomplish and by the wizard strategy, which defines the behavior of the acquisition system.

    The DIHANA task consists of the retrieval of information about Spanish nationwide trains by telephone. Several types of scenarios were defined in order to control the interaction of the user with the system. A scenario is defined by: an objective, the information needed by the user; a situation, the specific circumstances related to the trip request; and the specific requirements of the trip: type of trip, departure city, destination city, and one or more restrictions.

    The DIHANA corpus contains 5.5 hours of spontaneous speech corresponding to 6,278 sentences. In total, 225 speakers (153 males and 72 females) recorded 900 dialogues, resulting in 6,278 user turns. Along with the dialogues (speech signals), their full transcripts are provided, as well as a phonetic lexicon containing all the words pronounced in the database. In addition, a semantic tagging of the corpus and a labeling of the same corpus in terms of dialogue acts are provided.

    A more detailed description of DIHANA can be found in the "doc" subfolder and in the following papers:

    - N. Alcacer, J.M. Benedí, F. Blat, R. Granell, D. Martínez-Hinarejos and F. Torres: "Acquisition and Labelling of a Spontaneous Speech Dialogue Corpus". In proceedings of SPECOM, pages 583-586. Patras (Greece), October 2005.
    - J.M. Benedí, E. Lleida, A. Varona, M.J. Castro, I. Galiano, R. Justo, I. López, and A. Miguel: "Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA". In proceedings of LREC, pages 1636-1639, Genova, Italy, May 2006.

    Next we describe the contents of each subfolder:

    README: This file.

    data: The database: 75 speakers from each of 3 sites (Basque Country, Aragon and Valencian Country) for 4 scenarios, making a total of 225 speakers (153 males and 72 females) with 900 dialogues. For each dialogue the following is provided:
    - the speech signal of each user turn (.ul)
    - the intermediate (.dis) and final (.xml) transcriptions
    - the dialogue act annotation:
      + Dialogue Act annotation on transcription (.dia)
      + Dialogue Act annotation on categorised transcription, without words for each category (.cad)
      + Dialogue Act annotation on categorised transcription, with words for each category (.cwd)

    semdata: Semantic tagging of the full corpora, in the "data" subfolder, and documentation describing the semantic labeling process, in the "doc" folder.

    doc: Various documents (PDF) related to the design and acquisition processes, the annotation format and event statistics.

    guides: 5 lists of 45 speakers, which account for 5 leave-one-out partitions. Each list in turn can be chosen as the test set, with the other four joined to form the training set. Under the folder corresponding to each speaker, the speech signals and transcriptions corresponding to four dialogues can be found, so each partition consists of 720 training dialogues and 180 test dialogues.

    software: Various self-commented programs and utilities.
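
    The partition scheme described for the "guides" folder can be sketched as follows. This is an illustration under stated assumptions, not the corpus tooling: the speaker IDs are placeholders, and the real lists would be read from the files in "guides".

```python
def leave_one_list_out(speaker_lists):
    """Yield (train_speakers, test_speakers) once per held-out list."""
    for held_out, test in enumerate(speaker_lists):
        train = [
            speaker
            for index, group in enumerate(speaker_lists)
            if index != held_out
            for speaker in group
        ]
        yield train, list(test)


DIALOGUES_PER_SPEAKER = 4  # stated in the corpus description

# Placeholder speaker IDs (hypothetical): 5 lists of 45 speakers each.
lists = [[f"spk{g}_{i:02d}" for i in range(45)] for g in range(5)]

# Dialogue counts (train, test) for each of the 5 partitions.
sizes = [
    (len(train) * DIALOGUES_PER_SPEAKER, len(test) * DIALOGUES_PER_SPEAKER)
    for train, test in leave_one_list_out(lists)
]
print(sizes)
```

    With 180 training speakers and 45 test speakers at four dialogues each, every partition reproduces the 720/180 split quoted in the description.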

  18. Hispanic People - Liveness Detection Video Dataset

    • kaggle.com
    zip
    Updated Apr 19, 2024
    + more versions
    Cite
    Unique Data (2024). Hispanic People - Liveness Detection Video Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/hispanic-people-liveness-detection-video-dataset/code
    Available download formats: zip (216247226 bytes)
    Dataset updated
    Apr 19, 2024
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Biometric Attack Dataset, Hispanic People

    A similar dataset that includes all ethnicities: Anti Spoofing Real Dataset

    This dataset for face anti-spoofing and face recognition includes images and videos of Hispanic people: 32,600+ photos and videos of 16,300 people from 20 countries. It helps enhance model performance by providing a wider range of data for a specific ethnic group.

    The videos were gathered by capturing faces of genuine individuals presenting spoofs, using facial presentations. Our dataset proposes a novel approach that learns and detects spoofing techniques, extracting features from the genuine facial images to prevent the capturing of such information by fake users.

    The dataset contains images and videos of real humans with various resolutions, views, and colors, making it a comprehensive resource for researchers working on anti-spoofing technologies.

    [Figure: People in the dataset]

    Types of files in the dataset:

    • photo - selfie of the person
    • video - real video of the person

    Our dataset also explores the use of neural architectures, such as deep neural networks, to facilitate the identification of distinguishing patterns and textures in different regions of the face, increasing the accuracy and generalizability of the anti-spoofing models.

    👉 Legally sourced datasets and carefully structured for AI training and model development. Explore samples from our dataset of 95,000+ human images & videos - Full dataset

    Metadata for the full dataset:

    • assignment_id - unique identifier of the media file
    • worker_id - unique identifier of the person
    • age - age of the person
    • true_gender - gender of the person
    • country - country of the person
    • video_extension - video extensions in the dataset
    • video_resolution - video resolution in the dataset
    • video_duration - video duration in the dataset
    • video_fps - frames per second for video in the dataset
    • photo_extension - photo extensions in the dataset
    • photo_resolution - photo resolution in the dataset

    [Figure: Statistics for the dataset]

    🧩 This is just an example of the data. Leave a request here to learn more

    Content

    The dataset consists of:

    • files: 10 folders, one per person, each including 1 image and 1 video
    • .csv file: contains information about the files and people in the dataset

    File with the extension .csv

    • id: id of the person,
    • selfie_link: link to access the photo,
    • video_link: link to access the video,
    • age: age of the person,
    • country: country of the person,
    • gender: gender of the person,
    • video_extension: video extension,
    • video_resolution: video resolution,
    • video_duration: video duration,
    • video_fps: frames per second for video,
    • photo_extension: photo extension,
    • photo_resolution: photo resolution
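
    The column list above can be checked mechanically before any further processing. The following is a minimal sketch, assuming the .csv file uses exactly the column names listed and a comma delimiter; the sample row is hypothetical and is not taken from the dataset.

```python
import csv
import io

# Column names as listed in the description above.
EXPECTED_COLUMNS = [
    "id", "selfie_link", "video_link", "age", "country", "gender",
    "video_extension", "video_resolution", "video_duration",
    "video_fps", "photo_extension", "photo_resolution",
]


def read_metadata(csv_text):
    """Parse CSV text and fail early if any listed column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical sample: one header line and one made-up row.
SAMPLE = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,photo_1.jpg,video_1.mp4,34,MX,female,mp4,1920x1080,12,30,jpg,1080x1920\n"
)

rows = read_metadata(SAMPLE)
print(rows[0]["country"], rows[0]["video_fps"])
```

    All values come back as strings; converting fields such as age or video_fps to numbers is left out because the description does not state their exact formats.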

    🚀 You can learn more about our high-quality unique datasets here

    keywords: liveness detection systems, liveness detection dataset, biometric dataset, biometric data dataset, biometric system attacks, anti-spoofing dataset, face liveness detection, deep learning dataset, face spoofing database, face anti-spoofing, ibeta dataset, face anti spoofing, large-scale face anti spoofing, rich annotations anti spoofing dataset, hispanic people, hispanic classification, hispanic image dataset

  19. Database on ICD-11 MBND course modes for Spanish-speaking clinicians

    • figshare.com
    xlsx
    Updated Nov 13, 2024
    Cite
    Rebeca Robles (2024). Database on ICD-11 MBND course modes for Spanish-speaking clinicians [Dataset]. http://doi.org/10.6084/m9.figshare.27704220.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 13, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rebeca Robles
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Clinical Descriptions and Diagnostic Guidelines (CDDG) for Mental, Behavioral and Neurodevelopmental Disorders (MBND) in the eleventh revision of the WHO's International Classification of Diseases (ICD-11) constitute a substantial improvement over the ICD-10 MBND CDDG. As part of the efforts to implement the ICD-11 MBND CDDG in Spanish-speaking countries through continuing education for health professionals, this study was designed to evaluate the usefulness of a comprehensive online training course and its modalities (synchronous and asynchronous) for increasing both knowledge of and readiness to use this novel, evidence-based diagnostic tool.

    METHOD: A sample of Spanish-speaking psychiatrists, psychologists and general practitioners completed pre- and/or post-evaluations of one of the two modalities of the ICD-11 MBND CDDG course (asynchronous or synchronous). Knowledge of the material was evaluated at the end of the course through an ad hoc multiple-choice questionnaire. Readiness to implement the ICD-11 MBND CDDG was evaluated before and after the course using an instrument based on the transtheoretical model developed by Prochaska and DiClemente, consisting of a linear scheme with five stages of change: precontemplation, contemplation, preparation, action, and maintenance.

    RESULTS: Compared with the asynchronous course, the synchronous course drew more women than men, younger health professionals, and more clinicians from Mexico than from any other country. Prior to the course, most participants were at the precontemplation stage of readiness to implement the ICD-11 MBND CDDG. By the end of the course, participants reported a moderate level of knowledge of the ICD-11 MBND CDDG (with those in the synchronous course reporting higher levels of knowledge than those in the asynchronous one), while the percentage of clinicians at the preparation and action stages was higher than before the courses (with no differences observed between course modes).

    CONCLUSIONS: Online training proved useful for achieving a moderate level of knowledge of the ICD-11 MBND CDDG and a substantial increase in clinicians' readiness to implement them as part of their regular professional practice. Whichever course mode is preferred and feasible is recommended for interested Spanish-speaking clinicians.

  20. Budget Share of Food for Spanish Households

    • kaggle.com
    Updated Jul 2, 2023
    + more versions
    Cite
    Utkarsh Singh (2023). Budget Share of Food for Spanish Households [Dataset]. https://www.kaggle.com/datasets/utkarshx27/budget-share-of-food-for-spanish-households
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 2, 2023
    Dataset provided by
    Kaggle
    Authors
    Utkarsh Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
    number of observations: 23972
    observation: households
    country: Spain
    
    Column    Description
    wfood     percentage of total expenditure which the household has spent on food
    totexp    total expenditure of the household
    age       age of reference person in the household
    size      size of the household
    town      size of the town where the household is placed, categorized into 5 groups: 1 for small towns, 5 for big ones
    sex       sex of reference person (man, woman)
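
    One simple use of these columns is to compare the food budget share across town-size groups. The sketch below is illustrative only: the rows are hypothetical values, not records from the dataset, and it assumes wfood is stored as a fraction of total expenditure, which the description does not confirm.

```python
from collections import defaultdict


def mean_food_share_by_town(rows):
    """Average the wfood column within each town-size group (1 to 5)."""
    totals = defaultdict(lambda: [0.0, 0])  # town -> [sum of wfood, count]
    for row in rows:
        acc = totals[row["town"]]
        acc[0] += row["wfood"]
        acc[1] += 1
    return {town: total / count for town, (total, count) in sorted(totals.items())}


# Hypothetical households, for illustration only.
sample_rows = [
    {"wfood": 0.5, "town": 1},
    {"wfood": 0.25, "town": 1},
    {"wfood": 0.25, "town": 5},
]

shares = mean_food_share_by_town(sample_rows)
print(shares)
```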

    References: Journal of Applied Econometrics data archive: http://qed.econ.queensu.ca/jae/

Cite
Frank Wong (2024). Spanish Spontaneous Dialogue speech dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/spanish-spontaneous-dialogue-speech-dataset
Spanish Spontaneous Dialogue speech dataset

Explore at:
Available download formats: zip (93236 bytes)
Dataset updated
Jun 7, 2024
Authors
Frank Wong
License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

Spanish (Spain) Spontaneous Dialogue Telephony speech dataset

Spanish (Spain) spontaneous dialogue telephony speech dataset, collected from dialogues on given topics and covering 20+ domains. Recordings are transcribed with text content, speaker ID, gender, age, and other attributes. The data was collected from a large, geographically diverse pool of 600 native speakers, which is intended to improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use; our datasets are all GDPR, CCPA, and PIPL compliant. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1234?source=Kaggle

Format

8 kHz, 8-bit, A-law/μ-law PCM, mono channel
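
Because the audio is 8-bit companded PCM, most toolchains need it expanded to linear samples before feature extraction. The following is a minimal sketch of μ-law expansion following the widely used G.711 reference arithmetic. It covers μ-law only (A-law needs a different table), and it assumes headerless raw bytes, which the listing does not confirm; the function names are illustrative.

```python
BIAS = 0x84  # bias constant used by G.711 mu-law (132)


def ulaw_to_linear(code):
    """Expand one 8-bit mu-law code (0-255) to a signed linear sample."""
    u = ~code & 0xFF                 # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def decode_ulaw(data):
    """Expand a bytes object of mu-law codes to a list of linear samples."""
    return [ulaw_to_linear(b) for b in data]


print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))
```

The decoded values fit in a signed 16-bit range, so they can be written out as 16-bit linear PCM at the same 8 kHz sample rate.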

Content category

Dialogue based on given topics

Recording condition

Low background noise (indoor)

Recording device

Telephony

Country

Spain (ESP)

Language (Region) Code

es-ES

Language

Spanish

Speaker

600 people in total, 49% male and 51% female

Features of annotation

Transcription text, timestamp, speaker ID, gender

Accuracy rate

Word accuracy rate (WAR): 98%
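
The listing does not say how the word accuracy rate was measured. One common convention, assumed here, is WAR = 1 - WER, where WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. A minimal sketch under that assumption, with made-up example sentences:

```python
def word_accuracy_rate(reference, hypothesis):
    """Return 1 - WER using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 1 - prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of five.
war = word_accuracy_rate("hola buenos dias como estas",
                         "hola buenos dia como estas")
print(war)
```

Under this definition the value can fall below zero when the hypothesis contains many inserted words, so some reports clamp it at 0.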

Licensing Information

Commercial License
