https://choosealicense.com/licenses/other/
🏥 About This OTS Dataset
This Off-The-Shelf (OTS) dataset features a robust collection of audio recordings capturing real-world conversations between US agents and US patients in English across various medical scenarios. Designed to empower speech recognition systems and conversational AI tools, this dataset is ideal for training models to understand and replicate natural language used in medical interactions.
📊 Metadata Availability: Participant Insights
Each audio… See the full description on the dataset page: https://huggingface.co/datasets/Macgence/usa-agent-to-usa-patient-medical-conversation-speech-dataset-in-english.
https://choosealicense.com/licenses/other/
About This OTS Dataset
This Off-The-Shelf (OTS) dataset offers a comprehensive collection of audio recordings showcasing conversations between US customers within automobile call centers. It is meticulously curated to enhance speech recognition and conversational AI models tailored specifically to the unique dynamics of interactions within US customer call centers in the automobile industry.
Metadata Availability: Insights into Participant Details
Each participant is… See the full description on the dataset page: https://huggingface.co/datasets/Macgence/us-customer-to-us-customer-speech-dataset-in-english-for-automobiles.
https://choosealicense.com/licenses/other/
About This OTS Dataset
This Off-The-Shelf (OTS) dataset offers a comprehensive collection of audio recordings showcasing conversations between Chichewa-speaking customers within the general sector. It is meticulously curated to enhance speech recognition and conversational AI models tailored to the unique dynamics of customer interactions in various industries.
Metadata Availability: Insights into Participant Details
Each participant is accompanied by detailed… See the full description on the dataset page: https://huggingface.co/datasets/Macgence/chichewa-customer-speech-dataset.
https://data.macgence.com/terms-and-conditions
Access English medical conversation data between USA agents and patients. Perfect for AI development, speech analysis, and healthcare research.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Turkish Speech Corpus (TSC)
Dataset Description
The Turkish Speech Corpus (TSC) is an open-source dataset introduced in the paper "Multilingual Speech Recognition for Turkic Languages". It contains 218.2 hours of transcribed Turkish speech, comprising 186,171 utterances. At the time of its release, TSC was the largest publicly available Turkish speech dataset of its kind. This dataset is designed to support research in automatic speech recognition (ASR), particularly for Turkic languages.
Key Statistics
* Total Duration: 218.2 hours
* Number of Utterances: 186,171
* Language: Turkish
* License: Open-source (specific license not specified; please check the repository for details)
* Source Paper: Multilingual Speech Recognition for Turkic Languages
* GitHub Repository: TurkicASR - https://github.com/IS2AI/TurkicASR
* Hosted on: Hugging Face
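As a quick orientation, the sketch below shows one way to load a Hugging Face-hosted speech corpus such as TSC and sanity-check the reported totals; the dataset identifier and column names are placeholder assumptions, so consult the TurkicASR repository for the actual access path.

```python
# Minimal sketch: load a Hugging Face-hosted ASR corpus and sanity-check the
# reported totals (218.2 hours, 186,171 utterances). "ISSAI/TSC" is a
# hypothetical dataset ID and "audio" an assumed column name; see
# https://github.com/IS2AI/TurkicASR for the actual access instructions.
from datasets import load_dataset, Audio

ds = load_dataset("ISSAI/TSC", split="train")              # hypothetical ID
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

total_seconds = sum(
    len(ex["audio"]["array"]) / ex["audio"]["sampling_rate"] for ex in ds
)
print(f"{len(ds)} utterances, {total_seconds / 3600:.1f} hours")
```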
Citation
If you use this dataset in your research, please cite the following paper:
@Article{info14020074,
AUTHOR = {Mussakhojayeva, Saida and Dauletbek, Kaisar and Yeshpanov, Rustem and Varol, Huseyin Atakan},
TITLE = {Multilingual Speech Recognition for Turkic Languages},
JOURNAL = {Information},
VOLUME = {14},
YEAR = {2023},
NUMBER = {2},
ARTICLE-NUMBER = {74},
URL = {https://www.mdpi.com/2078-2489/14/2/74},
ISSN = {2078-2489}
}
28 natural, unscripted conversations between adult native U.S. English speakers, each around 5 minutes long with high-fidelity audio recorded in-studio.
https://www.archivemarketresearch.com/privacy-policy
The Voice and Speech Recognition Market size was valued at USD 23.70 billion in 2023 and is projected to reach USD 61.52 billion by 2032, exhibiting a CAGR of 14.6% during the forecast period. The Voice and Speech Recognition Market covers technologies that allow devices to recognize and interpret human speech: transcribing spoken language into written text (speech recognition) and verifying or identifying individuals by their voice (voice recognition). Industries of all kinds, including consumer electronics, automotive, healthcare, banking, and telecommunications, incorporate these technologies. Siri, Alexa, and Google Assistant are prominent consumer-facing applications, while in healthcare voice recognition supports documentation without pen or paper. Major trends include advances in artificial intelligence and machine learning that deliver better precision and contextual understanding, integration with IoT devices, and growing use of voice-based verification in security systems. The market is also growing as the focus shifts increasingly toward contactless interaction and user experience.
Recent developments include: In March 2023, Google AI introduced a new update to its Universal Speech Model (USM) in support of the 1,000 Languages Initiative. A universal speech model is a machine learning algorithm designed to comprehend and interpret spoken language across diverse languages and accents. The USM, a family of advanced speech models with 2 billion parameters, has been trained on an extensive dataset of 12 million hours of speech and 28 billion sentences in over 300 languages. Google claims that the USM excels in automatic speech recognition (ASR) for languages with limited resources, such as Assamese, Cebuano, Amharic, and Azerbaijani, as well as widely spoken languages like Mandarin and English. In May 2023, Apple unveiled a suite of cutting-edge cognitive accessibility features, including Live Speech, Personal Voice, and Point and Speak in Magnifier, designed to elevate usability and accessibility for individuals with disabilities. By collaborating closely with disability community groups, Apple reinforces its dedication to ensuring technology remains inclusive, making a tangible difference in the lives of its users. In October 2022, iFLYTEK Corp. introduced the Speech Translation Technology Platform for Southeast Asian Languages at the ASEAN Summit. To access online administrative and public services, Guangxi residents can use the mobile app "Zhiguitong," which features the iFLYTEK speech translation technology for ASEAN languages. Instant speech translation from Mandarin Chinese into Lao, English, Vietnamese, Indonesian, Thai, Tamil, Malay, and Burmese is available.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains two new benchmark corpora designed for low-resource languages spoken in the Democratic Republic of the Congo: the Lingala Read Speech Corpus (LRSC), with 4.3 hours of labelled audio, and the Congolese Speech Radio Corpus (CSRC), which offers 741 hours of unlabeled audio spanning four significant low-resource languages of the region (Lingala, Tshiluba, Kikongo and Congolese Swahili). Collecting speech and audio for this dataset involved two sets of processes: (1) for LRSC, 32 Congolese adult participants were instructed to sit in a relaxed manner within centimetres of an audio recording device or smartphone and read the text utterances; (2) for CSRC, recordings from the archives of a broadcast station were pre-processed and curated. Congolese languages tend to fall into the “low-resource” category, which, in contrast to “high-resource” languages, has fewer accessible datasets, limiting the development of conversational artificial intelligence; this motivated the creation of speech recognition datasets for low-resource Congolese languages. The proposed dataset contains two sections: the first supports training a supervised speech recognition module, while the second supports pre-training a self-supervised model. Both sections feature a wide variety of speech and audio recorded in various environments, with the first section providing speech paired with its corresponding transcription and the second providing a collection of pre-processed raw audio data.
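To make the two-section structure concrete, here is a minimal sketch of how the labeled LRSC portion (audio plus transcription, for supervised ASR training) and the unlabeled CSRC portion (raw audio, for self-supervised pre-training) might be assembled with the Hugging Face `datasets` library; the directory layout and transcript file format are assumptions, not the dataset's actual packaging.

```python
# Minimal sketch of assembling the two sections described above. The directory
# layout and the tab-separated transcript file are assumptions.
from pathlib import Path
from datasets import Dataset, Audio

def labeled_lrsc(audio_dir: str, transcript_tsv: str) -> Dataset:
    """LRSC-style split: each recording paired with its transcription
    (one "filename<TAB>text" line per utterance)."""
    rows = [line.rstrip("\n").split("\t") for line in open(transcript_tsv, encoding="utf-8")]
    ds = Dataset.from_dict({
        "audio": [str(Path(audio_dir) / fname) for fname, _ in rows],
        "text": [text for _, text in rows],
    })
    return ds.cast_column("audio", Audio(sampling_rate=16_000))

def unlabeled_csrc(audio_dir: str) -> Dataset:
    """CSRC-style split: raw radio audio with no transcriptions,
    intended for self-supervised pre-training."""
    files = sorted(str(p) for p in Path(audio_dir).glob("**/*.wav"))
    return Dataset.from_dict({"audio": files}).cast_column("audio", Audio(sampling_rate=16_000))
```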
https://choosealicense.com/licenses/other/
USA English Call Center Medical Speech Dataset
📘 About This OTS Dataset
This meticulously curated Off-The-Shelf (OTS) dataset offers a comprehensive repository of audio recordings showcasing conversations between USA speakers within the medical sector. It is designed to enhance automatic speech recognition (ASR) systems and conversational AI models, focusing on the intricate nature of doctor-patient consultations.
📊 Metadata Availability: Insights into… See the full description on the dataset page: https://huggingface.co/datasets/Macgence/usa-english-call-center-medical-speech-dataset-for-medical.
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Yearly citation counts for the publication titled "Robust Log-Energy Estimation and its Dynamic Change Enhancement for In-car Speech Recognition".
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Quantum-Enhanced Speech Synthesis market size reached USD 1.38 billion in 2024, reflecting a robust trajectory fueled by significant technological advancements and growing enterprise adoption. The market is expanding at a compelling CAGR of 32.8% and is forecasted to reach USD 15.34 billion by 2033. This remarkable growth is primarily attributed to the integration of quantum computing capabilities with advanced speech synthesis algorithms, which is revolutionizing the quality, speed, and contextual accuracy of generated speech across numerous industries.
A key driver for the Quantum-Enhanced Speech Synthesis market is the exponential improvement in computational power enabled by quantum technologies. Traditional speech synthesis models, while effective, are often limited by classical hardware constraints, particularly when processing large-scale, complex linguistic datasets or generating highly naturalistic speech patterns. Quantum computing, with its ability to process vast amounts of data simultaneously and tackle complex optimization problems, is transforming the landscape of speech synthesis. This has led to the development of quantum machine learning and quantum natural language processing models that deliver superior contextual understanding, faster response times, and more human-like speech outputs, making them highly attractive for next-generation virtual assistants, customer service bots, and accessibility tools.
Another significant growth factor is the rising demand for personalized and accessible digital experiences. As digital interactions become increasingly voice-driven, sectors such as healthcare, education, and customer service are seeking advanced speech synthesis solutions to improve user engagement, accessibility, and inclusivity. Quantum-enhanced speech synthesis is uniquely positioned to address these needs by enabling high-precision, real-time voice generation that can adapt to individual user preferences, accents, and languages. This has accelerated adoption among enterprises aiming to differentiate their offerings and comply with global accessibility standards, further propelling market expansion.
The surge in investments from both public and private sectors into quantum computing research and its application in artificial intelligence is also catalyzing market growth. Governments and major technology companies are allocating substantial resources to develop scalable quantum hardware and software platforms, with a particular focus on AI-driven applications such as speech synthesis. These investments are fostering innovation, reducing costs, and facilitating the commercialization of quantum-enhanced speech solutions. As a result, the ecosystem is witnessing a proliferation of startups and established vendors collaborating to push the boundaries of what is possible in speech technology.
Regionally, North America currently leads the Quantum-Enhanced Speech Synthesis market, driven by a strong innovation ecosystem, significant R&D investments, and early adoption across industries such as healthcare, BFSI, and media & entertainment. Europe and Asia Pacific are also witnessing rapid growth, fueled by increasing digital transformation initiatives and government-backed quantum technology programs. Meanwhile, emerging markets in Latin America and the Middle East & Africa are gradually recognizing the potential of quantum-enhanced speech synthesis, particularly for improving accessibility and customer engagement in multilingual and diverse populations.
The Quantum-Enhanced Speech Synthesis market is segmented by technology into Quantum Machine Learning, Quantum Natural Language Processing, Quantum Acoustic Modeling, and Others. Quantum Machine Learning (QML) is at the forefront, leveraging quantum algorithms to train and deploy speech models that significantly outperform classical counterparts in terms of speed and accuracy. QML enables the processing of massive speech datasets, allowing for the extraction of nuanced linguistic patterns and the generation of more natural-sounding speech. This technology is particularly valuable for applications requiring real-time response and high contextual awareness, such as interactive voice assistants and automated customer service platforms.
Quantum Natural Language Processing (QNLP)
https://spdx.org/licenses/CC0-1.0.html
Objective: To evaluate whether a quantitative speech measure is effective in identifying and monitoring motor speech impairment (MSI) in patients with primary progressive aphasia (PPA), and to investigate the neuroanatomical basis of MSI in PPA. Methods: Sixty-four patients with PPA were evaluated at baseline, with a subset (N=39) evaluated longitudinally. Articulation rate (AR), a quantitative measure derived from spontaneous speech, was measured at each timepoint. MRI was collected at baseline. Differences in baseline AR were assessed across PPA subtypes, separated by severity level. Linear mixed-effects models were used to assess group differences across PPA subtypes in the rate of decline in AR over a one-year period. Cortical thickness measured from baseline MRIs was used to test hypotheses about the relationship between cortical atrophy and MSI. Results: Baseline AR was reduced for patients with non-fluent variant PPA (nfvPPA), as compared to other PPA subtypes and controls, even in mild stages of disease. Longitudinal results showed a greater rate of decline in AR for the nfvPPA group over one year, as compared to logopenic and semantic variant subgroups. Reduced baseline AR was associated with cortical atrophy in left-hemisphere premotor and supplementary motor cortices. Conclusions: The AR measure is an effective quantitative index of MSI that detects MSI in mild disease stages and tracks decline in MSI longitudinally. The AR measure additionally demonstrates anatomic localization to motor-speech specific cortical regions. Our findings suggest that this quantitative measure of MSI might have utility in diagnostic evaluation and monitoring of motor speech impairments in PPA.
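For illustration, a minimal sketch of an articulation-rate-style measure follows; the paper's exact operationalization of AR is not reproduced here, so the syllables-per-second-of-phonation definition below is an assumption based on common practice.

```python
# Hedged sketch of an articulation-rate (AR) style measure: syllables produced
# per second of speaking time, with pauses excluded. This is a common
# definition and not necessarily the study's exact computation.
def articulation_rate(n_syllables: int, speech_segments: list[tuple[float, float]]) -> float:
    """n_syllables: syllable count from the transcript.
    speech_segments: (start, end) times in seconds of speech runs (pauses excluded)."""
    phonation_time = sum(end - start for start, end in speech_segments)
    if phonation_time <= 0:
        raise ValueError("No speech detected")
    return n_syllables / phonation_time

# Example: 42 syllables over two speech runs totalling 12.5 s -> 3.36 syllables/s.
print(articulation_rate(42, [(0.0, 7.5), (9.0, 14.0)]))
```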
https://dataintelo.com/privacy-and-policy
The global market size of Speech & Voice Recognition Devices is $XX million in 2018 with a CAGR of XX% from 2014 to 2018, and it is expected to reach $XX million by the end of 2024 with a CAGR of XX% from 2019 to 2024.
Global Speech & Voice Recognition Devices Market Report 2019 - Market Size, Share, Price, Trend and Forecast is a professional and in-depth study on the current state of the global Speech & Voice Recognition Devices industry. The key insights of the report:
1. The report provides key statistics on the market status of Speech & Voice Recognition Devices manufacturers and is a valuable source of guidance and direction for companies and individuals interested in the industry.
2. The report provides a basic overview of the industry, including its definition, applications, and manufacturing technology.
3. The report presents company profiles, product specifications, capacity, production value, and 2013-2018 market shares for key vendors.
4. The total market is further divided by company, by country, and by application/type for the competitive landscape analysis.
5. The report estimates 2019-2024 market development trends of the Speech & Voice Recognition Devices industry.
6. Analysis of upstream raw materials, downstream demand, and current market dynamics is also carried out.
7. The report makes some important proposals for a new project in the Speech & Voice Recognition Devices industry before evaluating its feasibility.
There are 4 key segments covered in this report: competitor segment, product type segment, end use/application segment and geography segment.
For the competitor segment, the report includes global key players of Speech & Voice Recognition Devices as well as some smaller players.
The information for each competitor includes:
* Company Profile
* Main Business Information
* SWOT Analysis
* Sales, Revenue, Price and Gross Margin
* Market Share
For the product type segment, this report lists the main product types of the Speech & Voice Recognition Devices market:
* Product Type I
* Product Type II
* Product Type III
For the end use/application segment, this report focuses on the status and outlook for key applications. End users are also listed.
* Application I
* Application II
* Application III
For the geography segment, regional supply, application-wise and type-wise demand, major players, and prices are presented from 2013 to 2023. This report covers the following regions:
* North America
* South America
* Asia & Pacific
* Europe
* MEA (Middle East and Africa)
The key countries in each region are taken into consideration as well, such as the United States, China, Japan, India, Korea, ASEAN, Germany, France, the UK, Italy, Spain, CIS, and Brazil.
Reasons to Purchase this Report:
* Analyzing the outlook of the market with the recent trends and SWOT analysis
* Market dynamics scenario, along with growth opportunities of the market in the years to come
* Market segmentation analysis including qualitative and quantitative research incorporating the impact of economic and non-economic aspects
* Regional and country-level analysis integrating the demand and supply forces that are influencing the growth of the market
* Market value (USD Million) and volume (Units Million) data for each segment and sub-segment
* Competitive landscape involving the market share of major players, along with the new projects and strategies adopted by players in the past five years
* Comprehensive company profiles covering the product offerings, key financial information, recent developments, SWOT analysis, and strategies employed by the major market players
* 1-year analyst support, along with data support in Excel format.
We can also offer customized reports to fulfill the special requirements of our clients. Regional and country-level reports can be provided as well.
https://dataintelo.com/privacy-and-policy
The global market size for Cerebral Palsy Speech Therapy was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 2.5 billion by 2032, growing at a CAGR of 7.5%. This significant growth is driven by advancements in speech therapy techniques and increasing awareness about cerebral palsy and its associated speech disorders.
The primary growth factor for the Cerebral Palsy Speech Therapy market is the increasing incidence of cerebral palsy worldwide. According to studies, around 2 to 3 per 1,000 live births are affected by cerebral palsy, leading to a higher demand for specialized speech therapy. Innovations in medical technology and improved diagnostic methods are also contributing to the market's growth. Early diagnosis has become more prevalent, allowing for timely intervention and better management of speech and communication issues associated with cerebral palsy.
Another significant growth driver is the rising awareness and acceptance of speech therapy for cerebral palsy patients. Educational campaigns and advocacy by various non-profit organizations are helping to reduce the stigma associated with speech disorders and cerebral palsy. Governments in developed and developing countries are increasingly focusing on improving healthcare infrastructure, which includes facilities for speech therapy, thus providing a robust platform for market expansion.
Additionally, the growing adoption of advanced technology in speech therapy, such as Augmentative and Alternative Communication (AAC) devices, is revolutionizing the treatment landscape. These technologies offer improved communication options for patients, significantly enhancing their quality of life. The integration of artificial intelligence and machine learning in therapy tools is also opening new avenues for personalized and effective treatment options.
From a regional perspective, North America holds a significant share of the market due to its advanced healthcare infrastructure and high awareness levels among the population. Europe follows closely, with countries like Germany, France, and the UK leading in terms of healthcare services for cerebral palsy. Meanwhile, emerging economies in the Asia Pacific region, such as China and India, are expected to witness substantial growth due to increasing healthcare investments and growing awareness.
The Cerebral Palsy Speech Therapy market is segmented by therapy type into Articulation Therapy, Language Intervention Therapy, Oral-Motor/Feeding and Swallowing Therapy, and Augmentative and Alternative Communication (AAC). Each of these therapy types addresses specific needs of cerebral palsy patients, contributing to the overall effectiveness of the treatment.
Articulation therapy focuses on correcting speech sound production issues, which are common in cerebral palsy patients. This type of therapy helps individuals improve their pronunciation and clarity of speech, making communication more effective. The increasing occurrence of articulation disorders among cerebral palsy patients is driving the demand for this therapy type. Moreover, advancements in therapeutic techniques and tools are enhancing the efficiency of articulation therapy, thereby fueling market growth.
Language intervention therapy aims at improving the overall language skills of cerebral palsy patients. This therapy type is crucial for children who experience delays in language development due to cerebral palsy. The therapy involves various exercises that enhance a child's ability to understand and use language effectively. The growing emphasis on early intervention and the availability of specialized language programs are significant factors contributing to the growth of this segment.
Oral-Motor/Feeding and Swallowing Therapy is designed to address issues related to feeding and swallowing, which are prevalent in many cerebral palsy patients. This type of therapy helps improve the muscle strength and coordination required for eating and speaking. The rising awareness about the importance of addressing feeding and swallowing issues early on is driving the demand for this therapy type. Additionally, advancements in therapeutic methods and tools are making this therapy more effective and accessible.
Augmentative and Alternative Communication (AAC) encompasses various communication methods used to supplement or replace speech in individuals with severe speech and language impairments. AAC
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: One factor which influences the speech intelligibility of cochlear implant (CI) users is the number and the extent of the functionality of spiral ganglion neurons (SGNs), referred to as “cochlear health.” To explain the interindividual variability in speech perception of CI users, a clinically applicable estimate of cochlear health could be insightful. The change in the slope of the electrically evoked compound action potential (eCAP) amplitude growth function (AGF) in response to an increased interphase gap (IPG), termed IPGEslope, has been introduced as a potential measure of cochlear health. Although this measure has been widely used in research, its relationship to other parameters requires further investigation.
Methods: This study investigated the relationship between IPGEslope, demographics, and speech intelligibility by (1) considering the relative importance of each frequency band to speech perception, and (2) investigating the effect of the polarity of the stimulating pulse. The eCAPs were measured in three different conditions: (1) forward masking with an anodic-leading (FMA) pulse, (2) forward masking with a cathodic-leading (FMC) pulse, and (3) with alternating polarity (AP). This allowed the investigation of the effect of polarity on the diagnosis of cochlear health. For an accurate investigation of the correlation between IPGEslope and speech intelligibility, a weighting function was applied to the IPGEslopes measured on each electrode in the array to account for the relative importance of each frequency band for speech perception. A weighted Pearson correlation analysis was also applied to compensate for the effect of missing data by giving higher weights to ears with more successful IPGEslope measurements.
Results: A significant correlation was observed between IPGEslope and speech perception in both quiet and noise for between-subject data, especially when the relative importance of frequency bands was considered. A strong and significant correlation was also observed between IPGEslope and age when stimulation was performed with cathodic-leading pulses, but not for the anodic-leading pulse condition.
Conclusion: Based on the outcome of this study it can be concluded that IPGEslope has potential as a relevant clinical measure indicative of cochlear health and its relationship to speech intelligibility. The polarity of the stimulating pulse could influence the diagnostic potential of IPGEslope.
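To make the IPGEslope idea concrete, the sketch below fits a linear slope to the eCAP amplitude growth function at two IPG conditions, takes the change in slope, and includes a weighted Pearson correlation like the one used for the missing-data compensation; the simple linear fit and all variable names are illustrative assumptions rather than the study's exact fitting procedure.

```python
# Hedged sketch of the IPGEslope idea: change in the AGF slope when the
# interphase gap (IPG) is increased, plus a weighted Pearson correlation.
# The straight-line fit is an assumption, not the study's exact method.
import numpy as np

def agf_slope(current_levels: np.ndarray, ecap_amplitudes: np.ndarray) -> float:
    """Least-squares slope of eCAP amplitude vs. stimulation current level."""
    slope, _intercept = np.polyfit(current_levels, ecap_amplitudes, deg=1)
    return slope

def ipg_e_slope(levels, amp_short_ipg, amp_long_ipg) -> float:
    """Change in AGF slope when the IPG is increased (long minus short IPG)."""
    return agf_slope(levels, amp_long_ipg) - agf_slope(levels, amp_short_ipg)

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation, e.g. weighting ears by measurement success."""
    x, y, w = map(np.asarray, (x, y, w))
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    return cov / np.sqrt(np.average((x - mx) ** 2, weights=w) *
                         np.average((y - my) ** 2, weights=w))
```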
The corpora in this collection document the different regulatory, co-regulatory and self-regulatory initiatives to impose restrictions on disinformation about the pandemic, as well as the criticism they have received, particularly regarding negative impacts on freedom of speech. The time period covered is March 2020 to July 2021. The outbreak of the pandemic in 2020 was followed by a concurrent "infodemic". The dissemination of inaccurate and misleading information about the pandemic could cause confusion, spread fear among the general public, and make the implementation of health protection measures more difficult; its restriction was therefore attempted. States adopted emergency legislative measures to combat health disinformation. At the same time, international human rights organisations pointed out that these attempts could be used as an excuse and result in hampering the work of journalists and media actors and restricting the public's right to receive information. In addition, social media were considered super-spreaders of misinformation about the pandemic. Platform intermediaries such as Facebook adopted special policies and new technical tools to filter harmful third-party content. States issued regulatory and co-regulatory initiatives to require platforms to police their content more effectively. During the consultation and impact assessment procedures for these regulatory initiatives, it was pointed out that imposing restrictions on internet speech can have negative implications for freedom of speech.
According to our latest research, the global Quantum-AI Speech Recognition market size reached USD 1.87 billion in 2024, reflecting robust adoption across multiple industries. The market is projected to expand at a CAGR of 29.8% from 2025 to 2033, culminating in a forecasted market size of USD 17.36 billion by 2033. This remarkable growth is primarily driven by increasing demand for high-accuracy voice interfaces, rapid advancements in quantum computing and artificial intelligence, and the proliferation of voice-enabled devices across sectors such as healthcare, automotive, and BFSI. As per our comprehensive analysis, the convergence of quantum computing and AI technologies is setting new benchmarks for efficiency and accuracy in speech recognition systems worldwide.
One of the pivotal growth factors propelling the Quantum-AI Speech Recognition market is the exponential improvement in computational power enabled by quantum computing. Traditional AI-powered speech recognition systems, while effective, often struggle with processing large datasets in real time and accurately interpreting complex linguistic nuances. Quantum computing, with its ability to perform parallel computations and handle vast amounts of data simultaneously, is revolutionizing the landscape. By integrating quantum algorithms with AI models, organizations can achieve unprecedented levels of accuracy, speed, and contextual understanding in speech recognition, making these solutions highly attractive for mission-critical applications in healthcare diagnostics, financial transactions, and autonomous vehicles.
Another significant driver is the increasing ubiquity of voice-activated devices and virtual assistants in both consumer and enterprise environments. As businesses strive to enhance user engagement and streamline operations, the demand for seamless, hands-free interaction is surging. Quantum-AI Speech Recognition technologies offer superior performance in noisy environments, support multiple languages, and provide robust security features, addressing many of the limitations of legacy systems. This technological evolution is further supported by strategic investments from leading tech giants and startups alike, fostering a dynamic ecosystem that accelerates innovation and adoption. The market is also benefiting from favorable regulatory frameworks that encourage the integration of AI and quantum technologies in critical sectors.
Furthermore, the market is witnessing a surge in cross-industry collaborations and partnerships aimed at developing tailored solutions for diverse applications. For instance, in healthcare, Quantum-AI Speech Recognition is being leveraged to automate medical transcription, improve patient engagement, and facilitate real-time diagnostics. In the automotive sector, these technologies are enabling advanced driver-assistance systems (ADAS) and enhancing in-car infotainment experiences. The BFSI industry is utilizing quantum-enhanced speech recognition for secure voice authentication and fraud prevention. Such cross-pollination of ideas and resources is not only expanding the addressable market but also driving continuous improvements in the underlying technology.
Regionally, North America continues to dominate the Quantum-AI Speech Recognition market, owing to its strong technological infrastructure, high R&D investments, and early adoption of advanced speech technologies. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitization, a burgeoning tech-savvy population, and supportive government initiatives. Europe is also witnessing significant growth, particularly in sectors such as automotive and healthcare, where speech recognition is becoming integral to operational efficiency and customer experience. Overall, the global market is poised for sustained expansion, underpinned by technological advancements and a growing recognition of the transformative potential of Quantum-AI Speech Recognition across industries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Child-directed speech, as a specialized form of speech directed toward young children, has been found across numerous languages around the world and has been suggested as a universal feature of human experience. However, variation in its implementation and the extent to which it is culturally supported has called its universality into question. Child-directed speech has also been posited to be associated with expression of positive affect or “happy talk.” Here, we examined Canadian English-speaking adults' ability to discriminate child-directed from adult-directed speech samples from two dissimilar language/cultural communities; an urban Farsi-speaking population, and a rural, horticulturalist Tseltal Mayan speaking community. We also examined the relationship between participants' addressee classification and ratings of positive affect. Naive raters could successfully classify CDS in Farsi, but only trained raters were successful with the Tseltal Mayan sample. Associations with some affective ratings were found for the Farsi samples, but not reliably for happy speech. These findings point to a complex relationship between perception of affect and CDS, and context-specific effects on the ability to classify CDS across languages.
The Pitch Synchronous Segmentation (PSS) method, which accelerates speech without changing its fundamental frequency, could be applied and evaluated for use at NASA. There are many situations where crew or ground controllers must listen to concurrent speakers. Application of this method could improve speech comprehension and reduce workload in these situations.
Crewmembers and ground controllers often have to monitor multiple channels of speech communication during missions (multiple ground channels, onboard channels, and live speech), which can be difficult. Crew have reported that they often have to manually turn down lower-priority channels in order to focus on the channels of communication that they perceive to be more relevant at any point in time. We know that speech recognition is reduced when listening to concurrent speech. The U.S. Naval Research Laboratory developed a method called pitch synchronous segmentation (PSS) that accelerates speech without changing its fundamental frequency. Thus, within the same time frame, speech segments can be modified to be listened to in sequence instead of concurrently. Research shows that serial speech accelerated by 50% to 65% relative to the original speed leads to better speech comprehension than concurrent speech at normal speed. Two key application areas were identified:
* Immediate accelerated playback of ground control communication with crew (to confirm dialog that was missed)
* Accelerated playback of recordings for troubleshooting anomalies
Use of PSS for live, concurrent speech requires further research to determine the costs and benefits.
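The sketch below illustrates the serialize-and-accelerate idea using librosa's phase-vocoder time stretch as a stand-in for PSS; PSS itself is a different, pitch-synchronous algorithm, so this is only an approximation of the playback scheme, not the NRL method.

```python
# Illustrative sketch of "serialize and accelerate": time-compress two
# concurrent channels (without changing pitch) and play them back-to-back.
# librosa's phase-vocoder time stretch is a stand-in for PSS.
import numpy as np
import librosa

def serialize_channels(chan_a: np.ndarray, chan_b: np.ndarray,
                       rate: float = 2.0) -> np.ndarray:
    """Speed both channels up by `rate` and concatenate them in sequence."""
    fast_a = librosa.effects.time_stretch(chan_a, rate=rate)
    fast_b = librosa.effects.time_stretch(chan_b, rate=rate)
    return np.concatenate([fast_a, fast_b])

# rate=2.0 makes two equal-length channels fit roughly the original time
# window; the research cited above used lower accelerations (50-65%),
# trading playback duration for comprehension.
```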
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In order to leverage the potential benefits of technology for speech and language therapy language assessment processes, large samples of naturalistic language data must be collected and analysed. These samples enable the development and testing of novel software applications with data relevant to their intended clinical application. However, the collection and analysis of such data can be costly and time-consuming. This paper describes the development of a novel application designed to elicit and analyse young children’s story retell narratives to provide metrics regarding the child’s use of grammatical structures (micro-structure) and story grammar (macro-structure elements). Key aspects for development were (1) methods to collect story retells and ensure accurate transcription and segmentation of utterances; (2) testing the reliability of the application to analyse micro-structure elements in children’s story retells; and (3) development of an algorithm to analyse narrative macro-structure elements.
Methods: A co-design process was used to design an app which would be used to gather story retell samples from children using mobile technology. A citizen science approach using mainstream marketing via online channels, the media and billboard ads was used to encourage participation from children across the United Kingdom. A stratified sampling framework was used to ensure a representative sample was obtained across age, gender and five bands of socio-economic disadvantage using partial postcodes and the relevant indices of deprivation. Trained Research Associates (RAs) completed transcription and micro- and macro-structure analysis of the language samples. Methods to improve transcriptions produced by automated speech recognition were developed to enable reliable analysis. RA micro-structure analyses were compared to those generated by the digital application to test its reliability using intra-class correlation (ICC). RA macro-structure analyses were used to train an algorithm to produce macro-structure metrics. Finally, results from the macro-structure algorithm were compared against a subset of RA macro-structure analyses not used in training to test its reliability using ICC.
Results: A total of 4,517 profiles were made in the app used in data collection, and from these participants a final set of 599 were drawn which fulfilled the stratified sampling criteria. The story retells ranged from 35.66 s to 251.4 s in length and had word counts ranging from 37 to 496, with a mean of 148.29 words. ICC between the RA and application micro-structure analyses ranged from 0.213 to 1.0, with 41 out of a total of 44 comparisons reaching ‘good’ (0.70–0.90) or ‘excellent’ (>0.90) levels of reliability. ICC between the RA and application macro-structure features were completed for 85 samples not used in training the algorithm. ICC ranged from 0.5577 to 0.939, with 5 out of 7 metrics being ‘good’ or better.
Conclusion: Work to date has demonstrated the potential of semi-automated transcription and linguistic analyses to provide reliable, detailed and informative narrative language analysis for young children, and for the use of citizen science-based approaches using mobile technologies to collect representative and informative research data. Clinical evaluation of this new app is ongoing, so we do not yet have data documenting its developmental or clinical sensitivity and specificity.
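As an illustration of the reliability check described above, here is a minimal ICC(2,1) (two-way random effects, absolute agreement, single rater) implementation for comparing RA and application scores; the study's exact ICC formulation is not specified here, so this particular form is an assumption.

```python
# Hedged sketch of an intraclass correlation check like the RA-vs-app
# comparison above. Computes ICC(2,1); the study's exact variant is assumed.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ratings: (n_samples, n_raters) matrix, e.g. column 0 = RA, column 1 = app."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)      # between-samples
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)      # between-raters
    residual = ratings - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(residual ** 2) / ((n - 1) * (k - 1))         # error
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Example with hypothetical RA vs. app scores for five samples.
scores = np.array([[12, 11], [8, 9], [15, 15], [10, 12], [7, 7]])
print(round(icc_2_1(scores), 3))
```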