CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This anonymized dataset holds self-reports of 305 Dutch parents who have at least one child aged 3-8 years and a Google Assistant-powered smart speaker at home. For more information about this study, see: https://osf.io/629b7/
Community Data License Agreement - Sharing 1.0 (https://cdla.io/sharing-1-0/)
This dataset can be used to train large language models such as GPT, Llama 2, and Falcon, both for fine-tuning and for domain adaptation.
The dataset has the following specs:
The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:
For a full list of verticals and their intents, see https://www.bitext.com/chatbot-verticals/.
The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.
The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we identified a total of 3.57 million tokens. This rich set of tokens is essential for training advanced LLMs for conversational AI, generative AI, and question-answering (Q&A) models.
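As a rough illustration of how such a token count can be reproduced, the sketch below tokenizes the 'instruction' and 'response' columns and sums their token lengths. The local file name and the choice of tokenizer are assumptions made for illustration only; the exact figure depends on the tokenizer used.

# Minimal sketch: approximate token count over the two text columns.
# The CSV file name and the GPT-2 tokenizer are illustrative assumptions.
import pandas as pd
from transformers import AutoTokenizer

df = pd.read_csv("bitext_customer_support.csv")  # hypothetical local copy of the dataset
tokenizer = AutoTokenizer.from_pretrained("gpt2")

total_tokens = 0
for column in ("instruction", "response"):
    for text in df[column].astype(str):
        total_tokens += len(tokenizer.encode(text))

print(f"Approximate token count: {total_tokens:,}")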
Each entry in the dataset contains the following fields:
The categories and intents covered by the dataset are:
The entities covered by the dataset are:
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This is a development key figure; see the questions and answers on kolada.se for more information. The measure is the number of people with personal assistance who answered 'Everyone' to the question 'Do your assistants understand what you are saying?', divided by all people with personal assistance who answered the question. The answer options were Everyone, Some, None. The survey is not a total survey, so the result for a municipality may be based on a smaller number of users' answers, but on at least five. For some municipalities the respondents include users of services run both by the municipality itself and by other providers (private/non-profit), for some only users of the municipality's own services, and for others only users of services run by other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This anonymized dataset holds answers of 305 Dutch parents who have at least one child aged 3-8 years and a Google Assistant-powered smart speaker at home.
Individual Assistance (IA) is provided by the Federal Emergency Management Agency to individuals and families who have sustained losses due to disasters. Homeowners, renters, and business owners in designated counties who sustained damage to their homes, vehicles, personal property, businesses, or inventory as a result of a federally declared disaster may apply for disaster assistance. Disaster assistance may include grants to help pay for temporary housing, emergency home repairs, uninsured and underinsured personal property losses, and medical, dental, and funeral expenses caused by the disaster.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Two annotated datasets capturing synchronized inertial and acoustic data collected from an off-the-shelf smartwatch. One dataset consists of data captured as 15 participants performed various activities of daily living in their own homes; the other dataset was compiled from 5 participants performing activities completely in-the-wild and without any supervision; ground truth was established from video evidence captured with a wearable camera.

Abstract: Automatically recognizing a broad spectrum of human activities is key to realizing many compelling applications in health, personal assistance, human-computer interaction and smart environments. However, in real-world settings, approaches to human action perception have been largely constrained to detecting mobility states, e.g., walking, running, standing. In this work, we explore the use of inertial-acoustic sensing provided by off-the-shelf commodity smartwatches for detecting activities of daily living (ADLs). We conduct a semi-naturalistic study with a diverse set of 15 participants in their own homes and show that acoustic and inertial sensor data can be combined to recognize 23 activities such as writing, cooking, and cleaning with high accuracy. We further conduct a completely in-the-wild study with 5 participants to better evaluate the feasibility of our system in practical unconstrained scenarios. We comprehensively studied various baseline machine learning and deep learning models with three different fusion strategies, demonstrating the benefit of combining inertial and acoustic data for ADL recognition. Our analysis underscores the feasibility of high-performing recognition of daily activities using inertial-acoustic data from practical off-the-shelf wrist-worn devices while also uncovering challenges faced in unconstrained settings. We encourage researchers to use our public dataset to further push the boundary of ADL recognition in-the-wild.

IRB approved under ID: 2016020035-MODCR01
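The abstract mentions baseline machine learning models with three fusion strategies. As a generic illustration of the simplest such strategy, the sketch below shows early (feature-level) fusion: per-window summary features from the inertial and acoustic streams are concatenated into a single vector before classification. This is not the authors' pipeline; the window sizes, features, classifier, and synthetic data are placeholder assumptions.

# Early-fusion sketch on synthetic data (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(signal):
    # Simple per-axis summary statistics; real pipelines use richer features
    # (e.g., spectral features for the acoustic channel).
    return np.concatenate([signal.mean(axis=0), signal.std(axis=0)])

def early_fusion(imu_window, audio_window):
    # Early fusion: concatenate per-modality feature vectors into one vector.
    return np.concatenate([window_features(imu_window), window_features(audio_window)])

rng = np.random.default_rng(0)
imu_windows = rng.normal(size=(200, 50, 3))    # 200 windows of 3-axis inertial samples
audio_windows = rng.normal(size=(200, 400, 1)) # 200 windows of single-channel audio frames
labels = rng.integers(0, 5, size=200)          # 5 placeholder activity classes

X = np.stack([early_fusion(i, a) for i, a in zip(imu_windows, audio_windows)])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print("Training accuracy on toy data:", clf.score(X, labels))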
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
NLUCat
Dataset Description
Dataset Summary
NLUCat is a dataset for NLU in Catalan. It consists of nearly 12,000 instructions annotated with the most relevant intents and spans. In addition, each instruction is accompanied by the instructions the annotator received when writing it.
The intents taken into account are the habitual ones of a virtual home assistant (activity calendar, IoT, list management, leisure, etc.), but specific ones have also been added to take into account social and healthcare needs for vulnerable people (information on administrative procedures, menu and medication reminders, etc.).
The spans have been annotated with a tag describing the type of information they contain. They are fine-grained, but can easily be grouped for use in robust systems.
The examples are not only written in Catalan; they also take into account the geographical and cultural reality of the speakers of this language (geographic points, cultural references, etc.).
This dataset can be used to train models for intent classification, span identification, and example generation.
This is the complete version of the dataset. A version prepared to train and evaluate intent classifiers has been published in HuggingFace.
In this repository you'll find the following items:
NLUCat_annotation_guidelines.docx: the guidelines provided to the annotation team
NLUCat_dataset.json: the completed NLUCat dataset
NLUCat_stats.tsv: statistics about the NLUCat dataset
dataset: folder with the dataset as published in HuggingFace, split and prepared for training and evaluating intent classifiers
reports: folder with the reports done as feedback to the annotators during the annotation process
This dataset can be used for any purpose, whether academic or commercial, under the terms of the CC BY 4.0 license. Give appropriate credit, provide a link to the license, and indicate if changes were made.
Supported Tasks and Leaderboards
Intent classification, span identification, and example generation.
Languages
The dataset is in Catalan (ca-ES).
Dataset Structure
Data Instances
Three JSON files, one for each split.
Data Fields
example: str. Example
annotation: dict. Annotation of the example
intent: str. Intent tag
slots: list. List of slots
Tag: str. Tag assigned to the slot
Text: str. Text of the slot
Start_char: int. First character of the span
End_char: int. Last character of the span
Example
An example looks as follows:
{ "example": "Demana una ambulància; la meva dona està de part.", "annotation": { "intent": "call_emergency", "slots": [ { "Tag": "service", "Text": "ambulància", "Start_char": 11, "End_char": 21 }, { "Tag": "situation", "Text": "la meva dona està de part", "Start_char": 23, "End_char": 48 } ] } },
Data Splits
NLUCat.train: 9128 examples
NLUCat.dev: 1441 examples
NLUCat.test: 1441 examples
Dataset Creation
Curation Rationale
We created this dataset to contribute to the development of language models in Catalan, a low-resource language.
When creating this dataset, we took into account not only the language but the entire socio-cultural reality of the Catalan-speaking population. Special consideration was also given to the needs of the vulnerable population.
Source Data
Initial Data Collection and Normalization
We commissioned a company to create fictitious examples for the creation of this dataset.
Who are the source language producers?
We commissioned the writing of the examples to the company m47 labs.
Annotations
Annotation process
The elaboration of this dataset was done in three steps, taking as a model the process followed by the NLU-Evaluation-Data dataset, as explained in the paper.
First step: translation or elaboration of the instructions given to the annotators to write the examples.
Second step: writing the examples. This step also includes the grammatical correction and normalization of the texts.
Third step: annotating the intents and the slots of each example. In this step, some modifications were made to the annotation guidelines to adjust them to the real situations.
Who are the annotators?
The drafting of the examples and their annotation was entrusted to the company m47 labs through a public tender process.
Personal and Sensitive Information
No personal or sensitive information is included.
The examples used for the preparation of this dataset are fictitious and, therefore, the information shown is not real.
Considerations for Using the Data
Social Impact of Dataset
We hope that this dataset will help the development of virtual assistants in Catalan, a language that is often not taken into account, and that it will especially help to improve the quality of life of people with special needs.
Discussion of Biases
When writing the examples, the annotators were asked to take into account the socio-cultural reality (geographic points, artists and cultural references, etc.) of the Catalan-speaking population. Likewise, they were asked to be careful to avoid examples that reinforce the stereotypes that exist in this society. For example: be careful with the gender or origin of personal names that are associated with certain activities.
Other Known Limitations
[N/A]
Additional Information
Dataset Curators
Language Technologies Unit at the Barcelona Supercomputing Center (langtech@bsc.es)
This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.
Licensing Information
This dataset can be used for any purpose, whether academic or commercial, under the terms of the CC BY 4.0. Give appropriate credit, provide a link to the license, and indicate if changes were made.
Citation Information
DOI
Contributions
The drafting of the examples and their annotation was entrusted to the company m47 labs through a public tender process.
https://www.marketreportanalytics.com/privacy-policy
The global smart speaker market, valued at $14.42 billion in 2025, is projected to experience robust growth, driven by a compound annual growth rate (CAGR) of 15.20% from 2025 to 2033. This expansion is fueled by several key factors. The increasing affordability of smart speakers, coupled with their integration into smart homes and the rising adoption of voice assistants like Alexa, Google Assistant, and Siri, are major catalysts. Consumers are drawn to the convenience and hands-free control offered by these devices for tasks ranging from playing music and setting reminders to controlling smart home appliances and accessing information. Furthermore, the continuous development of advanced features, such as improved sound quality, enhanced voice recognition capabilities, and broader platform integrations, further fuels market growth. Competition among established players like Apple, Amazon, Google, and Bose, along with emerging players from Asia, contributes to innovation and price competitiveness, making smart speakers accessible to a wider audience.

However, market growth is not without challenges. Concerns surrounding data privacy and security related to voice-activated devices represent a significant restraint. Consumers are increasingly aware of the potential for data breaches and misuse of personal information collected by smart speakers, leading to hesitancy in adoption. Additionally, the market faces challenges from the saturation of the early adopter market and the need to continuously innovate to maintain consumer interest and drive further adoption beyond the existing user base. Despite these restraints, the overall outlook for the smart speaker market remains positive, driven by ongoing technological advancements, expanding applications, and increasing consumer demand for convenient and connected home experiences. The market segmentation, while not explicitly detailed, likely includes variations based on speaker size, features (e.g., multi-room audio, video capabilities), price points, and brand. Regional variations will undoubtedly reflect differing levels of technological adoption and economic development.

Recent developments include: September 2023: Amazon introduced the latest iteration of its Alexa voice assistant, powered by generative AI. This enhanced version of Alexa is built upon the Alexa Large Language Model (LLM) and brings expanded functionalities to older Echo devices, including the original Echo Plus. Notably, users with Visual ID can now effortlessly start conversations with the device by merely facing it, eliminating the need for wake-up prompts. September 2023: PhonePe revealed that its SmartSpeakers gained significant popularity, with over four million devices deployed throughout India. This rapid deployment is unprecedented among offline merchants nationwide. The SmartSpeakers offered by PhonePe play a crucial role in seamlessly verifying customer payments without requiring any manual intervention. Moreover, the swift audio confirmations provided by these devices have played a pivotal role in establishing high trust and reliability among the company's 3.6 crore merchants. These merchants are spread across 19,000 postal codes in the country, making PhonePe's SmartSpeakers an invaluable asset for their businesses.

Key drivers for this market are: Growing Investments and Government Efforts to Boost Smart Homes, Increasing Consumer Demand for Smart and Connected Devices.
Potential restraints include: Growing Investments and Government Efforts to Boost Smart Homes, Increasing Consumer Demand for Smart and Connected Devices. Notable trends are: Amazon Alexa is Expected to Witness Significant Growth Rate.
According to our latest research, the Personal Knowledge Agent on Device market size reached USD 4.1 billion in 2024 globally, demonstrating strong momentum driven by rising demand for privacy-centric, intelligent assistants. The market is projected to grow at a CAGR of 23.6% from 2025 to 2033, reaching a forecasted value of USD 32.1 billion by 2033. This robust expansion is underpinned by technological advancements in on-device AI, increasing user concerns around data privacy, and the proliferation of smart devices across consumer and enterprise environments.
The primary growth factor for the Personal Knowledge Agent on Device market is the increasing prioritization of user privacy and data security. Unlike traditional cloud-based virtual assistants, on-device personal knowledge agents process and store data locally, ensuring that sensitive information remains within the user’s control. This paradigm shift is being accelerated by stringent data protection regulations such as GDPR and CCPA, which have compelled both consumers and organizations to seek solutions that minimize data exposure and reduce compliance risks. Furthermore, the rise of edge computing and advancements in mobile hardware have enabled real-time, context-aware processing, allowing personal knowledge agents to deliver seamless, personalized experiences without compromising security or performance.
Another significant driver is the rapid adoption of smart devices and the growing integration of AI-driven functionalities across various applications. The proliferation of smartphones, wearables, smart home devices, and connected enterprise endpoints has created a fertile environment for the deployment of on-device personal knowledge agents. As these devices become increasingly sophisticated, users expect more intuitive and proactive assistance in managing schedules, automating tasks, and accessing information. This demand is further amplified by the enterprise sector, where organizations are leveraging on-device agents to enhance employee productivity, streamline workflows, and facilitate knowledge management—all while maintaining strict control over proprietary data.
The evolution of user expectations and the shift towards hyper-personalization are also fueling market growth. Modern consumers and professionals are seeking digital assistants that can adapt to their unique preferences, learn from their behaviors, and provide contextual recommendations in real time. On-device personal knowledge agents, powered by advanced machine learning and natural language processing algorithms, are uniquely positioned to meet these demands by continuously learning from user interactions without relying on cloud connectivity. This capability not only enhances user satisfaction but also opens new avenues for application in sectors such as healthcare, education, and smart homes, where personalized experiences are paramount.
From a regional perspective, North America currently dominates the Personal Knowledge Agent on Device market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. This leadership is attributed to the early adoption of AI technologies, a high concentration of tech-savvy consumers, and the presence of leading technology companies in the region. However, Asia Pacific is expected to exhibit the fastest growth over the forecast period, driven by increasing smartphone penetration, expanding digital infrastructure, and rising awareness of data privacy issues among consumers and enterprises alike. Meanwhile, regions such as Latin America and the Middle East & Africa are gradually catching up, propelled by investments in digital transformation and growing demand for secure, localized AI solutions.
The Component segment of the Personal Knowledge Agent on Device market is categorized into Software, Hardware, and Services. Software represents the core of personal knowledge agents, encompassing AI alg
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This is a development key figure; see the questions and answers on kolada.se for more information. The measure is the number of people with personal assistance who answered 'None' to the question 'Do you feel safe with your assistants?', divided by all people with personal assistance who answered the question. The answer options were Everyone, Some, None. The survey is not a total survey, so the result for a municipality may be based on a smaller number of users' answers, but on at least five. For some municipalities the respondents include users of services run both by the municipality itself and by other providers (private/non-profit), for some only users of the municipality's own services, and for others only users of services run by other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.
Internet users are individuals who have used the Internet (from any location) in the last 3 months. The Internet can be used via a computer, mobile phone, personal digital assistant, games machine, digital TV, etc. Data source: World Bank, Creative Commons 4.0 BY (https://data.worldbank.org/indicator/IT.NET.USER.ZS)