Spanish (Spain) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcriptions include the text content, speaker ID, gender, age, and other attributes. The dataset was collected from an extensive and geographically diverse pool of 596 native speakers, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are GDPR, CCPA, and PIPL compliant.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This training dataset comprises more than 10,000 conversational text exchanges between two native Bahasa speakers in the general domain. It is a collection of chats on a variety of everyday topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., which makes the dataset diverse.
These chats use language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains various attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats, to keep the text data unbiased.
Each chat script contains between 300 and 700 words and up to 50 turns. 150 people who are part of the FutureBeeAI crowd community contributed to this dataset. Along with the chats, you will also receive chat metadata, such as participant age, gender, and country. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is continually expanded with new chats. We can produce text data in a variety of languages to meet your specific requirements. Check out the FutureBeeAI community for a custom collection.
The license for this training dataset belongs to FutureBeeAI.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This training dataset comprises more than 10,000 conversational text exchanges between two native German speakers in the general domain. It is a collection of chats on a variety of everyday topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., which makes the dataset diverse.
These chats use language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains various attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats, to keep the text data unbiased.
Each chat script contains between 300 and 700 words and up to 50 turns. 150 people who are part of the FutureBeeAI crowd community contributed to this dataset. Along with the chats, you will also receive chat metadata, such as participant age, gender, and country. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is continually expanded with new chats. We can produce text data in a variety of languages to meet your specific requirements. Check out the FutureBeeAI community for a custom collection.
The license for this training dataset belongs to FutureBeeAI.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This training dataset comprises more than 10,000 conversational text exchanges between two native Swedish speakers in the general domain. It is a collection of chats on a variety of everyday topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., which makes the dataset diverse.
These chats use language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains various attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats, to keep the text data unbiased.
Each chat script contains between 300 and 700 words and up to 50 turns. 150 people who are part of the FutureBeeAI crowd community contributed to this dataset. Along with the chats, you will also receive chat metadata, such as participant age, gender, and country. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is continually expanded with new chats. We can produce text data in a variety of languages to meet your specific requirements. Check out the FutureBeeAI community for a custom collection.
The license for this training dataset belongs to FutureBeeAI.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This training dataset comprises more than 10,000 conversational text exchanges between two native Urdu speakers in the general domain. It is a collection of chats on a variety of everyday topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., which makes the dataset diverse.
These chats use language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains various attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats, to keep the text data unbiased.
Each chat script contains between 300 and 700 words and up to 50 turns. 150 people who are part of the FutureBeeAI crowd community contributed to this dataset. Along with the chats, you will also receive chat metadata, such as participant age, gender, and country. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is continually expanded with new chats. We can produce text data in a variety of languages to meet your specific requirements. Check out the FutureBeeAI community for a custom collection.
The license for this training dataset belongs to FutureBeeAI.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
English (America) Real-world Casual Conversation and Monologue speech dataset, covering self-media, conversation, live streams, lectures, variety shows, etc., and mirroring real-world interactions. Transcriptions include the text content, speaker ID, gender, and other attributes. The dataset was collected from an extensive and geographically diverse pool of speakers, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are GDPR, CCPA, and PIPL compliant. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1115?source=Kaggle
16kHz, 16-bit, WAV, mono channel;
Including self-media, conversation, live streams, lectures, variety shows, etc.;
Low background noise;
America (USA);
en-US;
English;
Transcription text, timestamp, speaker ID, gender.
Sentence Accuracy Rate (SAR) 95%
Commercial License
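Files matching the spec above (16 kHz, 16-bit, mono WAV) can be sanity-checked with Python's standard wave module. A minimal sketch using a synthetic clip, since no actual dataset file is bundled here:

```python
import io
import struct
import wave

def matches_spec(wav_bytes: bytes) -> bool:
    """True if the audio is a 16 kHz, 16-bit, mono WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return (w.getframerate() == 16000
                and w.getsampwidth() == 2   # 16-bit samples = 2 bytes
                and w.getnchannels() == 1)  # mono

# Build one second of silence matching the spec to demonstrate the check.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<16000h", *([0] * 16000)))

print(matches_spec(buf.getvalue()))  # True
```

The same check applied to real files would catch resampled or stereo recordings before they reach a training pipeline.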
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
4,491 speakers participated in the recording and communicated face-to-face in a natural way. No topics were specified, so the content covers a wide range of fields; the speech is natural and fluent, in line with real dialogue scenes. The text was transcribed manually, with high accuracy.
Format: 16kHz, 16-bit, uncompressed WAV, mono channel
Environment: quiet indoor environment, without echo
Recording content: no topic is specified; the speakers converse while the recording is performed
Demographics: 4,491 speakers, 63% of whom are female
Annotations: transcription text, speaker identification, and gender
Devices: Android mobile phones, iPhone
Language: Mandarin
Applications: speech recognition; voiceprint recognition
Accuracy rate: 97%
Arabic (UAE) Real-world Casual Conversation and Monologue speech dataset, mirroring real-world interactions. Transcriptions include the text content, speaker ID, gender, and other attributes. The dataset was collected from an extensive and geographically diverse pool of speakers, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are GDPR, CCPA, and PIPL compliant.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set consists of conversational implicatures of utterances. Conversational implicatures are meanings of an utterance beyond what is literally stated. The data consist of 1,001 utterances that come as responses in a specific context, together with their implicatures. The written representations of the utterances were collected manually by scraping and transcribing relevant sources from August 2019 to August 2020. The sources of the dialogues include TOEFL listening comprehension short conversations, movie dialogues from IMSDb, and websites explaining idioms, similes, metaphors, and hyperboles. The implicatures are annotated manually.
Formatting: The dataset file (Conversational Implicature Dataset 1-1001 - implicature data 1-1001.csv) is written as a comma-separated values file. Columns that contain commas (,) are escaped using double quotes ("). The dataset is also available as an Excel sheet (Conversational Implicature Dataset 1-1001.xlsx).
Content: The dataset is available in Conversational Implicature Dataset 1-1001 - implicature data 1-1001.csv. Each entry consists of a context utterance, a response utterance, and an implicature.
Context utterance: the written representation of an utterance that serves as the context in which the response utterance can implicate a meaning different from its literal meaning.
Response utterance: the written representation of an utterance whose meaning differs from the literal meaning of the sentences used in it.
Implicature: the implicated meaning of the response utterance.
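Because comma-containing fields are escaped with double quotes as described, Python's standard csv module parses the file directly. A minimal sketch; the column names and the example row below are illustrative assumptions, not taken from the actual file:

```python
import csv
import io

# Hypothetical row in the three-column layout described above;
# the quoted fields contain commas, which csv unescapes correctly.
sample = (
    'context_utterance,response_utterance,implicature\n'
    '"Are you coming to the party?","Well, I have an exam tomorrow.",'
    '"The speaker is probably not coming."\n'
)

for row in csv.DictReader(io.StringIO(sample)):
    print(row["response_utterance"], "->", row["implicature"])
```

Replacing the in-memory string with `open(...)` on the real CSV would yield one dict per entry with the same access pattern.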
https://www.futurebeeai.com/policies/ai-data-license-agreement
This training dataset comprises more than 10,000 conversational text exchanges between two native Spanish speakers in the general domain. It is a collection of chats on a variety of everyday topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., which makes the dataset diverse.
These chats use language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains various attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats, to keep the text data unbiased.
Each chat script contains between 300 and 700 words and up to 50 turns. 150 people who are part of the FutureBeeAI crowd community contributed to this dataset. Along with the chats, you will also receive chat metadata, such as participant age, gender, and country. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is continually expanded with new chats. We can produce text data in a variety of languages to meet your specific requirements. Check out the FutureBeeAI community for a custom collection.
The license for this training dataset belongs to FutureBeeAI.
Dari (Afghanistan) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcriptions include the text content, speaker ID, gender, age, and other attributes. The dataset was collected from an extensive and geographically diverse pool of 452 native speakers, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are GDPR, CCPA, and PIPL compliant.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "SpeechRec_LanguageLearning_ConversationalSkills" dataset is a collection of data generated in a game-based language learning environment, aiming to explore the impact of Speech Recognition Technology (SRT) on the development of conversational skills. The dataset encompasses speaking test results conducted within the context of language learning games utilizing SRT.
https://www.archivemarketresearch.com/privacy-policy
Market Analysis
The market for Conversational AI in Healthcare is projected to reach a value of XX million by 2033, growing at a CAGR of 5%. Growth is driven by the increasing adoption of AI in healthcare, the rising need for efficient patient care, and the growing prevalence of chronic diseases. Key market trends include the integration of natural language processing (NLP) and machine learning (ML) for improved communication and analysis, and the emergence of cloud-based solutions for cost-effective scalability. The major segments of the market are NLP- and ML-based solutions, with applications in medical record mining, medical imaging analysis, medicine development, and emergency assistance.
Value Chain Analysis
The Conversational AI in Healthcare value chain consists of several players: hardware manufacturers, software developers, solution providers, and healthcare providers. Hardware manufacturers provide the devices and sensors used for data collection and processing. Software developers create the AI algorithms and software that enable healthcare providers to interact with patients through conversational interfaces. Solution providers integrate hardware and software into end-to-end solutions. Healthcare providers, including hospitals, clinics, and nursing homes, are the end users who utilize Conversational AI solutions to enhance patient care. Key market players include Google Health, IBM Watson Health, Oncora Medical, and CloudMedX Health.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
ProsocialDialog is a large-scale, multi-turn English dialogue dataset designed to teach conversational agents how to respond to problematic content in line with social norms. It addresses a variety of unethical, biased, toxic, and generally problematic situations. The dataset is notable for its focus on encouraging prosocial behaviour, guided by commonsense social rules referred to as Rules-of-Thumb (RoTs). Developed through a human-AI collaborative framework, the dataset consists of 58,000 dialogues, comprising 331,000 utterances, 160,000 unique RoTs, and 497,000 dialogue safety labels, each accompanied by free-form rationales. The test.csv file within the ProsocialDialog dataset contains data specifically for evaluating the accuracy of a model in predicting conversation safety.
The dataset includes the following columns:
* context: The context of the conversation. (String)
* response: The response to the conversation. (String)
* rots: Rules of thumb associated with the conversation. (String)
* safety_label: The safety label associated with the conversation. (String)
* safety_annotations: Annotations associated with the conversation. (String)
* safety_annotation_reasons: Reasons for the safety annotations. (String)
* source: The source of the conversation. (String)
* etc: Any additional information associated with the conversation. (String)
* dialogue_id: Unique identifier for each dialogue.
* response_id: Unique identifier for each response.
The dataset is typically provided in CSV format, such as test.csv. It contains 58,000 dialogues, encompassing 331,000 utterances. There are 24,972 unique dialogue IDs and 24,903 unique response IDs. The dataset includes 160,000 unique Rules-of-Thumb (RoTs) and 497,000 dialogue safety labels. Specific row or record counts beyond these are not provided in the sources.
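Given the column layout above, a safety-label tally can be computed with the standard library alone. A minimal sketch over two made-up rows; the label strings (`__casual__`, `__needs_intervention__`) and all field values here are assumptions for illustration, not rows from the dataset:

```python
import csv
import io
from collections import Counter

# Two illustrative rows using the columns listed above; values are invented.
sample = io.StringIO(
    "context,response,rots,safety_label,safety_annotations,"
    "safety_annotation_reasons,source,etc,dialogue_id,response_id\n"
    "Hi there!,Hello!,Greetings are polite.,__casual__,casual,,example,,d0,r0\n"
    "You are dumb.,Please be respectful.,It's wrong to insult others.,"
    "__needs_intervention__,needs intervention,Insults are hurtful.,example,,d1,r0\n"
)

label_counts = Counter(row["safety_label"] for row in csv.DictReader(sample))
print(label_counts)
```

Pointing `csv.DictReader` at the real test.csv instead of the in-memory sample gives the actual label distribution for evaluation splits.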
This dataset is ideally suited for several applications:
* Designing Conversational Agents: It can be used to build Natural Language Processing (NLP) models capable of recognising and classifying problematic content. The safety labels, rationales, and RoTs can train conversational agents to respond in socially acceptable ways.
* Benchmark Systems: ProsocialDialog serves as an effective benchmark for evaluating how well existing conversation datasets identify, respond to, and prevent problematic content interactions.
* Automated Moderation: The dialogue safety labels and their associated free-form rationales are valuable for technology platforms implementing automated moderation tasks, such as flagging or banning offensive messages or users.
The ProsocialDialog dataset is in English and has a global regional coverage. It addresses general conversational scenarios involving social norms and problematic content, but specific demographic scope details or the precise time range of data collection are not explicitly outlined in the sources. The dataset was listed on 11/06/2025.
CC0
This dataset is beneficial for a range of users, including:
* Researchers and Developers in AI and Machine Learning: Particularly those focused on Natural Language Processing (NLP) and building sophisticated conversational AI systems.
* Organisations and Platforms: Especially those in need of automated moderation tools or aiming to ensure their conversational agents adhere to social norms and promote prosocial behaviour.
* Academics and Students: Engaged in studying dialogue safety, social psychology, or ethical AI, who can explore the safety labels, annotations, RoTs, and data sources to gain deeper insights into human conversation dynamics.
Original Data Source: ProsocialDialog - Problematic Content Dialogue
The StudyAbroadGPT-Dataset is a collection of conversational data focused on university application requirements for various programs, including MBA, MS in Computer Science, Data Science, and Bachelor of Medicine. The dataset includes interactions between humans asking questions about application processes (e.g., "How do I write a strong SOP for MS in Data Science at MIT?") and an assistant providing detailed responses. Covering prestigious institutions such as MIT, Oxford, Cambridge, and Stanford, this dataset serves as a valuable resource for understanding the informational needs of prospective students applying to study abroad.
Dataset Structure
The dataset is organized as a list of JSON objects, where each object represents a single conversation. Each conversation contains an array of turns, structured as follows:

* "from": Specifies the speaker, either "human" or "assistant".
* "value": Contains the text of the query or response.

Example:

```json
{
  "conversations": [
    {"from": "human", "value": "What documents do I need for applying to MBA?"},
    {"from": "assistant", "value": "## Introduction To embark on your MBA journey, it's crucial to gather the necessary documents..."}
  ]
}
```
Data Collection
This dataset was synthetically generated to simulate realistic conversations about study abroad applications. It is designed to reflect common queries and provide detailed, informative responses related to university application requirements.

Preprocessing
The dataset is provided in its raw form, consisting of unprocessed conversational text. Depending on their specific use case, such as natural language processing (NLP) tasks, users may need to perform additional preprocessing steps like tokenization or stopword removal.

Usage
You can load the dataset using the Hugging Face Datasets library:

```python
from datasets import load_dataset

dataset = load_dataset("millat/StudyAbroadGPT-Dataset")
```

Accessing Conversations
To access and iterate through the conversations, use this example:

```python
for conversation in dataset["train"]:
    for turn in conversation["conversations"]:
        print(f"{turn['from']}: {turn['value']}")
```

Examples
Here is a sample conversation excerpt from the dataset:

```json
{
  "conversations": [
    {"from": "human", "value": "What are the GMAT requirements for Oxford?"},
    {"from": "assistant", "value": "## Introduction Oxford University, one of the world's most prestigious institutions, requires..."}
  ]
}
```
Use Cases
* Training Conversational Agents: Build chatbots to assist with university application queries.
* Analyzing Trends: Study application requirements across different programs and institutions.
* NLP Development: Create natural language understanding models tailored to educational domains.
License
This dataset is licensed under the MIT License.
Citation
If you use this dataset in your research, please cite it as follows:
```bibtex
@misc{StudyAbroadGPT-Dataset,
  author = {MD MILLAT HOSEN},
  title = {StudyAbroadGPT-Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/millat/StudyAbroadGPT-Dataset}}
}
```
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is for follow-up research purposes. It consists of all key documents from data collection to data analysis of the research project. See the readme document for the relevant procedure and document description.
https://www.icpsr.umich.edu/web/ICPSR/studies/37124/terms
This study investigated dynamic patterns of interpersonal coordination in extended deceptive conversations across multi-modal channels of behavior. Using a "devil's advocate" paradigm, the researchers experimentally elicited deception and truth across controversial social and political topics on which conversational partners either agreed or disagreed, and where one partner was surreptitiously asked to argue an opinion opposite to what he or she really believed. The researchers focused on interpersonal coordination as an emergent behavioral signal capturing inter-dependencies between conversational partners, both as the coupling of head movements over the span of milliseconds, measured via a windowed lagged cross-correlation (WLCC) technique, and as more global temporal dependencies across speech rate, measured using cross recurrence quantification analysis (CRQA). The researchers also considered how interpersonal coordination might be shaped by strategic, adaptive conversational goals associated with deception. This collection includes both qualitative transcripts and a quantitative dataset including respondent demographics (sex, age, and ethnicity). The qualitative dataset consists of 94 written transcripts of audio-recorded conversations, each lasting eight minutes. The quantitative dataset includes 5 variables for 102 cases.
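The windowed lagged cross-correlation idea can be sketched in a few lines of NumPy: within each sliding window, compute the Pearson correlation between one series and a lag-shifted copy of the other. This is a simplified illustration, not the authors' implementation, and the window, step, and lag parameters are arbitrary:

```python
import numpy as np

def wlcc(x, y, win=50, step=25, max_lag=10):
    """For each window, Pearson r between x[start:start+win] and y shifted
    by each lag in [-max_lag, max_lag]. Returns (lags, windows-by-lags grid)."""
    lags = np.arange(-max_lag, max_lag + 1)
    grid = []
    for start in range(max_lag, len(x) - win - max_lag, step):
        a = x[start:start + win]
        grid.append([np.corrcoef(a, y[start + lag:start + lag + win])[0, 1]
                     for lag in lags])
    return lags, np.asarray(grid)

# Demo: y trails x by 5 samples, so the mean correlation peaks at lag +5.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 5)
lags, grid = wlcc(x, y)
print(lags[np.argmax(grid.mean(axis=0))])  # 5
```

In the study's setting, x and y would be the two partners' head-movement (or speech-rate) series, and the per-window lag profiles reveal who leads whom and by how much.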
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This comparative dataset was collected during the data-gathering portion of the "ChatGPT: A Conversational Language Study Tool" project over the 2023-2024 academic year. The survey forms were completed by students in ancient language classes in the Department of Classics at the University of Reading. This project has been reviewed by the University of Reading University Research Ethics Committee and has been given a favourable ethical opinion for conduct.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This training dataset comprises more than 10,000 conversational text exchanges between two native Bengali speakers in the general domain. It is a collection of chats on a variety of everyday topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., which makes the dataset diverse.
These chats use language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains various attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats, to keep the text data unbiased.
Each chat script contains between 300 and 700 words and up to 50 turns. 150 people who are part of the FutureBeeAI crowd community contributed to this dataset. Along with the chats, you will also receive chat metadata, such as participant age, gender, and country. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is continually expanded with new chats. We can produce text data in a variety of languages to meet your specific requirements. Check out the FutureBeeAI community for a custom collection.
The license for this training dataset belongs to FutureBeeAI.
WiserBrand's Comprehensive Customer Call Transcription Dataset: Tailored Insights
WiserBrand offers a customizable dataset comprising transcribed customer call records, meticulously tailored to your specific requirements. This extensive dataset includes:
* User ID and Firm Name: Identify and categorize calls by unique user IDs and company names.
* Call Duration: Analyze engagement levels through call lengths.
* Geographical Information: Detailed data on city, state, and country for regional analysis.
* Call Timing: Track peak interaction times with precise timestamps.
* Call Reason and Group: Categorized reasons for calls, helping to identify common customer issues.
* Device and OS Types: Information on the devices and operating systems used, for technical support analysis.
* Transcriptions: Full-text transcriptions of each call, enabling sentiment analysis, keyword extraction, and detailed interaction reviews.
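Fields like these map naturally onto a typed record when ingesting such an export. A hypothetical sketch; WiserBrand's actual schema, field names, and values are not published here, so everything below is illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CallRecord:
    """Illustrative shape for one transcribed customer call."""
    user_id: str
    firm_name: str
    duration: timedelta
    city: str
    state: str
    country: str
    started_at: datetime
    call_reason: str
    call_group: str
    device_type: str
    os_type: str
    transcription: str

# Invented example record.
call = CallRecord(
    user_id="u-1042", firm_name="Acme Corp",
    duration=timedelta(minutes=7, seconds=31),
    city="Austin", state="TX", country="USA",
    started_at=datetime(2024, 3, 14, 10, 5),
    call_reason="billing", call_group="account",
    device_type="smartphone", os_type="Android",
    transcription="Hi, I was charged twice this month...",
)
print(call.call_reason, call.duration)
```

Typing the record up front makes downstream steps (grouping by call_reason, bucketing by duration, feeding transcriptions to an STT or sentiment pipeline) straightforward and less error-prone.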
Our dataset is designed for businesses aiming to enhance customer service strategies, develop targeted marketing campaigns, and improve product support systems. Gain actionable insights into customer needs and behavior patterns with this comprehensive collection, particularly useful for Consumer Data, Consumer Behavior Data, Consumer Sentiment Data, Consumer Review Data, AI Training Data, Textual Data, and Transcription Data applications.
WiserBrand's dataset is essential for companies looking to leverage Consumer Data and B2B Marketing Data to drive their strategic initiatives in the English-speaking markets of the USA, UK, and Australia. By accessing this rich dataset, businesses can uncover trends and insights critical for improving customer engagement and satisfaction.
Use cases:
Enriching STT Models: The dataset includes a wide variety of real-world customer service calls with diverse accents, tones, and terminologies. This makes it highly valuable for training speech-to-text models to better recognize different dialects, regional speech patterns, and industry-specific jargon. It could help improve accuracy in transcribing conversations in customer service, sales, or technical support.
Contextualized Speech Recognition: Given the contextual information (e.g., reasons for calls, call categories, etc.), it can help models differentiate between various types of conversations (technical support vs. sales queries), which would improve the model’s ability to transcribe in a more contextually relevant manner.
Improving TTS Systems: The transcriptions, along with their associated metadata (such as call duration, timing, and call reason), can aid in training Text-to-Speech models that mimic natural conversation patterns, including pauses, tone variation, and proper intonation. This is especially beneficial for developing conversational agents that sound more natural and human-like in their responses.
Noise and Speech Quality Handling: Real-world customer service calls often contain background noise, overlapping speech, and interruptions, which are crucial elements for training speech models to handle real-life scenarios more effectively.
Customer Interaction Simulation: The transcriptions provide a comprehensive view of real customer interactions, including common queries, complaints, and support requests. By training AI models on this data, businesses can equip their virtual agents with the ability to understand customer concerns, follow up on issues, and provide meaningful solutions, all while mimicking human-like conversational flow.
Sentiment Analysis and Emotional Intelligence: The full-text transcriptions, along with associated call metadata (e.g., reason for the call, call duration, and geographical data), allow for sentiment analysis, enabling AI agents to gauge the emotional tone of customers. This helps the agents respond appropriately, whether it’s providing reassurance during frustrating technical issues or offering solutions in a polite, empathetic manner. Such capabilities are essential for improving customer satisfaction in automated systems.
Customizable Dialogue Systems: The dataset allows for categorizing and identifying recurring call patterns and issues. This means AI agents can be trained to recognize the types of queries that come up frequently, allowing them to automate routine tasks such as ...