https://www.icpsr.umich.edu/web/ICPSR/studies/38417/terms
The National Couples' Health and Time Study (NCHAT) is a population-based study of couples in America that contains representative samples of racially and ethnically diverse and sexual and gender diverse individuals. NCHAT entered the field on September 1, 2020, and data collection was completed in April 2021. A follow-up survey (Wave 2) was fielded in 2022. The Wave 1 sample includes 3,642 main respondents. The sample frame included adults in the United States who ranged in age from 20 to 60 years old, who were married or cohabiting, and who were able to read English or Spanish. About 1,515 partners participated. NCHAT sample participants were recruited through the Gallup Panel. About 9 percent of the sample was non-Latinx Black, 6 percent non-Latinx Asian, 5 percent non-Latinx Multirace, 16 percent Latinx, and 1 percent another racial or ethnic identity. Approximately 55 percent of the sample identified as heterosexual, 20 percent as gay or lesbian, 10 percent as bisexual, and 15 percent as another sexual identity or multiple sexual identities. The sample was about evenly split between men and women, and almost 3 percent identified as another gender identity. 27 percent of couples were the same gender, and 4 percent were non-binary. About 75 percent were married and the remainder were cohabiting. The average age was 45. 65 percent of the sample had no children. One-third of the sample was in an interracial couple. 10 percent were born outside the US. Survey, time diary, experience sampling method, and geospatial data were collected. NCHAT is uniquely suited to address COVID, stress, family functioning, and physical and mental health and includes an abundance of contextual and acute measures of race and racism, sexism, and heterosexism.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Here is a collective list of the instruction datasets used for Neural Chat fine-tuning. The total numbers of instruction samples and tokens are about 1.5M and 5M, respectively.
Type       Language  Dataset                               Number
HC3        en        HC3                                   24K
dolly      en        databricks-dolly-15k                  15K
alpaca-zh  zh        tigerbot-alpaca-zh-0.5m               500K
alpaca-en  en        TigerResearch/tigerbot-alpaca-en-50k  50K
math       en        tigerbot-gsm-8k-en                    8K
general    en        tigerbot-stackexchange-qa-en-0.5m     500K
OpenOrca   en        Open-Orca/OpenOrca                    400K (sampled)
… See the full description on the dataset page: https://huggingface.co/datasets/Intel/neural-chat-dataset-v2.
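As a quick sanity check on the stated sample total, the counts visible in the table above can be summed. This is only a sketch: the listing is truncated after the OpenOrca row, so any rows not shown are not included, and the name `visible_counts` is introduced here purely for illustration.

```python
# Sample counts (in thousands) for the rows visible in the listing above.
# The listing is truncated, so any rows beyond OpenOrca are not included.
visible_counts = {
    "HC3": 24,
    "databricks-dolly-15k": 15,
    "tigerbot-alpaca-zh-0.5m": 500,
    "TigerResearch/tigerbot-alpaca-en-50k": 50,
    "tigerbot-gsm-8k-en": 8,
    "tigerbot-stackexchange-qa-en-0.5m": 500,
    "Open-Orca/OpenOrca": 400,  # sampled subset
}

total_thousands = sum(visible_counts.values())
print(f"Visible rows total about {total_thousands}K samples")
```

The visible rows come to 1,497K samples, which is consistent with the "about 1.5M" figure quoted in the description.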
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 12,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Bengali Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Bengali speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Bengali Healthcare interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Dutch Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Dutch speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Dutch Healthcare interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat messages, designed to
This dataset provides extra annotations on top of the publicly released Topical-Chat dataset (https://github.com/alexa/Topical-Chat), which will help in reproducing the results in our paper "Policy-Driven Neural Response Generation for Knowledge-Grounded Dialogue Systems" (https://arxiv.org/abs/2005.12529?context=cs.CL). The dataset contains 5 files: train.json, valid_freq.json, valid_rare.json, test_freq.json and test_rare.json. Each of these files has additional annotations on top of the original Topical-Chat dataset: dialogue act annotations and knowledge sentence annotations. The annotations were computed automatically using off-the-shelf models, which are listed in the README.txt.
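A minimal sketch of reading the five files named above, assuming they sit together in one local directory. The helper `load_splits` and the constant `SPLIT_FILES` are hypothetical names introduced here; nothing is assumed about the internal structure of each file beyond it being valid JSON.

```python
import json
from pathlib import Path

# File names as listed in the description above.
SPLIT_FILES = [
    "train.json",
    "valid_freq.json",
    "valid_rare.json",
    "test_freq.json",
    "test_rare.json",
]


def load_splits(data_dir):
    """Parse each listed file found under data_dir.

    Returns a dict mapping file name to the parsed JSON content.
    Files that are not present are skipped rather than raising.
    """
    splits = {}
    for name in SPLIT_FILES:
        path = Path(data_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                splits[name] = json.load(handle)
    return splits
```

Calling `load_splits("path/to/annotations")` returns only the splits that exist locally, so a partial download can still be inspected.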
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 12,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in English Healthcare interactions. This diversity ensures the dataset accurately represents the language used by English speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to English Healthcare interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Vietnamese Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Vietnamese speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Vietnamese Healthcare interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant
A knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners don’t have explicitly defined roles.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Arabic Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Arabic speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Arabic Healthcare interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat messages,
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Real Estate related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Real Estate topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Real Estate use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Bahasa Real Estate interactions. This diversity ensures the dataset accurately represents the language used by Bahasa speakers in Real Estate contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Bahasa Real Estate interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Real Estate customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Delivery & Logistics related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Delivery & Logistics topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Delivery & Logistics use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in French Delivery & Logistics interactions. This diversity ensures the dataset accurately represents the language used by French speakers in Delivery & Logistics contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to French Delivery & Logistics interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Delivery & Logistics customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
The IMAGE-CHAT dataset is a large collection of (image, style trait for speaker A, style trait for speaker B, dialogue between A & B) tuples that we collected using crowd-workers. Each dialogue consists of consecutive turns by speakers A and B. No particular constraints are placed on the kinds of utterance; we only ask the speakers to both use the provided style trait and to respond to the given image and dialogue history in an engaging way. The goal is not just to build a diagnostic dataset but a basis for training models that humans actually want to engage with.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Real Estate related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Real Estate topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Real Estate use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Arabic Real Estate interactions. This diversity ensures the dataset accurately represents the language used by Arabic speakers in Real Estate contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Arabic Real Estate interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Real Estate customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
As of the third quarter of 2024, Nigeria was the region with the highest share of its online audience using chat and messaging platforms worldwide. Around 100 percent of its internet users reported using such online communication services on a monthly basis. Morocco followed, with around 99.6 percent of its internet users using messaging apps. Approximately 94.5 percent of the global internet audience used online chats and instant messaging platforms during the last measured quarter.
Hydrographic and Impairment Statistics (HIS) is a National Park Service (NPS) Water Resources Division (WRD) project established to track certain goals created in response to the Government Performance and Results Act of 1993 (GPRA). One water resources management goal established by the Department of the Interior under GPRA requires NPS to track the percent of its managed surface waters that are meeting Clean Water Act (CWA) water quality standards. This goal requires an accurate inventory that spatially quantifies the surface water hydrography that each bureau manages and a procedure to determine and track which waterbodies are or are not meeting water quality standards as outlined by Section 303(d) of the CWA. This project helps meet this DOI GPRA goal by inventorying and monitoring in a geographic information system for the NPS: (1) CWA 303(d) quality impaired waters and causes; and (2) hydrographic statistics based on the United States Geological Survey (USGS) National Hydrography Dataset (NHD). Hydrographic and 303(d) impairment statistics were evaluated based on a combination of 1:24,000 (NHD) and finer scale data (frequently provided by state GIS layers).
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Spanish Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Spanish speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Spanish Healthcare interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 12,000 chat conversations, each focusing on specific Travel related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Travel topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Travel use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in English Travel interactions. This diversity ensures the dataset accurately represents the language used by English speakers in Travel contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to English Travel interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Travel customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 12,000 chat conversations, each focusing on specific Telecom related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Telecom topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Telecom use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Hindi Telecom interactions. This diversity ensures the dataset accurately represents the language used by Hindi speakers in Telecom contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Hindi Telecom interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Telecom customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This dataset contains the scrubbed chat logs from the Southeast Atmosphere Study (SAS) project, including NOMADSS (Nitrogen, Oxidants, Mercury and Aerosol Distributions, Sources and Sinks), from May 30 - July 17, 2013. The chat logs contain conversations between scientists and other field project participants regarding data collection within the SAS-NOMADSS project.
During a 2023 survey, **** percent of responding paid search marketers from the United States stated that they believed future chat-based search advertising would not be effective at all. The remaining ** percent said it would be at least slightly effective.