Over the past fifty years, the proportion of Quebecers speaking both English and French has increased steadily, from **** percent in 1971 to almost half the population (**** percent) in 2021. The rate of English-French bilingualism, on the other hand, has declined in the rest of the country: outside Quebec, just over ten percent of people were bilingual in English and French in 2001, compared to *** percent two decades later.
Data on the knowledge of official languages by the population of Canada and Canada outside Quebec, and of all provinces and territories, for Census years 1951 to 2021.
In 2021, French was the first language spoken by over 71 percent of the population of Montréal, Québec, Canada. Another 20.4 percent of the city's residents had English as their first language, 6.7 percent used both English and French as their primary language, and 1.6 percent spoke another language. That same year, 46.4 percent of people living in the province of Québec could speak both English and French.
The statistic reflects the distribution of languages in Canada in 2022. In 2022, 87.1 percent of the total population in Canada spoke English as their native tongue.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Canadian French General Conversation Speech Dataset, a linguistically diverse corpus built to accelerate the development of French speech technologies. It is designed for training and fine-tuning ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Canadian French communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade French speech models that understand and respond to authentic Canadian accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Canadian French. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
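As a rough illustration of metadata-based filtering, the sketch below selects recordings by speaker attributes and totals their duration. The `Recording` class and its fields (`age`, `gender`, `duration_s`) are hypothetical, since the description does not list the dataset's actual metadata fields, and the sample rows are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Recording:
    # Hypothetical metadata fields, chosen for illustration only.
    file: str
    age: int
    gender: str
    duration_s: float


def filter_recordings(
    recordings: Iterable[Recording],
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    gender: Optional[str] = None,
) -> List[Recording]:
    """Keep recordings whose speaker metadata matches every given criterion."""
    kept = []
    for rec in recordings:
        if min_age is not None and rec.age < min_age:
            continue
        if max_age is not None and rec.age > max_age:
            continue
        if gender is not None and rec.gender != gender:
            continue
        kept.append(rec)
    return kept


# Made-up sample rows, not taken from the dataset.
SAMPLE = [
    Recording("conv_001.wav", 24, "female", 610.0),
    Recording("conv_002.wav", 41, "male", 955.5),
    Recording("conv_003.wav", 37, "female", 720.0),
]

subset = filter_recordings(SAMPLE, min_age=30, gender="female")
total_hours = sum(r.duration_s for r in subset) / 3600
```

Criteria left as `None` are ignored, so the same function can serve both narrow demographic slices and a pass-through over the full set.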
This dataset is a versatile resource for multiple French speech and language AI applications:
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This service shows the percentage of population, excluding institutional residents, with knowledge of English and French for Canada by 2016 census subdivision. The data is from the Census Profile, Statistics Canada Catalogue no. 98-316-X2016001. Knowledge of official languages refers to whether the person can conduct a conversation in English only, in French only, in both languages, or in neither language. For a child who has not yet learned to speak, this includes languages that the child is learning to speak at home. For additional information, refer to 'Knowledge of official languages' in the 2016 Census Dictionary. For a cartographic representation of the ecumene with this socio-economic indicator, it is recommended to add the “NRCan - 2016 population ecumene by census subdivision” web service as the first layer; it is accessible in the data resources section below.
In 2021, most of the population of the city of Montreal, located in the Canadian province of Quebec, could speak both English and French. In fact, approximately 1.23 million men and 1.68 million women were bilingual. Of those who spoke only one of the official languages, the majority (1.43 million people) spoke only French. In addition, more than 68,400 people did not know either language, with women outnumbering men.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Canadian French Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking real estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, making it well suited to building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
The dataset features 30 hours of dual-channel call center recordings between native Canadian French speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.
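Dual-channel recordings can be separated into one mono stream per channel before training. The sketch below does this with Python's standard `wave` module, assuming 16-bit PCM stereo WAV input; the description does not state the audio format or which channel carries which speaker, so both are assumptions. The `make_demo_wav` helper only builds a tiny synthetic clip for demonstration.

```python
import io
import wave


def split_stereo(wav_bytes):
    """Split 16-bit stereo WAV bytes into (left, right) mono PCM byte strings."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        frames = wf.readframes(wf.getnframes())
    # Samples are interleaved L R L R ..., 2 bytes each, so one frame is 4 bytes.
    left = b"".join(frames[i:i + 2] for i in range(0, len(frames), 4))
    right = b"".join(frames[i + 2:i + 4] for i in range(0, len(frames), 4))
    return left, right


def make_demo_wav():
    """Build a two-frame synthetic stereo clip (not real dataset audio)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(8000)
        # Little-endian samples: L=1, R=-1, then L=2, R=-2.
        wf.writeframes(b"\x01\x00\xff\xff\x02\x00\xfe\xff")
    return buf.getvalue()


left, right = split_stereo(make_demo_wav())
```

The byte-slicing approach is simple rather than fast; for multi-hour corpora an array-based split would likely be preferable.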
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
Such domain-rich variety ensures model generalization across common real estate support conversations.
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
These transcriptions streamline ASR and NLP development for French real estate voice applications.
Detailed metadata accompanies each participant and conversation:
This enables smart filtering, dialect-focused model training, and structured dataset exploration.
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Canadian French Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native Canadian French speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
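To give a sense of how time-coded JSON transcriptions might be consumed, the sketch below parses a transcript and sums speaking time per speaker. The layout (a list of segments with `speaker`, `start`, `end`, and `text` keys) is a hypothetical schema, as the description does not specify the actual JSON structure, and the sample utterances are invented.

```python
import json
from collections import defaultdict

# Hypothetical transcript layout with invented content; times are in seconds.
SAMPLE_JSON = """
[
  {"speaker": "agent", "start": 0.0, "end": 2.5, "text": "Bonjour, comment puis-je vous aider?"},
  {"speaker": "customer", "start": 2.5, "end": 6.0, "text": "J'ai une question sur ma facture."},
  {"speaker": "agent", "start": 6.0, "end": 7.5, "text": "Bien sûr."}
]
"""


def load_segments(raw):
    """Parse the transcript and return its segments ordered by start time."""
    segments = json.loads(raw)
    for seg in segments:
        if seg["end"] < seg["start"]:
            raise ValueError(f"segment ends before it starts: {seg}")
    return sorted(segments, key=lambda s: s["start"])


def talk_time(segments):
    """Total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


segments = load_segments(SAMPLE_JSON)
totals = talk_time(segments)
```

The same segment list could feed an ASR pipeline directly, for example by cutting the audio at each `start`/`end` pair and pairing the clip with its `text`.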
Rich metadata is available for each participant and conversation:
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Canadian French Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.
The dataset contains 30 hours of dual-channel call center recordings between native Canadian French speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.
This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, making financial domain model training faster and more accurate.
Rich metadata is available for each participant and conversation:
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Canadian French Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
The dataset contains 30 hours of dual-channel call center recordings between native Canadian French speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, making model training faster and more accurate.
Rich metadata is available for each participant and conversation:
This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
This dataset is ideal for a range of voice AI and NLP applications:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Canadian French Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of French language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic French speech data.
This dataset features over 6,000 high-quality scripted monologue recordings in Canadian French. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.
The dataset covers a wide variety of general conversation scenarios, including:
To enhance authenticity, the prompts include:
Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.
Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.
Rich metadata is included for detailed filtering and analysis:
This dataset can power a variety of French language AI technologies, including:
This Alberta Official Statistic compares the knowledge of languages among the Aboriginal Identity population in provinces and territories, based on self-assessment of the ability to converse in the language. Based on the 2011 National Household Survey (NHS), English is the most common language known by the Aboriginal Identity Population across Canada. In most provinces, nearly 100% of the Aboriginal Identity population can converse in English. The lowest proportion of English-speaking Aboriginal people is in Quebec, where the majority speak French. The highest proportion of Aboriginal people who speak Aboriginal languages was in Nunavut at 88.6%, followed by Quebec (32.4%) and the Northwest Territories (32.1%). In Alberta, more Aboriginal people are able to speak Aboriginal languages (15.1%) than are able to speak French or other (non-Aboriginal) languages. The proportion of Alberta Aboriginal people able to speak Aboriginal languages was sixth highest among provinces and territories.
According to the Canadian government, approximately 2.54 million people residing in Montreal, in the province of Quebec, had French as their mother tongue in 2021. About 474,730 residents had English, the second official language, as their mother tongue. However, more people that year (522,255) had another Indo-European language, such as German, Russian, or Polish, as their mother tongue.
100% data.
This table contains 47 series, with data for years 1931 - 1971 (not all combinations necessarily have data for all years), and was last released on 2012-02-16. This table contains data described by the following dimensions (Not all combinations are available): Unit of measure (1 items: Persons ...) Geography (1 items: Canada ...) Mother tongue (47 items: Total languages; English; French; Baltic languages ...).
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Canadian French Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of French speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 hours of dual-channel call center conversations between native Canadian French speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 1392 series, with data for years 1994 - 1998 (not all combinations necessarily have data for all years), and was last released on 2007-01-29. This table contains data described by the following dimensions (Not all combinations are available): Geography (29 items: Austria; Belgium (French speaking); Canada; Belgium (Flemish speaking) ...), Sex (2 items: Males; Females ...), Age groups (3 items: 11 years; 13 years; 15 years ...), Student response (2 items: Yes; No ...), Family member (4 items: Mother; Father; Stepfather; Stepmother ...).
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Canadian French Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of French language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.
This dataset includes over 6,000 high-quality scripted audio prompts recorded in Canadian French, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.
The prompts span a broad range of healthcare-specific interactions, such as:
To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:
These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.
Every audio recording is accompanied by a verbatim, manually verified transcription.