20 datasets found
  1. Facts and Figures 2015: Profiles of Official Language Immigrants: French...

    • ouvert.canada.ca
    • open.canada.ca
    • +1 more
    xls
    Updated Nov 22, 2024
    Cite
    Immigration, Refugees and Citizenship Canada (2024). Facts and Figures 2015: Profiles of Official Language Immigrants: French Speaking Permanent Residents Outside Quebec [Dataset]. https://ouvert.canada.ca/data/dataset/656d603b-b07e-4f6c-9e3a-92b1d85f2d91
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    Immigration, Refugees and Citizenship Canada (http://www.cic.gc.ca/)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Time period covered
    Jan 1, 2006 - Dec 31, 2015
    Area covered
    Québec City, Quebec, French
    Description

    Facts and Figures, Profiles of Official Language Immigrants: French Speaking Permanent Residents outside Quebec presents the annual intake of French-speaking permanent residents in Canada outside the province of Québec, by category of immigration, from 2006 to 2015. The report examines selected characteristics of French-speaking permanent residents. “French-speaking immigrants” are defined by the following criteria: 1) permanent residents with French as Mother Tongue; 2) permanent residents with a Mother Tongue other than French and with “French Only” as official language spoken (excluding “Both English and French” as official language spoken). Note that official language(s) spoken (English only, French only, both French and English, and neither language) are self-declared indicators of knowledge of an official language. In these datasets, figures have been suppressed or rounded to prevent the identification of individuals when the datasets are compiled and compared with other publicly available statistics. Values between 0 and 5 are shown as “--” and all other values are rounded to the nearest multiple of 5. As a result, the sum of the figures may not equal the totals indicated.
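
    The suppression and rounding rule in this description can be sketched in Python. This is an illustration only, not the publisher's procedure: the function name `disclose` and the sample counts are hypothetical, and treating both 0 and 5 as part of the suppressed range is an assumption, since the description does not say whether the bounds are inclusive.

```python
from typing import Union

SUPPRESSED = "--"


def disclose(count: int) -> Union[int, str]:
    """Apply the disclosure rule to one raw count (inclusive bounds assumed)."""
    if 0 <= count <= 5:
        return SUPPRESSED
    # Round to the nearest multiple of 5 using integer arithmetic.
    return 5 * ((count + 2) // 5)


raw_counts = [12, 12, 3, 7]  # hypothetical category counts
shown = [disclose(c) for c in raw_counts]
shown_total = disclose(sum(raw_counts))
visible_sum = sum(v for v in shown if v != SUPPRESSED)

print(shown)
print(shown_total, visible_sum)
```

    With these sample counts the displayed components add up to 25 while the displayed total is 35, which is the kind of mismatch the description warns about.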

  2. Canadian French Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-french-canada
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada, French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Canadian French Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native Canadian French speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native Canadian French speakers from our verified contributor community.
    Regions: Representing different provinces across Canada to ensure accent and dialect variation.
    Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.
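
    As a rough way to confirm that a downloaded file matches the audio format listed above (stereo, 16-bit, 8 kHz or 16 kHz), the WAV header can be inspected with Python's standard `wave` module. The helper names below are hypothetical, and the tiny in-memory file of silence only stands in for a real recording.

```python
import io
import wave

EXPECTED_RATES = {8000, 16000}  # 8 kHz and 16 kHz, as listed above


def matches_listed_format(fileobj) -> bool:
    """Return True if the WAV header is stereo, 16-bit, at 8 kHz or 16 kHz."""
    with wave.open(fileobj, "rb") as wav:
        return (
            wav.getnchannels() == 2
            and wav.getsampwidth() == 2  # 2 bytes per sample = 16-bit
            and wav.getframerate() in EXPECTED_RATES
        )


def make_silent_wav(channels: int, rate: int) -> io.BytesIO:
    """Build a tiny in-memory WAV of silence so the check runs without real data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (2 * channels * 160))  # 160 silent frames
    buf.seek(0)
    return buf


print(matches_listed_format(make_silent_wav(2, 16000)))  # stereo stand-in
print(matches_listed_format(make_silent_wav(1, 16000)))  # mono stand-in
```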

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for French real estate voice applications.
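
    The listing does not say how its word error rate figure was measured. For orientation, the conventional definition is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words; a minimal sketch of that definition follows, with a hypothetical function name and made-up sentences.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "je cherche un appartement à louer"
hypothesis = "je cherche appartement à vendre"
print(f"{word_error_rate(reference, hypothesis):.2%}")
```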

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:


  3. Percentage of population with knowledge of English and French by census...

    • datasets.ai
    • catalogue.arctic-sdi.org
    • +1 more
    0, 21, 23, 52
    Updated Mar 19, 2019
    Cite
    Statistics Canada | Statistique Canada (2019). Percentage of population with knowledge of English and French by census division, 2016 [Dataset]. https://datasets.ai/datasets/7043f8c1-d5e5-492f-8bb1-7eeac9f2a74f
    Explore at:
    Available download formats: 52, 21, 0, 23
    Dataset updated
    Mar 19, 2019
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Authors
    Statistics Canada | Statistique Canada
    Area covered
    French
    Description

    This service shows the percentage of population, excluding institutional residents, with knowledge of English and French for Canada by 2016 census division. The data is from the Census Profile, Statistics Canada Catalogue no. 98-316-X2016001.

    Knowledge of official languages refers to whether the person can conduct a conversation in English only, French only, in both languages or in neither language. For a child who has not yet learned to speak, this includes languages that the child is learning to speak at home. For additional information refer to 'Knowledge of official languages' in the 2016 Census Dictionary.

    To have a cartographic representation of the ecumene with this socio-economic indicator, it is recommended to add as the first layer, the “NRCan - 2016 population ecumene by census division” web service, accessible in the data resources section below.

  4. Canadian French Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-french-canada
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada, French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Canadian French Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Canadian French speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    Speakers: 60 native Canadian French speakers from our verified contributor pool.
    Regions: Representing multiple provinces across Canada to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refund Requests and Billing Adjustments
    Emergency Service Access, and others
    Outbound Calls:
    Welcome Calls & Onboarding
    Payment Reminders
    Customer Satisfaction Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, coughs)
    High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
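
    The listing does not publish the JSON schema of these transcriptions, so the sketch below assumes a hypothetical layout: a list of segments with `speaker`, `start`, `end`, and `text` fields. Under that assumption, speaker-segmented, time-coded transcripts can be summarized per speaker like this.

```python
import json
from collections import defaultdict

# Hypothetical transcript layout; the real field names may differ.
sample = """
[
  {"speaker": "agent", "start": 0.0, "end": 4.2, "text": "Bonjour, comment puis-je vous aider ?"},
  {"speaker": "customer", "start": 4.2, "end": 9.0, "text": "J'ai un problème de réseau. [pause]"},
  {"speaker": "agent", "start": 9.0, "end": 12.5, "text": "Je vérifie cela tout de suite."}
]
"""


def talk_time_by_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


segments = json.loads(sample)
for speaker, seconds in talk_time_by_speaker(segments).items():
    print(f"{speaker}: {seconds:.1f} s")
```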

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.

  5. Canadian French Call Center Data for BFSI AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French Call Center Data for BFSI AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/bfsi-call-center-conversation-french-canada
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada, French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Canadian French Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Canadian French speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.

    Participant Diversity:
    Speakers: 60 native Canadian French speakers from our verified contributor pool.
    Regions: Representing multiple provinces across Canada to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.

    Inbound Calls:
    Debit Card Block Request
    Transaction Disputes
    Loan Enquiries
    Credit Card Billing Issues
    Account Closure & Claims
    Policy Renewals & Cancellations
    Retirement & Tax Planning
    Investment Risk Queries, and more
    Outbound Calls:
    Loan & Credit Card Offers
    Customer Surveys
    EMI Reminders
    Policy Upgrades
    Insurance Follow-ups
    Investment Opportunity Calls
    Retirement Planning Reviews, and more

    This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, background noise)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making financial domain model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender,

  6. Mother Tongue (French), 1996

    • datasets.ai
    • open.canada.ca
    0, 57
    Updated Sep 26, 2016
    Cite
    Natural Resources Canada | Ressources naturelles Canada (2016). Mother Tongue (French), 1996 [Dataset]. https://datasets.ai/datasets/e66df20f-8893-11e0-bf79-6cf049291510
    Explore at:
    Available download formats: 57, 0
    Dataset updated
    Sep 26, 2016
    Dataset authored and provided by
    Natural Resources Canada | Ressources naturelles Canada
    Area covered
    French
    Description

    This map shows the percentage of the Canadian population whose mother tongue was French. The 1996 Census defines mother tongue as the first language a person learned at home in childhood and still understood at the time of the census. The 1996 Census showed that 8.9 million Canadians could conduct a conversation in French (31%), 6.4 million spoke French most often at home (23%) and 6.7 million had French as their mother tongue (24%).

  7. Canadian French TTS Speech Dataset for Speech Synthesis

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French TTS Speech Dataset for Speech Synthesis [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/tts-monolgue-french-canada
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada, French
    Dataset funded by
    FutureBeeAI
    Description

    The French TTS Monologue Speech Dataset is a professionally curated resource built to train realistic, expressive, and production-grade text-to-speech (TTS) systems. It contains studio-recorded long-form speech by trained native French voice artists, each contributing 1 to 2 hours of clean, uninterrupted monologue audio.

    Unlike typical prompt-based datasets with short, isolated phrases, this collection features long-form, topic-driven monologues that mirror natural human narration. It includes content types that are directly useful for real-world applications, like audiobook-style storytelling, educational lectures, health advisories, product explainers, digital how-tos, formal announcements, and more.

    All recordings are captured in professional studios using high-end equipment and under the guidance of experienced voice directors.

    Recording & Audio Quality

    Audio Format: WAV, 48 kHz, available in 16-bit, 24-bit, and 32-bit depth
    SNR: Minimum 30 dB
    Channel: Mono
    Recording Duration: 20-30 minutes
    Recording Environment: Studio-controlled, acoustically treated rooms
    Per Speaker Volume: 1–2 hours of speech per artist
    Quality Control: Each file is reviewed and cleaned for common acoustic issues, including: reverberation, lip smacks, mouth clicks, thumping, hissing, plosives, sibilance, background noise, static interference, clipping, and other artifacts.

    Only clean, production-grade audio makes it into the final dataset.

    Voice Artist Selection

    All voice artists are native French speakers with professional training or prior experience in narration. We ensure a diverse pool in terms of age, gender, and region to bring a balanced and rich vocal dataset.

    Artist Profile:
    Gender: Male and Female
    Age Range: 20–60 years
    Regions: French-speaking provinces of Canada
    Selection Process: All artists are screened, onboarded, and sample-approved using FutureBeeAI’s proprietary Yugo platform.

    Script Quality & Coverage

    Scripts are not generic or repetitive. Scripts are professionally authored by domain experts to reflect real-world use cases. They avoid redundancy and include modern vocabulary, emotional range, and phonetically rich sentence structures.

    Word Count per Script: 3,000–5,000 words per 30-minute session
    Content Types:
    Storytelling
    Script and book reading
    Informational explainers
    Government service instructions
    E-commerce tutorials
    Motivational content
    Health & wellness guides
    Education & career advice
    Linguistic Design: Balanced punctuation, emotional range, modern syntax, and vocabulary diversity
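
    The word count listed above implies a speaking rate, which can be worked out directly. This is a back-of-the-envelope estimate that assumes the full 30-minute session is spent speaking; the variable names are illustrative.

```python
SESSION_MINUTES = 30
WORDS_PER_SCRIPT = (3000, 5000)  # range listed under "Word Count per Script"

# Divide each end of the word-count range by the session length.
low_wpm, high_wpm = (words / SESSION_MINUTES for words in WORDS_PER_SCRIPT)
print(f"{low_wpm:.0f} to {high_wpm:.0f} words per minute")
```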

    Transcripts & Alignment

    While the script is used during the recording, we also provide post-recording updates to ensure the transcript reflects the final spoken audio. Minor edits are made to adjust for skipped or rephrased words.

    Segmentation: Time-stamped at the sentence level, aligned to actual spoken delivery
    Format: Available in plain text and JSON
    Post-processing:
    Corrected for disfluencies

  8. 2016. English Spoken at Home, French Spoken at Home, Aboriginal Language...

    • desq.quescren.ca
    Updated Mar 30, 2024
    Cite
    (2024). 2016. English Spoken at Home, French Spoken at Home, Aboriginal Language Spoken at Home, Immigrant Language Spoken at Home, Mother Tongue, Age and Sex for the Population Excluding Institutional Residents of Canada, Provinces and Territories, Census Metropolitan Areas and Census Agglomerations - Dataset - Data Portal on English-Speaking Quebec [Dataset]. https://desq.quescren.ca/dataset/chssn-2016-98-400-x2016344
    Explore at:
    Dataset updated
    Mar 30, 2024
    Area covered
    Quebec, Canada, French
    Description

    100% data.

  9. Knowledge of Language of Aboriginal Identity Population, Canada, Provinces...

    • open.alberta.ca
    Updated May 28, 2013
    Cite
    (2013). Knowledge of Language of Aboriginal Identity Population, Canada, Provinces and Territories - Open Government [Dataset]. https://open.alberta.ca/dataset/knowledge-of-language-of-aboriginal-identity-population-canada-provinces-and-territories
    Explore at:
    Dataset updated
    May 28, 2013
    Area covered
    Canada
    Description

    This Alberta Official Statistic compares the knowledge of languages among the Aboriginal Identity population in provinces and territories, based on self-assessment of the ability to converse in the language. Based on the 2011 National Household Survey (NHS), English is the most common language known by the Aboriginal Identity Population across Canada. In most provinces, nearly 100% of the Aboriginal Identity population can converse in English. The lowest proportion of English-speaking Aboriginal people is in Quebec, where the majority speak French. The highest proportion of Aboriginal people who speak Aboriginal languages was in Nunavut at 88.6%, followed by Quebec (32.4%) and the Northwest Territories (32.1%). In Alberta, more Aboriginal people are able to speak Aboriginal languages (15.1%) than are able to speak French or other (non-Aboriginal) languages. The proportion of Alberta Aboriginal people able to speak Aboriginal languages was sixth highest among provinces and territories.

  10. Facts and Figures 2015: Profiles of Official Language Immigrants: English...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    Cite
    (2025). Facts and Figures 2015: Profiles of Official Language Immigrants: English Speaking Permanent Residents inside Quebec - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-caa61377-f34c-4f31-89ae-a57c8a73f99d
    Explore at:
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Quebec, Canada
    Description

    Facts and Figures, Profiles of Official Language Immigrants: English Speaking Permanent Residents in Quebec presents the annual intake of English-speaking permanent residents in the province of Quebec, by category of immigration, from 2006 to 2015. The report examines selected characteristics of English-speaking permanent residents. “English-speaking immigrants” are defined by the following criteria: 1) permanent residents with English as Mother Tongue; 2) permanent residents with a Mother Tongue other than English and with “English Only” as official language spoken (excluding “Both English and French” as official language spoken). Note that official language(s) spoken (English only, French only, both French and English, and neither language) are self-declared indicators of knowledge of an official language. In these datasets, figures have been suppressed or rounded to prevent the identification of individuals when the datasets are compiled and compared with other publicly available statistics. Values between 0 and 5 are shown as “--” and all other values are rounded to the nearest multiple of 5. As a result, the sum of the figures may not equal the totals indicated.

  11. Student response to question: Which of these people live at your home...

    • data.wu.ac.at
    • www150.statcan.gc.ca
    • +2 more
    csv, html, xml
    Updated Jul 26, 2018
    Cite
    Statistics Canada | Statistique Canada (2018). Student response to question: Which of these people live at your home (answers are for the home where they live most of the time), by sex, age group and selected countries [Dataset]. https://data.wu.ac.at/schema/www_data_gc_ca/YWJhNGIwNTUtNmY2ZS00MTIyLTgwMGYtNDQyNDc2YTk2ZTc4
    Explore at:
    Available download formats: html, csv, xml
    Dataset updated
    Jul 26, 2018
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    This table contains 1392 series, with data for years 1994 - 1998 (not all combinations necessarily have data for all years), and was last released on 2007-01-29. This table contains data described by the following dimensions (not all combinations are available): Geography (29 items: Austria; Belgium (French speaking); Canada; Belgium (Flemish speaking) ...), Sex (2 items: Males; Females ...), Age groups (3 items: 11 years; 13 years; 15 years ...), Student response (2 items: Yes; No ...), Family member (4 items: Mother; Father; Stepfather; Stepmother ...).
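
    The series count in this description is consistent with the full cross-product of the listed dimensions, as a short calculation shows. The dictionary below simply restates the item counts given in the description.

```python
from math import prod

# Item counts as stated in the table description.
dimension_sizes = {
    "Geography": 29,
    "Sex": 2,
    "Age groups": 3,
    "Student response": 2,
    "Family member": 4,
}

series_count = prod(dimension_sizes.values())
print(series_count)
```

    The product matches the 1392 series stated in the description, so each series corresponds to one combination of the five dimensions.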

  12. Canadian French Scripted Monologue Speech Data for Telecom

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French Scripted Monologue Speech Data for Telecom [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/telecom-scripted-speech-monologues-spanish-usa
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada, French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Presenting the Canadian French Scripted Monologue Speech Dataset for the Telecom Domain, a purpose-built dataset created to accelerate the development of French speech recognition and voice AI models specifically tailored for the telecommunications industry.

    Speech Data

    This dataset includes over 6,000 high-quality scripted prompt recordings in Canadian French, representing real-world telecom customer service scenarios. It’s designed to support the training of speech-based AI systems used in call centers, virtual agents, and voice-powered support tools.

    Participant Diversity
    Speakers: 60 native Canadian French speakers
    Geographic Distribution: Carefully selected from multiple regions across Canada to capture a wide spectrum of dialects and speaking styles
    Demographics: Balanced representation of males and females (60:40 ratio), aged 18 to 70 years
    Recording Specifications
    Type: Scripted monologue prompts focused on telecom industry use cases
    Duration: Each audio clip ranges from 5 to 30 seconds
    Format: WAV files in mono, 16-bit depth, with sample rates of 8 kHz and 16 kHz
    Environment: Clean, echo-free, and noise-controlled settings to ensure optimal audio clarity
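
    The listing gives a clip count and a per-clip duration range but no total duration. A rough bound can be derived from those two figures; this is an estimate only, it treats "over 6,000" as exactly 6,000 clips, and the variable names are illustrative.

```python
CLIPS = 6000                      # "over 6,000" recordings, so these are lower bounds
MIN_SECONDS, MAX_SECONDS = 5, 30  # per-clip duration range from the listing

min_hours = CLIPS * MIN_SECONDS / 3600
max_hours = CLIPS * MAX_SECONDS / 3600
print(f"roughly {min_hours:.1f} to {max_hours:.1f} hours of audio")
```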

    Topic Coverage

    The dataset reflects a wide variety of common telecom customer interactions, including:

    Customer onboarding and service inquiries
    Billing and payment questions
    Data plans and product information
    Technical support requests
    Network coverage discussions
    Regulatory compliance and policy information
    Upgrades, renewals, and service plan changes
    Domain-specific scripted interactions tailored to real-world telecom use cases

    Contextual Depth

    To maximize contextual richness, prompts include:

    Localized Names: Common Canadian names in various formats
    Addresses: Region-specific address structures for realism
    Dates & Times: Spoken date and time references in typical telecom scenarios (e.g., billing cycles, service activation times)
    Telecom Terminology: Keywords related to mobile data, network, SIM, devices, plans, etc.
    Numbers & Rates: Usage statistics, pricing info, recharge values, and billing figures
    Service Providers: References to telecom companies and third-party service entities

    Transcription

    Each audio file is paired with an accurate, verbatim transcription for precise model training:

    Content: Transcriptions are direct representations of each recorded prompt
    Format: Plain text (.TXT), with filenames matching their corresponding audio files
    Verification: Every transcription is manually verified by native Canadian French linguists to ensure consistency and accuracy

    Metadata

    Detailed metadata is included to

  13. Number of students in official languages programs, public elementary and...

    • www150.statcan.gc.ca
    • data.urbandatacentre.ca
    • +2 more
    Updated Oct 28, 2025
    Cite
    Government of Canada, Statistics Canada (2025). Number of students in official languages programs, public elementary and secondary schools, by program type, grade and sex [Dataset]. http://doi.org/10.25318/3710000901-eng
    Explore at:
    Dataset updated
    Oct 28, 2025
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Enrolments in regular second language programs (or core language programs), French immersion programs, and education programs in the minority official language offered in public elementary and secondary schools, by type of program, grade and sex.

  14. Canadian French Scripted Monologue Speech Data in Real Estate

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French Scripted Monologue Speech Data in Real Estate [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/realestate-scripted-speech-monologues-spanish-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada, French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Introducing the Canadian French Scripted Monologue Speech Dataset for the Real Estate Domain, a dataset designed to support the development of French speech recognition and conversational AI technologies tailored for the real estate industry.

    Speech Data

    This dataset includes over 6,000 high-quality scripted prompt recordings in Canadian French. The speech content reflects a wide range of real estate interactions to help build intelligent, domain-specific customer support systems and speech-enabled tools.

    Participant Diversity
    Speakers: 60 native French speakers from across Canada
    Regional Variation: Balanced representation of regional dialects and speaking styles
    Demographics: Ages 18–70, with a 60:40 male-to-female ratio
    Recording Specifications
    Type: Scripted monologue recordings
    Duration: 5–30 seconds per audio clip
    Audio Format: WAV, mono channel, 16-bit, sampled at 8 kHz and 16 kHz
    Recording Environment: Quiet, echo-free settings with no background noise

    Topic and Scenario Coverage

    This dataset captures a broad spectrum of use cases and conversational themes within the real estate sector, such as:

    Property inquiries and viewing appointments
    Price negotiations and financial discussions
    Contractual and legal clarifications
    Relocation coordination and service support
    Real estate agent interactions
    Regulatory information and buyer/seller advisory
    Domain-specific spoken statements and service dialogues

    Contextual Depth

    Each scripted prompt incorporates key elements to simulate realistic real estate conversations:

    Names: Culturally appropriate Canadian names in various spoken formats
    Addresses: Detailed location references, including cities, districts, and street names
    Dates & Times: Contextual references to appointments, contract timelines, or move-in dates
    Property Descriptions: Features, measurements, and amenities of real estate listings
    Financial Details: Prices, rental amounts, down payments, deposits, and loan-related figures
    Legal Terms: Frequently used terms in property contracts and documentation

    Transcription

    To ensure precision in model training, each audio recording is paired with a verbatim text transcription:

    Content: Exact scripted text for each corresponding audio prompt
    Format: Plain text (.TXT) files named to match their associated audio recordings
    Quality Control: All transcriptions are manually reviewed by native Canadian French linguists for consistency and correctness
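
    The filename convention described above suggests a simple way to pair each recording with its transcription. The following is a minimal Python sketch, not part of the dataset's own tooling: the function names and directory arguments are hypothetical, and it assumes audio files end in .wav and transcriptions in .txt (matched case-insensitively, since the listing writes ".TXT").

```python
from pathlib import Path


def files_with_suffix(directory, suffix):
    """Return files in `directory` whose extension matches `suffix`, ignoring case."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


def pair_audio_with_transcripts(audio_dir, transcript_dir):
    """Pair each audio file with the transcription that shares its filename stem.

    Returns (pairs, missing): `pairs` is a list of (audio_path, transcript_path),
    and `missing` lists audio files for which no transcription was found.
    """
    transcripts = {p.stem: p for p in files_with_suffix(transcript_dir, ".txt")}
    pairs, missing = [], []
    for audio in files_with_suffix(audio_dir, ".wav"):
        transcript = transcripts.get(audio.stem)
        if transcript is None:
            missing.append(audio)
        else:
            pairs.append((audio, transcript))
    return pairs, missing
```

    Reporting unmatched audio files separately, rather than silently skipping them, makes an incomplete download easier to spot before training.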

    Metadata

    Each data sample is enriched with detailed metadata to enhance usability:

    Participant Metadata:

  15. Percent official language speakers by municipality - Catalogue - Canadian...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    Cite
    (2025). Percent official language speakers by municipality - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-e669ec7d-bb3c-465f-ab77-3d2a58d93461
    Dataset updated
    Oct 19, 2025
    Description

    The percentage of individuals who, at the time of the census, most often spoke at least one of English or French at home.

  16. Canadian French Wake Words & Voice Commands Speech Data

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French Wake Words & Voice Commands Speech Data [Dataset]. https://www.futurebeeai.com/dataset/wake-words-and-commands-dataset/wake-words-and-commands-french-canada
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Canada, French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Canadian French Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.

    Speech Data

    This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

    Wake words alone
    Wake words followed by command phrases

    Participant Diversity

    Speakers: 50 native Canadian French speakers from the FutureBeeAI community
    Regions: Participants from various Canadian provinces, ensuring broad coverage of accents and dialects
    Demographics: Ages 18–70; 60% male and 40% female participants

    Recording Details

    Type: Scripted wake words and command phrases
    Duration: 1 to 15 seconds per clip
    Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz
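
    The stated recording format can be checked per file before use. The following is a minimal sketch using Python's standard wave module. The function name and its default thresholds are assumptions that restate the listing (stereo, 16-bit, 16 kHz to 48 kHz); it is not a tool shipped with the dataset.

```python
import wave


def check_wav_format(path, channels=2, sample_width_bytes=2,
                     min_rate=16_000, max_rate=48_000):
    """Return a list of mismatches between a WAV header and the expected format.

    Defaults mirror the listing: stereo, 16-bit (2 bytes per sample),
    sample rate between 16 kHz and 48 kHz. An empty list means the file conforms.
    """
    with wave.open(str(path), "rb") as wf:
        actual_channels = wf.getnchannels()
        actual_width = wf.getsampwidth()
        actual_rate = wf.getframerate()

    problems = []
    if actual_channels != channels:
        problems.append(f"channels: expected {channels}, got {actual_channels}")
    if actual_width != sample_width_bytes:
        problems.append(
            f"sample width: expected {sample_width_bytes} bytes, got {actual_width}"
        )
    if not (min_rate <= actual_rate <= max_rate):
        problems.append(
            f"sample rate: expected {min_rate}-{max_rate} Hz, got {actual_rate}"
        )
    return problems
```

    Returning a list of mismatches, instead of raising on the first one, lets a caller summarize all nonconforming files in a single pass.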

    Dataset Diversity

    Wake Word Types
    Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
    Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
    Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more
    Command Types by Use Case
    Automobile: Play music, check directions, voice search, provide feedback, and more
    Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
    Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.
    Recording Environments
    No background noise
    Background traffic noise
    People talking in the background
    Speaking Pace
    Normal speed
    Fast speed

    This diversity ensures robust training for real-world voice assistant applications.

    Metadata

    Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

    Participant Metadata: Unique ID, age, gender, region, accent, dialect
    Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format

    Use Cases & Applications

    Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
    Smart Home Devices: Enable responsive voice control in smart appliances

  17. Canadian Armed Forces Regular Force Francophone and Anglophone Officers and...

    • open.canada.ca
    csv
    Updated Mar 6, 2025
    Cite
    National Defence (2025). Canadian Armed Forces Regular Force Francophone and Anglophone Officers and NCMs [Dataset]. https://open.canada.ca/data/en/dataset/b579bb2a-8799-49d9-9aa4-dd55b8ccecf1
    Available download formats: csv
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    National Defence
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Time period covered
    Apr 1, 1997 - Mar 31, 2024
    Area covered
    Canada
    Description

    This dataset represents the number of Anglophone and Francophone Canadian Armed Forces (CAF) Regular Force members by Officers and Non-Commissioned Members from 1997 to 2022. Military Personnel Command (MPC) supports the requirement to release accurate and timely information to Canadians, in line with the principles of Open Government. MPC has made every attempt to ensure the accuracy and reliability of the information provided. However, data contained within this report may also appear in historic, current and future reports of a similar nature where it may be represented differently, and in some cases appear to be in conflict with the current report. MPC assumes no responsibility, or liability, for any errors or omissions in the content of this publication. The Commander of Military Personnel Command (MILPERSCOM) is also appointed as the Chief of Military Personnel (CMP).

  18. Canadian Gallup Poll, May 1961, #288

    • borealisdata.ca
    • dataone.org
    Updated Jun 23, 2023
    Cite
    Gallup Canada (2023). Canadian Gallup Poll, May 1961, #288 [Dataset]. http://doi.org/10.5683/SP2/ERNKPC
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Borealis
    Authors
    Gallup Canada
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Canada
    Description

    This Gallup poll seeks the opinions of Canadians. The primary subject of this survey is politics, with the questions focussing on politicians and political parties, as well as other issues of political importance to both Canada and other countries. Respondents were also asked questions so that they could be grouped according to geographic, demographic and social groups. Topics of interest include: Adolf Eichmann's trial in Israel; concentration camps; the Conservative party's majority; federal elections; friendliness towards people from Germany and Japan; mandatory English classes in French speaking provinces; mandatory French classes in English speaking provinces; Kennedy's performance as American President; major problems facing the government; nuclear weapons testing, and the possibility of nuclear war; the Peace Corps; preferred political parties; religion being taught in schools; unemployment; union membership; voting behaviour; and whether Western Canada is more friendly than the rest of Canada. Basic demographic variables are also included.

  19. Official Languages Health Program Call for Proposals 2019-2022:...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    Cite
    (2025). Official Languages Health Program Call for Proposals 2019-2022: Micro-Funding to Improve Access to Health Services for Official Language Minority Communities - Applicant Guide - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-29942ce0-c87c-4ee2-8d02-f9f49ef77033
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    This call for proposals aims to fund projects that improve access to health services for members of French-speaking communities outside Québec and English-speaking communities in Québec (known as official language minority communities - OLMCs).

  20. Population by language by municipality

    • open.canada.ca
    • open.alberta.ca
    csv, html, json, xlsx +1
    Updated Jul 24, 2024
    Cite
    Government of Alberta (2024). Population by language by municipality [Dataset]. https://open.canada.ca/data/dataset/4666e9fe-532c-44ae-ba40-ded754218c12
    Available download formats: xlsx, csv, html, json, xml
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Government of Alberta
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    Population that speaks an official language (English or French) as the primary language in the home, expressed as a percentage of the total population.
