67 datasets found
  1. France Population

    • tradingeconomics.com
    • ar.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Cite
    TRADING ECONOMICS, France Population [Dataset]. https://tradingeconomics.com/france/population
    Explore at:
    Available download formats: excel, csv, xml, json
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 1960 - Dec 31, 2024
    Area covered
    France
    Description

    The total population of France was estimated at 68.4 million people in 2024, according to the latest census figures and projections from Trading Economics. This dataset provides the latest reported value for France Population, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.

  2. French employment, salaries, population per town

    • kaggle.com
    zip
    Updated Oct 26, 2017
    Cite
    Etienne LQ (2017). French employment, salaries, population per town [Dataset]. https://www.kaggle.com/etiennelq/french-employment-by-town
    Explore at:
    Available download formats: zip (33132395 bytes)
    Dataset updated
    Oct 26, 2017
    Authors
    Etienne LQ
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    French
    Description

    Context

    [INSEE][1] is the official French institute gathering data of many types around France. The data can be demographic (Births, Deaths, Population Density...), economic (Salary, Firms by activity / size...) and more.
    It can be a great help to observe and measure inequality in the French population.

    Content

    Four files are in the dataset :

    • base_etablissement_par_tranche_effectif : gives information on the number of firms in every French town, categorized by size; comes from [INSEE][2].
      • CODGEO : geographic code for the town (can be joined with the code_insee column from "name_geographic_information.csv")
      • LIBGEO : name of the town (in French)
      • REG : region number
      • DEP : department number
      • E14TST : total number of firms in the town
      • E14TS0ND : number of unknown or null size firms in the town
      • E14TS1 : number of firms with 1 to 5 employees in the town
      • E14TS6 : number of firms with 6 to 9 employees in the town
      • E14TS10 : number of firms with 10 to 19 employees in the town
      • E14TS20 : number of firms with 20 to 49 employees in the town
      • E14TS50 : number of firms with 50 to 99 employees in the town
      • E14TS100 : number of firms with 100 to 199 employees in the town
      • E14TS200 : number of firms with 200 to 499 employees in the town
      • E14TS500 : number of firms with more than 500 employees in the town
    • name_geographic_information : gives geographic data on French towns (mainly latitude and longitude, but also region / department codes and names)

      • EU_circo : name of the European Union Circonscription
      • code_région : code of the region attached to the town
      • nom_région : name of the region attached to the town
      • chef.lieu_région : name of the administrative center around the town
      • numéro_département : code of the department attached to the town
      • nom_département : name of the department attached to the town
      • préfecture : name of the local administrative division around the town
      • numéro_circonscription : number of the constituency (circonscription)
      • nom_commune : name of the town
      • codes_postaux : post-codes relative to the town
      • code_insee : unique code for the town
      • latitude : GPS latitude
      • longitude : GPS longitude
      • éloignement : I could not figure out the meaning of this number
    • net_salary_per_town_per_category : salaries in French towns by job category, age and sex

      • CODGEO : unique code of the town
      • LIBGEO : name of the town
      • SNHM14 : mean net salary
      • SNHMC14 : mean net salary per hour for executive
      • SNHMP14 : mean net salary per hour for middle manager
      • SNHME14 : mean net salary per hour for employee
      • SNHMO14 : mean net salary per hour for worker
      • SNHMF14 : mean net salary for women
      • SNHMFC14 : mean net salary per hour for female executive
      • SNHMFP14 : mean net salary per hour for female middle manager
      • SNHMFE14 : mean net salary per hour for female employee
      • SNHMFO14 : mean net salary per hour for female worker
      • SNHMH14 : mean net salary for men
      • SNHMHC14 : mean net salary per hour for male executive
      • SNHMHP14 : mean net salary per hour for male middle manager
      • SNHMHE14 : mean net salary per hour for male employee
      • SNHMHO14 : mean net salary per hour for male worker
      • SNHM1814 : mean net salary per hour for 18-25 years old
      • SNHM2614 : mean net salary per hour for 26-50 years old
      • SNHM5014 : mean net salary per hour for >50 years old
      • SNHMF1814 : mean net salary per hour for women between 18-25 years old
      • SNHMF2614 : mean net salary per hour for women between 26-50 years old
      • SNHMF5014 : mean net salary per hour for women >50 years old
      • SNHMH1814 : mean net salary per hour for men between 18-25 years old
      • SNHMH2614 : mean net salary per hour for men between 26-50 years old
      • SNHMH5014 : mean net salary per hour for men >50 years old
    • population : [demographic][3] information in France per town, age, sex and living mode

      • NIVGEO : geographic level (arrondissement, communes...)
      • CODGEO : unique code for the town
      • LIBGEO : name of the town (may contain some UTF-8 errors; name_geographic_information has better-quality names)
      • MOCO : cohabitation mode : [list and meaning available in Data description]
      • AGE80_17 : age category (slice of 5 years) | ex : 0 -> people between 0 and 4 years old
      • SEXE : sex, 1 for men | 2 for women
      • NB : Number of people in the category
    • departments.geojson : contains the borders of French departments. From [Gregoire David (github)][4]

    These datasets can be merged on CODGEO = code_insee
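
    As a minimal sketch of that join, the snippet below merges two tiny in-memory tables on CODGEO = code_insee using pandas. The rows and salary values are invented for illustration, and the idea that one file may have had its codes parsed as integers (losing leading zeros) is an assumption about how the CSVs might load, not something the description states. The helper normalize_code is hypothetical.

```python
import pandas as pd

# Hypothetical miniature stand-ins for two of the CSV files; the real files
# have many more rows and columns, and these values are made up.
salaries = pd.DataFrame({
    "CODGEO": ["01004", "75056"],
    "LIBGEO": ["Ambérieu-en-Bugey", "Paris"],
    "SNHM14": [13.7, 22.2],
})
geo = pd.DataFrame({
    "code_insee": [1004, 75056],  # assumed: parsed as integers, leading zero lost
    "latitude": [45.95, 48.86],
    "longitude": [5.35, 2.35],
})


def normalize_code(series: pd.Series) -> pd.Series:
    """Return town codes as 5-character, zero-padded strings."""
    return series.astype(str).str.zfill(5)


salaries["CODGEO"] = normalize_code(salaries["CODGEO"])
geo["code_insee"] = normalize_code(geo["code_insee"])

# Left join keeps every salary row even if no geographic match exists.
merged = salaries.merge(geo, left_on="CODGEO", right_on="code_insee", how="left")
print(merged[["CODGEO", "LIBGEO", "SNHM14", "latitude", "longitude"]])
```

    Normalizing both key columns to strings before merging avoids silent mismatches when one side keeps leading zeros and the other does not.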

    Acknowledgements

    The entire dataset was created (and is kept up to date) by INSEE; I just uploaded it to Kaggle after some processing and checks ...

  3. French Real Estate Dataset (2017-2023)

    • kaggle.com
    Updated Oct 31, 2023
    Cite
    NECHBA MOHAMMED (2023). French Real Estate Dataset (2017-2023) [Dataset]. https://www.kaggle.com/datasets/nechbamohammed/real-estate-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    NECHBA MOHAMMED
    Area covered
    French
    Description

    The dataset contains details of real estate properties and transactions in France from 2017 to 2023. It has 19,569,530 rows and a broad set of attributes for in-depth analysis of the real estate market.

    Dataset Columns

    lot5_surface_carrez and lot4_surface_carrez: These columns indicate the "Carrez" area of the fifth and fourth lots, respectively.

    ancien_id_parcelle: It provides information about the former parcel identifier associated with the property.

    lot5_numero and lot4_numero: These columns contain the numbers of the fifth and fourth lots.

    numero_volume: The volume number associated with the property.

    lot3_surface_carrez: The "Carrez" area of the third lot.

    lot3_numero: The number of the third lot.

    lot2_surface_carrez and lot2_numero: These columns represent the "Carrez" area and the number of the second lot.

    lot1_surface_carrez and lot1_numero: They indicate the "Carrez" area and the number of the first lot.

    surface_reelle_bati: The actual surface area of the building, in square meters.

    nombre_pieces_principales: The number of main rooms in the property.

    type_local and code_type_local: These columns specify the type of premises and its associated code.

    adresse_numero: The property's address number.

    surface_terrain: The land area, in square meters.

    code_nature_culture and nature_culture: They detail the nature of the land's use, along with its corresponding code.

    latitude and longitude: The latitude and longitude coordinates of the property.

    valeur_fonciere: The property's land value.

    code_postal: The postal code of the location.

    adresse_nom_voie and adresse_code_voie: These columns specify the name and code of the street in the address.

    id_parcelle: The parcel identifier associated with the property.

    code_departement: The department code where the property is located.

    nom_commune and code_commune: These columns indicate the name and code of the municipality of the location.

    nombre_lots: The total number of lots included in the property.

    nature_mutation: The nature of the real estate transaction, whether it is a sale, a donation, or other.

    numero_disposition: The disposition number assigned to each transaction.

    date_mutation: The date of the real estate transaction.

    id_mutation: The identifier of the real estate transaction.
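
    To illustrate how a few of these columns might be combined, here is a hedged sketch that derives a price per square meter and takes the median per department. The rows are invented, and treating valeur_fonciere as an amount in euros that can be divided by surface_reelle_bati is an assumption; in the real data a single transaction may span several rows, which this sketch ignores.

```python
import pandas as pd

# Hypothetical rows using column names from the list above; values are invented.
sales = pd.DataFrame({
    "id_mutation": ["2023-1", "2023-2", "2023-3"],
    "code_departement": ["75", "75", "69"],
    "valeur_fonciere": [500000.0, 300000.0, 200000.0],
    "surface_reelle_bati": [50.0, None, 80.0],
})

# Keep rows where both the value and a positive built surface are present.
valid = sales.dropna(subset=["valeur_fonciere", "surface_reelle_bati"])
valid = valid[valid["surface_reelle_bati"] > 0].copy()

# Derived column (not part of the dataset): value divided by built surface.
valid["price_per_m2"] = valid["valeur_fonciere"] / valid["surface_reelle_bati"]

median_by_dept = valid.groupby("code_departement")["price_per_m2"].median()
print(median_by_dept)
```

    The median is used rather than the mean because transaction values tend to include extreme outliers.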

  4. French Call Center Data for Healthcare AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This French Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of French speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 Hours of dual-channel call center conversations between native French speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native French speakers from our contributor community.
    Regions: Diverse provinces across France to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session lasts between 5 and 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.
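
    A downloaded file could be checked against the stated audio format (stereo, 16-bit, 8kHz or 16kHz) by reading its WAV header. The sketch below uses Python's standard wave module; the functions describe_wav and matches_listing are hypothetical helpers, and the one-second silent file is synthetic, standing in for a real recording.

```python
import io
import wave


def describe_wav(source) -> dict:
    """Read channel count, bit depth, sample rate and duration from a WAV header."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": rate,
            "seconds": frames / rate,
        }


def matches_listing(info: dict) -> bool:
    """Check the header against the format stated in the listing."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 16
        and info["sample_rate"] in (8000, 16000)
    )


# Synthetic one-second stereo file of silence, used in place of a real recording.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)       # stereo
    out.setsampwidth(2)       # 2 bytes = 16 bits per sample
    out.setframerate(16000)   # 16 kHz
    out.writeframes(b"\x00" * (16000 * 2 * 2))
buffer.seek(0)

info = describe_wav(buffer)
print(info, matches_listing(info))
```

    With a real file, a path string can be passed to describe_wav instead of the in-memory buffer.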

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    Outbound Calls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:


  5. French General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the French General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of French speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world French communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade French speech models that understand and respond to authentic French accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of French. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native French speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of France to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy (average WER < 5%), achieved through a double QA pass

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
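
    For readers unfamiliar with the WER figure quoted above, word error rate is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a generic implementation of that definition, not the provider's QA procedure; the function name and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)


# Invented example: the hypothesis drops the word "un" from an 8-word reference.
reference = "je voudrais prendre un rendez-vous pour demain matin"
hypothesis = "je voudrais prendre rendez-vous pour demain matin"
print(f"{word_error_rate(reference, hypothesis):.3f}")
```

    This version splits on whitespace only; real evaluations usually normalize case and punctuation first, which changes the resulting score.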

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple French speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for French.
    Voice Assistants: Build smart assistants capable of understanding natural French conversations.

  6. French Social Contact Data

    • kaggle.com
    zip
    Updated Jan 31, 2023
    Cite
    The Devastator (2023). French Social Contact Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/french-social-contact-data
    Explore at:
    Available download formats: zip (405866 bytes)
    Dataset updated
    Jan 31, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    French
    Description

    French Social Contact Data

    A Study of 1755 Participants and their Household Contacts (2015)

    By [source]

    About this dataset

    This dataset provides a comprehensive exploration of contacts and interactions among 1755 participants in France in 2015, giving insights into the social behaviour of French households. With detailed information on contact locations, the gender and ages of contacts, the frequency and duration of interactions with each contact, as well as the number of people within a household, this data set covers a variety of factors which govern human interaction. By analyzing this data set we can better understand how social networks are formed among families and individuals in different communities. It is an essential guide to understanding how behaviour has changed over time and across different cultures. This dataset allows us to gain new perspectives on how various factors shape our relationships with others at home or out in society

    How to use the dataset

    This dataset provides a comprehensive collection of data regarding household contacts, social contact networks, and other individual characteristics in France in 2015. The data was collected by Antoine Beraud and his research team between July 2014 and February 2015.

    This dataset is ideal for use as an exploratory tool to investigate how different contact factors (age, gender, location, frequency of contact) interact with each other. It can also be used to investigate variations in social contact behavior across France's cities or regions. Additionally, this dataset can be used to study the influence of certain individual characteristics (e.g., age or gender) on one's overall pattern of social contacts and household compositions.

    Here are some useful tips for using this dataset:
    • Explore patterns such as how ages interact with frequency of contacts within your analyses
    • Consider grouping participants across different metropolitan areas when studying regional variations
    • Make sure to identify any outliers when looking at average values across the board
    • Focus on exploring specific sections before looking at the larger picture

    Research Ideas

    • Measuring the impact of different types of contact within a household such as gender, age range and frequency on the risk of infection spread in France
    • Examining correlations between sociodemographic factors such as household size, geographical location and contact patterns in France.
    • Analyzing how changing physical distancing restrictions affects contact patterns by comparing pre-pandemic data with current social isolation trends

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: 2015_Beraud_France_contact_common.csv

    | Column name | Description |
    |:--------------------|:--------------------------------------------------------------|
    | cnt_age_exact | The exact age of the contact. (Numeric) |
    | cnt_age_est_min | The estimated minimum age of the contact. (Numeric) |
    | cnt_age_est_max | The estimated maximum age of the contact. (Numeric) |
    | cnt_gender | The gender of the contact. (Categorical) |
    | cnt_home | The frequency of contact at home. (Numeric) |
    | cnt_work | The frequency of contact at work. (Numeric) |
    | cnt_school | The frequency of contact at school. (Numeric) |
    | cnt_transport | The frequency of contact on public transport. (Numeric) |
    | cnt_leisure | The frequency of contact during leisure activities. (Numeric) |
    | cnt_otherplace | The frequency of contact at other places. (Numeric) |
    | frequency_multi | The frequency of contact with multiple people. (Numeric) |
    | phys_contact | Whether physical contact occurred. (Categorical) |
    | duration_multi | The duration of contact with multiple people. (Numeric) |

    **File: 2015_Beraud_France_hh_...

  7. 162 Hours - French(France) Children Real-world Casual Conversation and...

    • nexdata.ai
    Updated Aug 2, 2024
    Cite
    Nexdata (2024). 162 Hours - French(France) Children Real-world Casual Conversation and Monologue speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1307
    Explore at:
    Dataset updated
    Aug 2, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    World, France, French
    Variables measured
    Age, Format, Country, Accuracy, Language, Content category, Language(Region) Code, Recording environment, Features of annotation
    Description

    French (France) children's real-world casual conversation and monologue speech dataset, covering self-media, conversation, live streams, lectures, variety shows and other generic domains, and mirroring real-world interactions. Transcribed with text content, speaker ID, gender, age, accent and other attributes. The data was collected from a large, geographically diverse set of speakers (children aged 12 and younger), which is intended to improve model performance on real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.

  8. French Spontaneous Dialogue speech dataset

    • kaggle.com
    zip
    Updated Jun 7, 2024
    + more versions
    Cite
    Frank Wong (2024). French Spontaneous Dialogue speech dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/french-spontaneous-dialogue-speech-dataset/versions/1
    Explore at:
    Available download formats: zip (113063 bytes)
    Dataset updated
    Jun 7, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    French(France) Spontaneous Dialogue Telephony speech dataset

    Description

    French (France) spontaneous dialogue telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age and other attributes. The data was collected from a large, geographically diverse set of speakers (964 native speakers), which is intended to improve model performance on real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1233?source=Kaggle

    Format

    8kHz 8bit, a-law/u-law pcm, mono channel

    Content category

    Dialogue based on given topics

    Recording condition

    Low background noise (indoor)

    Recording device

    Telephony

    Country

    France(FRA)

    Language(Region) Code

    fr-FR

    Language

    French

    Speaker

    964 people in total, 41% male and 59% female

    Features of annotation

    Transcription text, timestamp, speaker ID, gender

    Accuracy rate

    Word accuracy rate (WAR): 98%

    Licensing Information

    Commercial License

  9. France Stock Market Index (FR40) Data

    • tradingeconomics.com
    • pl.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Dec 2, 2025
    Cite
    TRADING ECONOMICS (2025). France Stock Market Index (FR40) Data [Dataset]. https://tradingeconomics.com/france/stock-market
    Explore at:
    Available download formats: json, xml, csv, excel
    Dataset updated
    Dec 2, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 9, 1987 - Dec 2, 2025
    Area covered
    France
    Description

    France's main stock market index, the FR40, rose to 8121 points on December 2, 2025, gaining 0.29% from the previous session. Over the past month, the index has climbed 0.13% and is up 11.93% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from France. France Stock Market Index (FR40) - values, historical data, forecasts and news - updated in December 2025.

  10. French Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This French Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native French speakers from our verified contributor community.
    Regions: Representing different provinces across France to ensure accent and dialect variation.
    Participant Profile: Gender mix of 60% male and 40% female, with an age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for French real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:


  11. French Call Center Data for BFSI AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Call Center Data for BFSI AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/bfsi-call-center-conversation-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This French Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.

    Participant Diversity:
    Speakers: 60 native French speakers from our verified contributor pool.
    Regions: Representing multiple provinces across France to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.
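
    The audio format stated above (stereo, 16-bit, 8 kHz or 16 kHz) can be checked per file before training. The following is a minimal sketch using Python's standard `wave` module; the function names and the file name in the usage comment are hypothetical and not part of the dataset's own tooling.

```python
import wave

# Sample rates named in the listing (8 kHz and 16 kHz).
EXPECTED_RATES = {8000, 16000}


def describe_wav(path):
    """Read basic format fields from a WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def matches_listing(info):
    """True if the file is stereo, 16-bit, and sampled at 8 kHz or 16 kHz."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 16
        and info["sample_rate"] in EXPECTED_RATES
    )


# Hypothetical usage:
# info = describe_wav("call_0001.wav")
# print(info, matches_listing(info))
```

    Reading only the header keeps the check cheap enough to run over every file in the corpus.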

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.

    Inbound Calls:
    Debit Card Block Request
    Transaction Disputes
    Loan Enquiries
    Credit Card Billing Issues
    Account Closure & Claims
    Policy Renewals & Cancellations
    Retirement & Tax Planning
    Investment Risk Queries, and more
    Outbound Calls:
    Loan & Credit Card Offers
    Customer Surveys
    EMI Reminders
    Policy Upgrades
    Insurance Follow-ups
    Investment Opportunity Calls
    Retirement Planning Reviews, and more

    This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, background noise)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making financial domain model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and

  12. French Wake Words & Voice Commands Speech Data

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Wake Words & Voice Commands Speech Data [Dataset]. https://www.futurebeeai.com/dataset/wake-words-and-commands-dataset/wake-words-and-commands-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The French Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.

    Speech Data

    This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

    Wake words alone
    Wake words followed by command phrases

    Participant Diversity

    Speakers: 50 native French speakers from the FutureBeeAI community
    Regions: Participants from various provinces of France, ensuring broad coverage of accents and dialects
    Demographics: Ages 18–70; 60% male and 40% female participants

    Recording Details

    Type: Scripted wake words and command phrases
    Duration: 1 to 15 seconds per clip
    Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz

    Dataset Diversity

    Wake Word Types
    Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
    Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
    Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more
    Command Types by Use Case
    Automobile: Play music, check directions, voice search, provide feedback, and more
    Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
    Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.
    Recording Environments
    No background noise
    Background traffic noise
    People talking in the background
    Speaking Pace
    Normal speed
    Fast speed

    This diversity ensures robust training for real-world voice assistant applications.

    Metadata

    Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

    Participant Metadata: Unique ID, age, gender, region, accent, dialect
    Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format

    Use Cases & Applications

    Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
    Smart Home Devices: Enable responsive voice control in smart appliances

  13. French Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This French Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for French-speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native French speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native French contributors from our verified pool.
    Regions: Covering multiple provinces of France to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    High transcription accuracy: dual-layered transcription review keeps the word error rate under 5%.

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train French speech-to-text engines for travel platforms.

  14. French Scripted Monologue Speech Data in Travel Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Scripted Monologue Speech Data in Travel Domain [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/travel-scripted-speech-monologues-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Algerian Arabic Scripted Monologue Speech Dataset for the Travel domain, a carefully constructed resource created to support the development of Arabic speech recognition technologies, particularly for applications in travel, tourism, and customer service automation.

    Speech Data

    This training dataset features 6,000+ high-quality scripted prompt recordings in Algerian Arabic, crafted to simulate real-world Travel industry conversations. It’s ideal for building robust ASR systems, virtual assistants, and customer interaction tools.

    Participant Diversity
    Speakers: 60 native Algerian Arabic speakers.
    Geographic Coverage: Participants from multiple regions across Algeria to ensure rich diversity in dialects and accents.
    Demographics: Age range from 18 to 70 years, with a gender ratio of approximately 60% male and 40% female.
    Recording Details
    Prompt Type: Scripted monologue-style prompts.
    Duration: Each audio sample ranges from 5 to 30 seconds.
    Audio Format: WAV files with mono channels, 16-bit depth, and 8 kHz / 16 kHz sample rates.
    Environment: Clean, quiet, echo-free spaces to ensure high-quality recordings.

    Topic Coverage

    The dataset includes a wide spectrum of travel-related interactions to reflect diverse real-world scenarios:

    Booking and reservation dialogues
    Customer support and general inquiries
    Destination-specific guidance
    Technical and login help
    Promotional offers and travel deals
    Service availability and policy information
    Domain-specific statements

    Context Elements

    To boost contextual realism, the scripted prompts integrate frequently encountered travel terms and variables:

    Names: Common Algerian male and female names
    Addresses: Regional address formats and locality names
    Dates & Times: Booking dates, travel periods, and time-based interactions
    Destinations: Mention of cities, countries, airports, and tourist landmarks
    Prices & Numbers: Cost of flights, hotel rates, promotional discounts, etc.
    Booking & Confirmation Codes: Typical ticketing and travel identifiers

    Transcription

    Every audio file is paired with a verbatim transcription in .TXT format.

    Consistency: Each transcript matches its corresponding audio file exactly.
    Accuracy: Transcriptions are reviewed and verified by native Algerian Arabic speakers.
    Usability: File names are synced across audio and text for easy integration.

    Metadata

    Each audio file is enriched with detailed metadata to support advanced analytics and filtering:

    Participant Metadata: Unique ID, age, gender, region/state,

  15. Future of French civilization

    • kaggle.com
    zip
    Updated Oct 25, 2021
    Cite
    Anzorov Samuel (2021). Future of French civilization [Dataset]. https://www.kaggle.com/anzorovsamuel/future-of-french-civilization
    Explore at:
    Available download formats: zip (3178775 bytes)
    Dataset updated
    Oct 25, 2021
    Authors
    Anzorov Samuel
    Area covered
    French
    Description

    Dataset

    The dataset (dataset.csv) comes from a service from which anyone present on the French territory benefits without social, cultural or administrative distinction (with or without papers). Nationalities have only been inferred from individuals' last names.

    Task Details

    The text below is based on an article from the French Observatory for Immigration and Demography entitled: The « Great Replacement »: Fantasy or Reality? The notion of « great replacement » in France now haunts editorials, social networks and major audiovisual media platforms, but places of power and simple family discussions. The importance of migratory flows, coupled with the birth rate of immigrants or of immigrant origin, resulted in 11% of the population residing in France being immigrant in 2017 and 25% being of immigrant origin - counting children of the second generation from immigration - according to figures from the French Office for Immigration and Integration (OFII) published in October 2018. This represents a quarter of the French population. And these are all stocks - that is, what is and not what will be in the future, as a result of migratory flows and future births. However, it is necessary to take into account the fertility differential between women descending from indigenous peoples (less than 1.8 children per woman on average in 2017), women descending from immigrants (2.02 children per woman on average) and immigrant women (2.73 children per woman on average). This fertility varies greatly according to the origin of the women: 3.6 children per woman on average for Algerian immigrants, 3.5 children per woman for Tunisian immigrants, 3.4 children per woman for Moroccan immigrants and 3.1 children per woman for Turkish immigrants, which is higher than the fertility of their country of origin (respectively 3; 2.4; 2.2; 2.1). Over the same twenty-year period, between 1998 and 2018: • The number of births to children with both French parents fell by 13.7%. • The number of births of children with at least one foreign parent increased by 63.6% • The number of births to children with both foreign parents increased by 43%. In 2018, almost a third of children born (31.4%) had at least one parent born abroad. 
While a part of the French political class remains in denial about this phenomenon and its consequences, officials in other countries source of immigration, have openly claimed this contemporary mode of conquest since the 70s: 1974, former Algerian President Houari Boumedienne said in a U.N. speech: “One day, millions of men will leave the Southern Hemisphere to go to the Northern Hemisphere. And they will not go there as friends. The wombs of our women will give us victory.” A precisely anti-France hatred is even cultivated by certain African states for which France happens to be the perfect scapegoat for the failure of their successive policies. For Algeria, this hatred even goes so far as to be included in its national anthem (cf. [Wikipedia] National anthem of Algeria).

    Expected Submission

    Using the data provided, support a diagnosis on the current state and future of the French civilization. And if the replacement of the French population and its customs a fantasy or reality?

    Further help

  16. Population Census 2011 (cycle 2009-2013) - IPUMS Subset - France

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 1, 2025
    + more versions
    Cite
    INSEE (Institut National de la Statistique et des Etudes Economiques) (2025). Population Census 2011 (cycle 2009-2013) - IPUMS Subset - France [Dataset]. https://microdata.worldbank.org/index.php/catalog/6912
    Explore at:
    Dataset updated
    Aug 1, 2025
    Dataset provided by
    INSEE (Institut National de la Statistique et des Etudes Economiques)
    IPUMS
    Time period covered
    2009 - 2013
    Area covered
    France
    Description

    Analysis unit

    Persons, households, and dwellings. Combines data from 2009-2013; includes overseas departments.

    UNITS IDENTIFIED:
    - Dwellings: yes
    - Vacant Units: no
    - Households: yes
    - Individuals: yes
    - Group quarters: yes

    UNIT DESCRIPTIONS:
    - Dwellings: A structure that is separate, completely enclosed by walls and partitions, without connecting with another unit unless this is by means of the shared parts of the building (corridor, staircase, lobby, etc.), and self-contained, with an entrance from which there is direct access to the outside or to the shared parts of the building, without having to go through another unit.
    - Households: All persons, not necessarily related, sharing the same main residence. A household can also be made up of a single person. Persons living in mobile dwellings, sailors, homeless persons, and persons living in collective dwellings are considered to be living outside households.
    - Group quarters: A community is a group of residential premises falling under the same managing authority and whose residents share a common mode of living. The community population includes those people who live in the community, with the exception of those who live in company accommodation. Community categories are: medium- or long-stay services of public or private health establishments; medium- and long-stay social establishments; retirement home and similar social residences; religious communities; military barracks, quarters, bases, or camps; student housing, including military teaching establishments; prisons; short-term social establishments, and other similar communities.

    Universe

    Residents of France, of any nationality. Does not include French citizens living in other countries, foreign tourists, or people passing through.

    Sampling procedure

    MICRODATA SOURCE: INSEE (Institut National de la Statistique et des Etudes Economiques)

    SAMPLE SIZE (person records): 20,541,337.

    SAMPLE DESIGN: "Rolling Census." Enumerated each year: one fifth of communes under 10,000 population (taken in their entirety); 8% of housing units sampled from communes of 10,000 or more population. Microdata are a 40% sample of persons in communes over 10,000 and a 25% sample for smaller communes. Weights are designed to describe the population in the median year of the dataset (2011).
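
    Because the microdata sample persons at two different rates, raw record counts should not be read as population counts. The sketch below illustrates the idea with simple inverse-sampling-fraction weights (1/0.40 and 1/0.25). This is an assumption made for illustration: the weight variable shipped with the microdata may be built differently, and the record layout and field name `commune_size` are hypothetical.

```python
# Sampling fractions taken from the sample design described above.
# Treating weights as the plain inverse of these fractions is an
# assumption; the microdata's own weight variable should be preferred.
SAMPLING_FRACTION = {
    "large": 0.40,  # communes over 10,000 population
    "small": 0.25,  # smaller communes
}


def person_weight(commune_size):
    """Number of residents one sampled person is assumed to represent."""
    return 1.0 / SAMPLING_FRACTION[commune_size]


def estimate_population(records):
    """Sum the assumed weights over all sampled person records."""
    return sum(person_weight(r["commune_size"]) for r in records)


# Toy sample: 4 persons from large communes, 2 from small ones.
records = [{"commune_size": "large"}] * 4 + [{"commune_size": "small"}] * 2
estimate = estimate_population(records)
print(round(estimate, 2))
```

    In this toy sample, six records stand for roughly eighteen residents, which shows why unweighted tabulations would over-represent large communes.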

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Two separate forms, Feuille de logement and Bulletin individuel, were used to collect information on dwellings and individuals. Households in overseas departments and territories were enumerated using a slightly modified form.

  17. Data from: French Names

    • kaggle.com
    Updated Apr 14, 2023
    Cite
    Baptiste Pirault (2023). French Names [Dataset]. https://www.kaggle.com/datasets/batou9150/french-names
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 14, 2023
    Dataset provided by
    Kaggle
    Authors
    Baptiste Pirault
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    France, French
    Description
    • The first names file contains data on the first names given to children born in France between 1900 and 2021. This data is available at the France level and by department.
    • The names file contains data on names by decade of birth from 1891 to 2000. These data are available at the France level and by department.
  18. Data from: Coronavirus France dataset

    • kaggle.com
    zip
    Updated Mar 15, 2020
    Cite
    Lior Perez (2020). Coronavirus France dataset [Dataset]. https://www.kaggle.com/lperez/coronavirus-france-dataset
    Explore at:
    Available download formats: zip (9997 bytes)
    Dataset updated
    Mar 15, 2020
    Authors
    Lior Perez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    France
    Description

    Context

    COVID-19 has infected many people in France.

    Content

    The dataset is no longer updated. It covers almost all French metropolitan regions plus overseas regions and was last updated on March 09, 2020. If you want to help update this dataset, see the Contributions section below.

    This dataset is intended to collect all published information about COVID-19 patients in France in a CSV file.

    Acknowledgements

    Source of data: Press releases of the French regional health agencies. Data transcribed into a CSV file by a GitHub community.

    This work is inspired by similar work done in South Korea: Kaggle dataset.

    Contributions

    We need more contributors to build this dataset and keep it updated. Join us on GitHub.

    Contributors: Lior Perez, Samia Drappeau, Manon Fourniol, Zoragna, Raphaël Presberg

  19. French Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-french-france
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This French Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    Speakers: 60 native French speakers from our verified contributor pool.
    Regions: Representing multiple provinces across France to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refund Requests and Billing Adjustments
    Emergency Service Access, and others
    Outbound Calls:
    Welcome Calls & Onboarding
    Payment Reminders
    Customer Satisfaction Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, coughs)
    High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
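
    The listing says the JSON transcriptions are speaker-segmented and time-coded but does not publish the schema. As a rough sketch only, the example below assumes a hypothetical layout (a `segments` list with `speaker`, `start`, `end`, and `text` fields, times in seconds) and computes talk time per speaker; the real field names may differ.

```python
import json
from collections import defaultdict

# Hypothetical transcript layout; the dataset's real field names may differ.
SAMPLE = """
{
  "segments": [
    {"speaker": "agent", "start": 0.0, "end": 4.2,
     "text": "Bonjour, comment puis-je vous aider ?"},
    {"speaker": "customer", "start": 4.2, "end": 9.0,
     "text": "J'ai un problème de réseau. [pause]"},
    {"speaker": "agent", "start": 9.0, "end": 12.5,
     "text": "Je vérifie votre ligne."}
  ]
}
"""


def talk_time_by_speaker(transcript):
    """Total seconds spoken per speaker, from time-coded segments."""
    totals = defaultdict(float)
    for seg in transcript["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


transcript = json.loads(SAMPLE)
totals = talk_time_by_speaker(transcript)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f} s")
```

    The same loop can be extended to strip non-speech tags such as `[pause]` before the text is passed to an ASR training pipeline.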

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.

  20. immobilier france

    • kaggle.com
    zip
    Updated Oct 28, 2024
    Cite
    Benoit Favier (2024). immobilier france [Dataset]. https://www.kaggle.com/datasets/benoitfavier/immobilier-france/discussion
    Explore at:
    Available download formats: zip (345875934 bytes)
    Dataset updated
    Oct 28, 2024
    Authors
    Benoit Favier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    France
    Description

    This dataset contains a history of nearly all real estate transactions involving a single house or apartment in France from 2014 to today. Some variables likely to affect real estate prices are also provided as time series: household income levels per city, the average debt level of French people, the average savings of French people, loan interest rates, rent prices per city, and the number of housing units and vacant housing units per city.

    This dataset is provided under a permissive licence and is free to use for commercial applications. It is intended to support research on the dynamics of real estate prices.

    The dataset consists of extracts from several openly available datasets, combined in a practical format: the DVF+ database of real estate transactions; the IRCOM dataset of household incomes and income taxes; average interest rates of real estate loans from the Banque de France website; the LOVAC dataset of the number of vacant and occupied housing units per city; ~~the OECD dataset of financial assets per capita~~; the "carte des loyers" datasets of 2018 and 2022, which list the average rent per square meter; the Indice de Référence des Loyers (IRL) time series, an index that defines the maximum rent increase applicable to an already rented dwelling and is calculated every 3 months as the inflation-adjusted buying power of 100€ in 1998; and the TEC00104 Eurostat dataset of debt levels.
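
    The description says the IRL caps the rent increase on an already rented dwelling. A minimal sketch of how such a cap could be applied is given below, assuming the revised rent scales with the ratio of the new index value to the reference index value. The function name is hypothetical and the index values are made up for illustration; they are not taken from the dataset.

```python
def max_revised_rent(current_rent, irl_reference, irl_new):
    """Upper bound on the revised rent, assuming proportional indexation.

    current_rent   rent before revision, in euros
    irl_reference  index value used as the baseline for the revision
    irl_new        latest published index value
    """
    if irl_reference <= 0:
        raise ValueError("irl_reference must be positive")
    return current_rent * irl_new / irl_reference


# Illustrative values only, not real IRL figures.
capped = max_revised_rent(800.0, irl_reference=130.0, irl_new=132.6)
print(f"{capped:.2f}")
```

    Joined to the rent series per city, a function like this could help separate index-driven rent growth from market-driven growth.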
