Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The total population in France was estimated at 68.4 million people in 2024, according to the latest census figures and projections from Trading Economics. This dataset provides the latest reported value for - France Population - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
https://creativecommons.org/publicdomain/zero/1.0/
[INSEE][1] is the official French institute that gathers many types of data about France: demographic (births, deaths, population density...), economic (salaries, firms by activity or size...) and more.
It can be a great help in observing and measuring inequality in the French population.
The dataset contains four files:
name_geographic_information: geographic data on French towns (mainly latitude and longitude, but also region and department codes and names)
net_salary_per_town_per_category: salaries in French towns by job category, age and sex
population: [demographic][3] information in France by town, age, sex and living mode
departments.geojson: the borders of French departments. From [Gregoire David (github)][4]
These datasets can be merged on: CODGEO = code_insee
The entire dataset was created (and is kept up to date) by INSEE; I just uploaded it to Kaggle after some cleaning and checks ...
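The merge key mentioned above (CODGEO = code_insee) can be tried out with a small sketch. The rows below are made up for illustration and are not taken from the INSEE files; every column name other than CODGEO and code_insee is an assumption.

```python
import csv
import io

# Hypothetical extracts: the real files are much larger and have more columns.
GEO_CSV = """code_insee,nom_commune,latitude,longitude
01001,Town A,46.15,4.92
01002,Town B,46.00,5.43
"""
SALARY_CSV = """CODGEO,mean_net_salary
01001,13.7
01002,13.5
99999,12.0
"""

def merge_on_town_code(geo_text, salary_text):
    """Inner-join salary rows to geographic rows on CODGEO = code_insee."""
    geo_by_code = {
        row["code_insee"]: row for row in csv.DictReader(io.StringIO(geo_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(salary_text)):
        geo = geo_by_code.get(row["CODGEO"])
        if geo is None:
            continue  # salary row with no matching town is dropped
        merged.append({**geo, **row})
    return merged

merged_rows = merge_on_town_code(GEO_CSV, SALARY_CSV)
for row in merged_rows:
    print(row["CODGEO"], row["nom_commune"], row["mean_net_salary"])
```

The codes are kept as strings on purpose: INSEE town codes can start with a zero or contain letters (for example the Corsican codes beginning with 2A and 2B), so converting them to integers before joining would lose matches.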
The dataset comprises comprehensive details pertaining to real estate properties and transactions in France spanning from 2017 to 2023. With a vast compilation of 19,569,530 lines of intricate information, this extensive dataset is notably rich in content, encompassing a diverse range of essential attributes crucial for in-depth analysis of the real estate market.
lot5_surface_carrez and lot4_surface_carrez: These columns indicate the "Carrez" area of the fifth and fourth lots, respectively.
ancien_id_parcelle: It provides information about the former parcel identifier associated with the property.
lot5_numero and lot4_numero: These columns contain the numbers of the fifth and fourth lots.
numero_volume: The volume number associated with the property.
lot3_surface_carrez: The "Carrez" area of the third lot.
lot3_numero: The number of the third lot.
lot2_surface_carrez and lot2_numero: These columns represent the "Carrez" area and the number of the second lot.
lot1_surface_carrez and lot1_numero: They indicate the "Carrez" area and the number of the first lot.
surface_reelle_bati: The actual surface area of the building, in square meters.
nombre_pieces_principales: The number of main rooms in the property.
type_local and code_type_local: These columns specify the type of premises and its associated code.
adresse_numero: The property's address number.
surface_terrain: The land area, in square meters.
code_nature_culture and nature_culture: They detail the nature of the land's use, along with its corresponding code.
latitude and longitude: The latitude and longitude coordinates of the property.
valeur_fonciere: The declared value of the transaction (the amount stated for the property transfer).
code_postal: The postal code of the location.
adresse_nom_voie and adresse_code_voie: These columns specify the name and code of the street in the address.
id_parcelle: The parcel identifier associated with the property.
code_departement: The department code where the property is located.
nom_commune and code_commune: These columns indicate the name and code of the municipality of the location.
nombre_lots: The total number of lots included in the property.
nature_mutation: The nature of the real estate transaction, whether it is a sale, a donation, or other.
numero_disposition: The disposition number assigned to each transaction.
date_mutation: The date of the real estate transaction.
id_mutation: The identifier of the real estate transaction.
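As an illustration of how the columns above might be combined, here is a minimal sketch that derives a price per square meter from valeur_fonciere and surface_reelle_bati. It assumes both fields arrive as strings (as they would from a CSV reader) and that valeur_fonciere is expressed in euros; the sample rows and the "Vente" label are hypothetical.

```python
def price_per_square_meter(row):
    """Return valeur_fonciere / surface_reelle_bati, or None if not computable."""
    try:
        value = float(row["valeur_fonciere"])
        surface = float(row["surface_reelle_bati"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric field
    if surface <= 0:
        return None  # avoid division by zero on land-only or empty rows
    return value / surface

# Hypothetical rows using the column names described above.
rows = [
    {"id_mutation": "m1", "nature_mutation": "Vente",
     "valeur_fonciere": "250000", "surface_reelle_bati": "100"},
    {"id_mutation": "m2", "nature_mutation": "Vente",
     "valeur_fonciere": "", "surface_reelle_bati": "80"},
    {"id_mutation": "m3", "nature_mutation": "Vente",
     "valeur_fonciere": "90000", "surface_reelle_bati": "0"},
]

for row in rows:
    print(row["id_mutation"], price_per_square_meter(row))
```

If a single id_mutation spans several rows (for example one row per lot or parcel), the same valeur_fonciere may be repeated on each of them, so this naive per-row ratio should be checked against a grouping by id_mutation before drawing conclusions.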
https://www.futurebeeai.com/policies/ai-data-license-agreement
This French Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of French speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 hours of dual-channel call center conversations between native French speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the French General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of French speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world French communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade French speech models that understand and respond to authentic French accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of French. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple French speech and language AI applications:
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset provides a comprehensive exploration of contacts and interactions among 1755 participants in France in 2015, giving insights into the social behaviour of French households. With detailed information on contact locations, the gender and ages of contacts, the frequency and duration of interactions with each contact, as well as the number of people within a household, this data set covers a variety of factors which govern human interaction. By analyzing this data set we can better understand how social networks are formed among families and individuals in different communities. It is an essential guide to understanding how behaviour has changed over time and across different cultures. This dataset allows us to gain new perspectives on how various factors shape our relationships with others at home or out in society
This dataset provides a comprehensive collection of data regarding household contacts, social contact networks, and other individual characteristics in France in 2015. The data was collected by Antoine Beraud and his research team between July 2014 and February 2015.
This dataset is ideal for use as an exploratory tool to investigate how different contact factors (age, gender, location, frequency of contact) interact with each other. It can also be used to investigate variations in social contact behavior across France's cities or regions. Additionally, this dataset can be used to study the influence of certain individual characteristics (e.g., age or gender) on one's overall pattern of social contacts and household compositions.
Here are some useful tips for using this dataset:
• Explore patterns such as how age interacts with frequency of contact
• Consider grouping participants across different metropolitan areas when studying regional variations
• Make sure to identify any outliers when looking at average values across the board
• Focus on exploring specific sections before looking at the larger picture
- Measuring the impact of different types of contact within a household such as gender, age range and frequency on the risk of infection spread in France
- Examining correlations between sociodemographic factors such as household size, geographical location and contact patterns in France.
- Analyzing how changing physical distancing restrictions affects contact patterns by comparing pre-pandemic data with current social isolation trends
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: 2015_Beraud_France_contact_common.csv

| Column name | Description |
|:--------------------|:--------------------------------------------------------------|
| cnt_age_exact | The exact age of the contact. (Numeric) |
| cnt_age_est_min | The estimated minimum age of the contact. (Numeric) |
| cnt_age_est_max | The estimated maximum age of the contact. (Numeric) |
| cnt_gender | The gender of the contact. (Categorical) |
| cnt_home | The frequency of contact at home. (Numeric) |
| cnt_work | The frequency of contact at work. (Numeric) |
| cnt_school | The frequency of contact at school. (Numeric) |
| cnt_transport | The frequency of contact on public transport. (Numeric) |
| cnt_leisure | The frequency of contact during leisure activities. (Numeric) |
| cnt_otherplace | The frequency of contact at other places. (Numeric) |
| frequency_multi | The frequency of contact with multiple people. (Numeric) |
| phys_contact | Whether physical contact occurred. (Categorical) |
| duration_multi | The duration of contact with multiple people. (Numeric) |
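To show how the location columns of 2015_Beraud_France_contact_common.csv could be summarized, here is a minimal sketch that totals each location column over a set of rows. It assumes the columns are numeric as described in the table (if the actual file encodes them differently, for instance as true/false, the parsing step would need adjusting), and the sample rows are invented.

```python
LOCATION_COLUMNS = [
    "cnt_home", "cnt_work", "cnt_school",
    "cnt_transport", "cnt_leisure", "cnt_otherplace",
]

def contacts_by_location(rows):
    """Sum each location column over all contact rows, skipping blanks."""
    totals = {column: 0.0 for column in LOCATION_COLUMNS}
    for row in rows:
        for column in LOCATION_COLUMNS:
            raw = row.get(column)
            if raw is None or raw == "":
                continue  # missing value: contributes nothing
            totals[column] += float(raw)
    return totals

# Hypothetical contact rows, one per reported contact.
sample_rows = [
    {"cnt_home": "1", "cnt_work": "0", "cnt_school": "0",
     "cnt_transport": "0", "cnt_leisure": "0", "cnt_otherplace": "0"},
    {"cnt_home": "1", "cnt_work": "0", "cnt_school": "",
     "cnt_transport": "0", "cnt_leisure": "1", "cnt_otherplace": "0"},
    {"cnt_home": "0", "cnt_work": "1", "cnt_school": "0",
     "cnt_transport": "1", "cnt_leisure": "0", "cnt_otherplace": "0"},
]

totals = contacts_by_location(sample_rows)
for column, total in totals.items():
    print(column, total)
```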
**File: 2015_Beraud_France_hh_...
French (France) children's real-world casual conversation and monologue speech dataset covering self-media, conversation, live streams, lectures, variety shows and other generic domains, mirroring real-world interactions. Transcribed with text content, speaker ID, gender, age, accent and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (children aged 12 and younger), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
French (France) spontaneous dialogue telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (964 native speakers), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1233?source=Kaggle
8kHz 8bit, a-law/u-law pcm, mono channel
Dialogue based on given topics
Low background noise (indoor)
Telephony
France(FRA)
fr-FR
French
964 people in total, 41% male and 59% female
Transcription text, timestamp, speaker ID, gender
Word accuracy rate(WAR) 98%
Commercial License
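The audio format listed above (8 kHz, 8-bit, A-law/u-law PCM, mono) is companded telephony audio, so each byte has to be expanded before it can be treated as a linear sample. The sketch below covers only the u-law case and assumes the files follow the standard G.711 encoding; how the recordings are actually packaged (headers, container format) is not stated in the listing, so the function names here are hypothetical and operate on raw bytes only.

```python
SAMPLE_RATE_HZ = 8000  # from the format line above
BIAS = 0x84            # standard G.711 mu-law bias

def ulaw_byte_to_linear(byte):
    """Expand one G.711 mu-law byte (0-255) to a signed linear sample."""
    inverted = ~byte & 0xFF            # mu-law bytes are stored bit-inverted
    sign = inverted & 0x80             # top bit: set means negative
    exponent = (inverted >> 4) & 0x07  # 3-bit segment number
    mantissa = inverted & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude

def decode_ulaw(data):
    """Decode a bytes object of mu-law samples into a list of integers."""
    return [ulaw_byte_to_linear(b) for b in data]

def duration_seconds(data):
    """Mono audio with one byte per sample: duration is bytes / sample rate."""
    return len(data) / SAMPLE_RATE_HZ

samples = decode_ulaw(bytes([0xFF, 0x80, 0x00]))
print(samples)
```

At one byte per sample and 8,000 samples per second, one hour of mono audio in this format takes about 28.8 MB before any container overhead.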
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
France's main stock market index, the FR40, rose to 8121 points on December 2, 2025, gaining 0.29% from the previous session. Over the past month, the index has climbed 0.13% and is up 11.93% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from France. France Stock Market Index (FR40) - values, historical data, forecasts and news - updated in December 2025.
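The percentage changes quoted above can be inverted to estimate the earlier index levels. This is a small arithmetic sketch, not part of the source dataset; because the published figures are rounded, the results are approximations only.

```python
def value_before_change(current, pct_change):
    """Return the earlier value given the current value and the % change since."""
    return current / (1 + pct_change / 100)

CURRENT_LEVEL = 8121  # FR40 level reported for December 2, 2025

previous_session = value_before_change(CURRENT_LEVEL, 0.29)
one_month_ago = value_before_change(CURRENT_LEVEL, 0.13)
one_year_ago = value_before_change(CURRENT_LEVEL, 11.93)

print(round(previous_session, 1), round(one_month_ago, 1), round(one_year_ago, 1))
```

With the figures in the text, the previous session comes out at roughly 8,097.5 points and the level a year earlier at roughly 7,255 points.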
https://www.futurebeeai.com/policies/ai-data-license-agreement
This French Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking real estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, making it well suited to building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
The dataset features 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
Such domain-rich variety ensures model generalization across common real estate support conversations.
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
These transcriptions streamline ASR and NLP development for French real estate voice applications.
Detailed metadata accompanies each participant and conversation:
This enables smart filtering, dialect-focused model training, and structured dataset exploration.
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
https://www.futurebeeai.com/policies/ai-data-license-agreement
This French Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.
The dataset contains 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.
This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, making financial domain model training faster and more accurate.
Rich metadata is available for each participant and conversation:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The French Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.
This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:
This diversity ensures robust training for real-world voice assistant applications.
Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This French Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for French-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native French speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real-time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Algerian Arabic Scripted Monologue Speech Dataset for the Travel domain, a carefully constructed resource created to support the development of Arabic speech recognition technologies, particularly for applications in travel, tourism, and customer service automation.
This training dataset features 6,000+ high-quality scripted prompt recordings in Algerian Arabic, crafted to simulate real-world Travel industry conversations. It’s ideal for building robust ASR systems, virtual assistants, and customer interaction tools.
The dataset includes a wide spectrum of travel-related interactions to reflect diverse real-world scenarios:
To boost contextual realism, the scripted prompts integrate frequently encountered travel terms and variables:
Every audio file is paired with a verbatim transcription in .TXT format.
Each audio file is enriched with detailed metadata to support advanced analytics and filtering:
The dataset (dataset.csv) comes from a service from which anyone present on the French territory benefits without social, cultural or administrative distinction (with or without papers). Nationalities have only been inferred from individuals' last names.
The text below is based on an article from the French Observatory for Immigration and Demography entitled: The « Great Replacement »: Fantasy or Reality? The notion of « great replacement » in France now haunts editorials, social networks and major audiovisual media platforms, but places of power and simple family discussions. The importance of migratory flows, coupled with the birth rate of immigrants or of immigrant origin, resulted in 11% of the population residing in France being immigrant in 2017 and 25% being of immigrant origin - counting children of the second generation from immigration - according to figures from the French Office for Immigration and Integration (OFII) published in October 2018. This represents a quarter of the French population. And these are all stocks - that is, what is and not what will be in the future, as a result of migratory flows and future births. However, it is necessary to take into account the fertility differential between women descending from indigenous peoples (less than 1.8 children per woman on average in 2017), women descending from immigrants (2.02 children per woman on average) and immigrant women (2.73 children per woman on average). This fertility varies greatly according to the origin of the women: 3.6 children per woman on average for Algerian immigrants, 3.5 children per woman for Tunisian immigrants, 3.4 children per woman for Moroccan immigrants and 3.1 children per woman for Turkish immigrants, which is higher than the fertility of their country of origin (respectively 3; 2.4; 2.2; 2.1). Over the same twenty-year period, between 1998 and 2018: • The number of births to children with both French parents fell by 13.7%. • The number of births of children with at least one foreign parent increased by 63.6% • The number of births to children with both foreign parents increased by 43%. In 2018, almost a third of children born (31.4%) had at least one parent born abroad. 
While a part of the French political class remains in denial about this phenomenon and its consequences, officials in other countries source of immigration, have openly claimed this contemporary mode of conquest since the 70s: 1974, former Algerian President Houari Boumedienne said in a U.N. speech: “One day, millions of men will leave the Southern Hemisphere to go to the Northern Hemisphere. And they will not go there as friends. The wombs of our women will give us victory.” A precisely anti-France hatred is even cultivated by certain African states for which France happens to be the perfect scapegoat for the failure of their successive policies. For Algeria, this hatred even goes so far as to be included in its national anthem (cf. [Wikipedia] National anthem of Algeria).
Using the data provided, support a diagnosis of the current state and future of French civilization. Is the replacement of the French population and its customs a fantasy or a reality?
Persons, households, and dwellings. Combines data from 2009-2013; includes overseas departments.
UNITS IDENTIFIED:
- Dwellings: yes
- Vacant Units: no
- Households: yes
- Individuals: yes
- Group quarters: yes

UNIT DESCRIPTIONS:
- Dwellings: A structure that is separate, completely enclosed by walls and partitions, without connecting with another unit unless this is by means of the shared parts of the building (corridor, staircase, lobby, etc.), and self-contained, with an entrance from which there is direct access to the outside or to the shared parts of the building, without having to go through another unit.
- Households: All persons, not necessarily related, sharing the same main residence. A household can also be made up of a single person. Persons living in mobile dwellings, sailors, homeless persons, and persons living in collective dwellings are considered to be living outside households.
- Group quarters: A community is a group of residential premises falling under the same managing authority and whose residents share a common mode of living. The community population includes those people who live in the community, with the exception of those who live in company accommodation. Community categories are: medium- or long-stay services of public or private health establishments; medium- and long-stay social establishments; retirement home and similar social residences; religious communities; military barracks, quarters, bases, or camps; student housing, including military teaching establishments; prisons; short-term social establishments, and other similar communities.
Residents of France, of any nationality. Does not include French citizens living in other countries, foreign tourists, or people passing through.
MICRODATA SOURCE: INSEE (Institut National de la Statistique et des Etudes Economiques)
SAMPLE SIZE (person records): 20541337.
SAMPLE DESIGN: "Rolling Census." Enumerated each year: one fifth of communes under 10,000 population (taken in their entirety); 8% of housing units sampled from communes of 10,000 or more population. Microdata are a 40% sample of persons in communes over 10,000 and a 25% sample for smaller communes. Weights are designed to describe the population in the median year of the dataset (2011).
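The sampling fractions quoted in the sample design imply rough expansion factors. The sketch below is only an illustration of inverse-sampling-fraction arithmetic: it assumes the 10,000-inhabitant threshold is inclusive on the upper side, and the function name is hypothetical. The weights distributed with the microdata are calibrated to the 2011 population and should be used for any actual estimate.

```python
LARGE_COMMUNE_THRESHOLD = 10_000  # assumed inclusive: "10,000 or more"
LARGE_COMMUNE_FRACTION = 0.40     # 40% sample of persons in larger communes
SMALL_COMMUNE_FRACTION = 0.25     # 25% sample of persons in smaller communes

def approximate_base_weight(commune_population):
    """Return 1 / sampling fraction for a person record in the given commune."""
    if commune_population >= LARGE_COMMUNE_THRESHOLD:
        fraction = LARGE_COMMUNE_FRACTION
    else:
        fraction = SMALL_COMMUNE_FRACTION
    return 1 / fraction

# Each sampled person stands for roughly this many residents.
print(approximate_base_weight(50_000))
print(approximate_base_weight(3_000))
```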
Face-to-face [f2f]
Two separate forms, Feuille de logement and Bulletin individuel, were used to collect information on dwellings and individuals. Households in overseas departments and territories were enumerated using a slightly modified form.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 has infected many people in France.
The dataset is no longer updated. It contains almost all French metropolitan regions plus overseas regions, last updated on March 9, 2020. If you want to help update this dataset, see the contributions section below.
The intention of this dataset is to put all published information about COVID-19 patients in France in a CSV file.
Source of data: press releases of the French regional health agencies. Data transcribed into a CSV file by a GitHub community.
This work is inspired by a similar work made in South Korea: kaggle dataset.
We need more contributors to build this dataset and keep it updated. Join us on GitHub.
Contributors: Lior Perez, Samia Drappeau, Manon Fourniol, Zoragna, Raphaël Presberg
https://www.futurebeeai.com/policies/ai-data-license-agreement
This French Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Rich metadata is available for each participant and conversation:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a history of nearly all real estate transactions concerning a single house or apartment in France from 2014 to today. Some variables likely to affect real estate prices are also provided as time series: household income levels per city, the average debt level of French people, the average savings of French people, loan interest rates, rent prices per city, and the number of housing units and vacant housing units per city.
This dataset is provided under a permissive licence and is free to use for commercial applications. It is intended to support research on the dynamics of real estate prices.
The dataset consists of extracts from several openly available datasets, put together in a practical format: the DVF+ database of real estate transactions, the IRCOM dataset of household incomes and income taxes, average interest rates of real estate loans from the Banque de France website, the LOVAC dataset of the number of vacant and occupied housing units per city, ~~the OECD dataset of financial assets per capita~~, the "carte des loyers" datasets of 2018 and 2022, which list the average rent per square meter, the Indice de Référence des Loyers (IRL) time series, an index that defines the maximum rent increase applicable to an already rented dwelling and is calculated every 3 months as the inflation-adjusted buying power of 100€ in 1998, and the TEC00104 Eurostat dataset of debt levels.