French (France) Scripted Monologue Smartphone speech dataset, collected from monologues based on given prompts and covering the general and human-machine interaction categories. Transcribed with text content. The data was collected from an extensive, geographically diverse pool of speakers (1,623 native speakers), which helps models perform well on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use; all of our datasets are GDPR, CCPA, and PIPL compliant.
French (Canada) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The data was collected from an extensive, geographically diverse pool of speakers (126 native speakers), which helps models perform well on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use; all of our datasets are GDPR, CCPA, and PIPL compliant.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset results from a merger and hybridization of the Arcep and ANFR datasets on 5G sites, since each of these two organizations publishes only partial information and they do not synchronize their publications: https://www.data.gouv.fr/fr/datasets/fichier-complet-des-sites-mobiles-5g/
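Since Arcep and ANFR each publish partial information, one way to picture this kind of fusion is an outer join on a shared site identifier that fills each missing field from whichever source has it. The Python sketch below is an assumption, not the method actually used to build this dataset: the site-ID key, the field names, and the rule of preferring the Arcep value on conflict are all hypothetical.

```python
from typing import Optional

# Hypothetical shape: one dict of attributes per site; None means "unknown".
Record = dict[str, Optional[str]]


def merge_sources(arcep: dict[str, Record], anfr: dict[str, Record]) -> dict[str, Record]:
    """Outer-join two partial site tables on site ID, filling gaps field by field.

    When both sources provide a value for the same field, the Arcep value is
    kept; this tie-break is an arbitrary choice made for the sketch.
    """
    merged: dict[str, Record] = {}
    for site_id in arcep.keys() | anfr.keys():
        a = arcep.get(site_id, {})
        b = anfr.get(site_id, {})
        merged[site_id] = {
            field: a.get(field) if a.get(field) is not None else b.get(field)
            for field in a.keys() | b.keys()
        }
    return merged


# Usage with made-up records:
arcep = {"S1": {"operator": "Orange", "band": None}}
anfr = {"S1": {"band": "3.5 GHz", "commune": "Lyon"}, "S2": {"operator": "SFR"}}
print(merge_sources(arcep, anfr)["S1"])
```

Sites known to only one source survive the join unchanged, which is the point of combining two incomplete publications.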
This dataset provides information on 4,598 in France as of June 2025. It includes details such as email addresses (where publicly available), phone numbers (where publicly available), and geocoded addresses. Explore market trends, identify potential business partners, and gain valuable insights into the industry. Download a complimentary sample of 10 records to see what's included.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the French Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the French language.
Dataset Content & Diversity: Containing a total of 5,000 images, this French OCR dataset is equally distributed across newspapers, books, and magazines. It covers a diverse range of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to support robust text recognition models, we include fewer than five unique images from any single source. Stringent measures have been taken to exclude any personally identifiable information (PII), and at least 80% of the area of each image contains visible French text.
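The two collection rules above (fewer than five images per source, at least 80% of the image covered by text) can be expressed as a simple validation pass. This Python sketch assumes hypothetical per-image fields `file`, `source`, and `text_area_ratio`; how text coverage is actually measured is not described here.

```python
from collections import Counter

MAX_PER_SOURCE = 4        # "fewer than five" images from any single source
MIN_TEXT_COVERAGE = 0.80  # at least 80% of the image area contains text


def check_images(images):
    """Return a list of human-readable rule violations.

    Each image is a dict with hypothetical keys:
      'file', 'source' (e.g. a newspaper title), 'text_area_ratio' in [0, 1].
    """
    problems = []
    per_source = Counter(img["source"] for img in images)
    for source, count in per_source.items():
        if count > MAX_PER_SOURCE:
            problems.append(f"{source}: {count} images (limit {MAX_PER_SOURCE})")
    for img in images:
        if img["text_area_ratio"] < MIN_TEXT_COVERAGE:
            problems.append(f"{img['file']}: text covers {img['text_area_ratio']:.0%}")
    return problems
```

An empty list means every image in the batch satisfies both rules.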
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.
All images were captured by native French speakers to ensure text quality and to avoid toxic content and PII. We used recent iOS and Android mobile devices with cameras above 5 MP to capture the images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata: Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as device information, source type (newspaper, magazine, or book), and image orientation (portrait or landscape). Each image file is renamed to match its metadata.
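Because each image file is named to match its metadata row, the CSV can be used to index and filter the images. This Python sketch assumes hypothetical column names (`file_name`, `source_type`, `orientation`); check the header of the delivered CSV before adapting it.

```python
import csv
import io
from collections import defaultdict


def load_metadata(csv_file, available_files):
    """Group metadata rows by source type and list files that are missing.

    Column names 'file_name', 'source_type' and 'orientation' are hypothetical.
    """
    by_source = defaultdict(list)
    missing = []
    for row in csv.DictReader(csv_file):
        if row["file_name"] not in available_files:
            missing.append(row["file_name"])
            continue
        by_source[row["source_type"]].append(row)
    return dict(by_source), missing


# Usage with a made-up two-row CSV:
sample = io.StringIO(
    "file_name,source_type,orientation\n"
    "news_0001.jpg,newspaper,portrait\n"
    "book_0001.heic,book,landscape\n"
)
groups, missing = load_metadata(sample, {"news_0001.jpg"})
print(sorted(groups), missing)  # ['newspaper'] ['book_0001.heic']
```

In practice `available_files` would come from listing the image directory, so the same pass doubles as a completeness check.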
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of French text recognition models.
Update & Custom Collection: We're committed to expanding this dataset by continuously adding more images with the assistance of our native French crowd community.
If you require a custom dataset tailored to your guidelines or a specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, our crowd community can annotate the images with bounding boxes or transcribe the text in each image to meet your specific requirements.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models for the French language. Your journey to enhanced language understanding and processing starts here.
Leverage the most reliable and compliant mobile device location/foot traffic dataset on the market.
Veraset Movement (Mobile Location Data) offers unparalleled insights into footfall traffic patterns across dozens of European countries.
Covering 45+ European countries, Veraset's Mobile Location Data draws on raw GPS data from tier-1 apps, SDKs, and aggregators of mobile devices to provide customers with accurate, up-to-the-minute information on human movement. Ideal for ad tech, planning, retail, and transportation logistics, Veraset's Movement data helps shape strategy and make impactful data-driven decisions.
Veraset's European Movement Panel includes the following countries:
- United Kingdom (GB)
- Germany (DE)
- France (FR)
- Spain (ES)
- Italy (IT)
- The Netherlands (NL)
- Switzerland (CH)
- Belgium (BE)
- Sweden (SE)
- Austria (AT)
- Denmark (DK)
- Finland (FI)
- Cyprus (CY)
- Poland (PL)
- Ireland (IE)
- Portugal (PT)
- Romania (RO)
- Hungary (HU)
- Czech Republic (CZ)
- Greece (GR)
- Bulgaria (BG)
- Lithuania (LT)
- Croatia (HR)
- Norway (NO)
- Latvia (LV)
- Luxembourg (LU)
- Slovakia (SK)
- Estonia (EE)
- Cayman Islands (KY)
- Slovenia (SI)
- Vatican City (VA)
- Turks and Caicos Islands (TC)
- Bermuda (BM)
- Malta (MT)
- Iceland (IS)
- Liechtenstein (LI)
- Monaco (MC)
- British Virgin Islands (VG)
- Anguilla (AI)
- Andorra (AD)
- Greenland (GL)
- San Marino (SM)
- Federated States of Micronesia (FM)
- Montserrat (MS)
- Pitcairn Islands (PN)

Common use cases of Veraset's Mobile Location Data:
- Advertising
- Ad Placement, Attribution, and Segmentation
- Audience Creation/Building
- Dynamic Ad Targeting
- Infrastructure Plans
- Route Optimization
- Public Transit Optimization
- Credit Card Loyalty
- Competitive Analysis
- Risk Assessment, Underwriting, and Policy Personalization
- Enrichment of Existing Datasets
- Trade Area Analysis
- Predictive Analytics and Trend Forecasting
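Footfall analysis on raw GPS pings typically reduces to counting devices observed near a point of interest. The Python sketch below is illustrative only: the `(device_id, lat, lon)` ping format and the 100 m default radius are assumptions rather than Veraset's delivery schema, and distances use the standard haversine great-circle formula with a mean Earth radius.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def footfall(pings, poi_lat, poi_lon, radius_m=100):
    """Count distinct devices with at least one ping within radius_m of a POI.

    pings: iterable of (device_id, lat, lon) tuples (hypothetical schema).
    """
    return len({dev for dev, lat, lon in pings
                if haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m})


pings = [
    ("a", 48.8584, 2.2945),
    ("a", 48.8585, 2.2946),  # same device twice: counted once
    ("b", 48.8590, 2.2945),  # about 67 m north
    ("c", 48.8700, 2.2945),  # about 1.3 km north, outside the radius
]
print(footfall(pings, 48.8584, 2.2945))  # 2
```

Counting distinct devices rather than pings avoids inflating visits for devices that report frequently; a production pipeline would also bucket pings by time window.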
This dataset provides information on 3,931 in France as of May 2025. It includes details such as email addresses (where publicly available), phone numbers (where publicly available), and geocoded addresses. Explore market trends, identify potential business partners, and gain valuable insights into the industry. Download a complimentary sample of 10 records to see what's included.
Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
License information was derived automatically
This dataset lists and locates the latest version of the different operators' 5G mobile antennas and the frequency bands available at these sites in Metropolitan France. Enrichment: addition of administrative hierarchies.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Release Date: 17.01.22
Welcome to Common Phone 1.0
Legal Information
Common Phone is a subset of the Common Voice corpus collected by Mozilla Corporation. By using Common Phone, you agree to the Common Voice Legal Terms. Common Phone is maintained and distributed by speech researchers at the Pattern Recognition Lab of Friedrich-Alexander-University Erlangen-Nuremberg (FAU) under the CC0 license.
As with Common Voice, you must not attempt to identify the speakers who contributed to Common Phone.
About Common Phone
This corpus aims to provide a basis for Machine Learning (ML) researchers and enthusiasts to train and test their models against a wide variety of speakers, hardware/software ecosystems and acoustic conditions to improve generalization and availability of ML in real-world speech applications.
The current version of Common Phone comprises 116.5 hours of speech samples, collected from 11,246 speakers in 6 languages:
| Language | Speakers (train / dev / test) | Hours (train / dev / test) |
|---|---|---|
| English | 4716 / 771 / 774 | 14.1 / 2.3 / 2.3 |
| French | 796 / 138 / 135 | 13.6 / 2.3 / 2.2 |
| German | 1176 / 202 / 206 | 14.5 / 2.5 / 2.6 |
| Italian | 1031 / 176 / 178 | 14.6 / 2.5 / 2.5 |
| Spanish | 508 / 88 / 91 | 16.5 / 3.0 / 3.1 |
| Russian | 190 / 34 / 36 | 12.7 / 2.6 / 2.8 |
| Total | 8417 / 1409 / 1420 | 85.8 / 15.2 / 15.5 |
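As a quick consistency check on the Common Phone table, the per-language figures can be summed per split. This minimal Python sketch uses the numbers exactly as listed. The speaker columns reproduce the listed totals (8,417 / 1,409 / 1,420, i.e. 11,246 speakers); the per-language training hours sum to 86.0 h rather than the listed 85.8 h, a gap small enough to be explained by rounding each entry to one decimal place.

```python
# Per-language (train, dev, test) figures, copied from the Common Phone table.
SPEAKERS = {
    "English": (4716, 771, 774),
    "French": (796, 138, 135),
    "German": (1176, 202, 206),
    "Italian": (1031, 176, 178),
    "Spanish": (508, 88, 91),
    "Russian": (190, 34, 36),
}
HOURS = {
    "English": (14.1, 2.3, 2.3),
    "French": (13.6, 2.3, 2.2),
    "German": (14.5, 2.5, 2.6),
    "Italian": (14.6, 2.5, 2.5),
    "Spanish": (16.5, 3.0, 3.1),
    "Russian": (12.7, 2.6, 2.8),
}


def column_totals(table):
    """Sum each split (column) across all languages."""
    return tuple(sum(column) for column in zip(*table.values()))


speakers = column_totals(SPEAKERS)
hours = column_totals(HOURS)
print(speakers, sum(speakers))            # (8417, 1409, 1420) 11246
print(tuple(round(h, 1) for h in hours))  # (86.0, 15.2, 15.5)
```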
The train, dev, and test splits presented here are not identical to those shipped with Common Voice. To keep speakers separate across splits, we used only speakers who had provided age and gender information. This information can only be provided by registered users of the website. When a user is logged in, the session ID of each contributed recording is always linked to that user, so we could reliably link recordings to individual speakers. This would not be possible for unregistered users, whose session ID changes if they contribute more than once.
During speaker selection, we took into account that some speakers had contributed to more than one of the six Common Voice datasets (one per language). In Common Phone, each speaker appears in only one language.
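The speaker-separation rules described for Common Phone can be sketched as follows. This is not the Common Phone authors' code: the record fields (`client_id`, `language`, `age`, `gender`), the hash-based split assignment, the split ratios (roughly matching the table's speaker counts), and the rule of keeping the language with the most recordings for a multilingual speaker are all assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict

# Hypothetical split ratios, roughly matching the speaker counts in the table.
SPLITS = (("train", 0.75), ("dev", 0.125), ("test", 0.125))


def split_for(client_id):
    """Deterministically map a speaker ID to a split via a hash bucket."""
    bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 1000 / 1000
    cumulative = 0.0
    for name, share in SPLITS:
        cumulative += share
        if bucket < cumulative:
            return name
    return SPLITS[-1][0]


def assign_speakers(recordings):
    """Return {client_id: (language, split)} with one language and one split per speaker.

    recordings: dicts with hypothetical keys 'client_id', 'language', 'age', 'gender'.
    """
    per_speaker = defaultdict(Counter)
    for rec in recordings:
        # Keep only speakers who gave age and gender (registered users),
        # since only their recordings can be linked across sessions.
        if rec.get("age") and rec.get("gender"):
            per_speaker[rec["client_id"]][rec["language"]] += 1
    return {
        # Assumption: a multilingual speaker keeps the language with most recordings.
        client_id: (langs.most_common(1)[0][0], split_for(client_id))
        for client_id, langs in per_speaker.items()
    }
```

Hashing the speaker ID, rather than shuffling recordings, guarantees that all of a speaker's recordings land in the same split and that reruns produce the same assignment.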
The dataset is structured as follows:
Where does the phonetic annotation come from?
Phonetic annotation was computed via BAS Web Services. We used the regular pipeline (G2P-MAUS) without ASR to align the text transcripts with the audio signals. We chose International Phonetic Alphabet (IPA) output symbols because they work well even in a multilingual setup. Common Phone annotation comprises 101 phonetic symbols, including silence.
Why Common Phone?
Is there any publication available?
Yes, a paper describing Common Phone in detail is currently under revision for LREC 2022. You can access a pre-print version on arXiv entitled “Common Phone: A Multilingual Dataset for Robust Acoustic Modelling”.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the French Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the French language.
Dataset Content & Diversity: Containing a total of 2,000 images, this French OCR dataset offers a diverse distribution across different types of product front images. It includes a variety of text, such as product names, taglines, logos, company names, addresses, and product content. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to support a robust text recognition model, we include fewer than five unique images from any single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the area of each image contains visible French text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All images were captured by native French speakers to ensure text quality and to avoid toxic content and PII. We used recent iOS and Android mobile devices with cameras above 5 MP to capture the images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata: Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as image orientation, country, language, and device information. Each image file is renamed to match its metadata.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of French text recognition models.
Update & Custom Collection: We're committed to expanding this dataset by continuously adding more images with the assistance of our native French crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or a specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, our crowd community can annotate the images with bounding boxes or transcribe the text in each image to meet your specific project requirements.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models for the French language. Your journey to enhanced language understanding and processing starts here.
This dataset contains GPS tracks of mobile phone users.
English (France) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts and covering the generic, human-machine interaction, smart home command, in-car command, numbers, and other domains. Transcribed with text content and other attributes. The data was collected from an extensive, geographically diverse pool of speakers (1,089 people in total), which helps models perform well on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use; all of our datasets are GDPR, CCPA, and PIPL compliant.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Mobile Phones Market Size Volume in France, 2023. Discover more data with ReportLinker!
Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
License information was derived automatically
This dataset lists and locates the latest version of the four operators' mobile antennas and the availability of the different technologies (2G, 3G, 4G) at these sites in Metropolitan France and the Overseas territories. Warning: errors have been detected in the site identifiers (IDs in the form E+XX). Enrichment: addition of administrative hierarchies; attachment of Overseas sites to the administrative hierarchy; addition of the operator name for Overseas sites.
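The identifier errors noted for this antenna dataset (IDs in the form E+XX) look like values that a spreadsheet converted to scientific notation. Assuming that is the cause, a minimal Python sketch can flag affected rows before joining on site ID. The regular expression and function name are hypothetical, and flagged IDs should not be "repaired" from the value alone, since the conversion may have dropped digits.

```python
import re

# Hypothetical pattern for IDs rewritten in scientific notation, e.g. "1.23E+05"
# or "4,5E+11" (French decimal comma). Assumed to match the "E+XX" form noted above.
SCI_NOTATION_ID = re.compile(r"^\d+(?:[.,]\d+)?E\+\d+$", re.IGNORECASE)


def corrupted_ids(site_ids):
    """Return the site IDs that look like spreadsheet scientific notation."""
    return [sid for sid in site_ids if SCI_NOTATION_ID.match(sid.strip())]


print(corrupted_ids(["00123456", "1.23E+05", "4,5E+11", "ABC"]))  # ['1.23E+05', '4,5E+11']
```

Flagged rows can then be excluded or re-matched using other attributes such as coordinates and operator.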
https://www.futurebeeai.com/policies/ai-data-license-agreement
This French Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
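Since the recordings are dual-channel, a common first step is to split each call into one mono track per party. This sketch uses only Python's standard `wave` module and assumes 2-channel PCM WAV input; the actual file format, and which channel holds the agent versus the customer, are assumptions not stated in this description.

```python
import io
import wave


def split_stereo(wav_bytes):
    """Split a 2-channel PCM WAV into two mono WAVs: [channel 0, channel 1]."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel recording")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    frame_size = 2 * width  # one sample per channel in each interleaved frame
    outputs = []
    for ch in (0, 1):
        # Take this channel's sample bytes out of every interleaved frame.
        mono = b"".join(frames[i + ch * width : i + (ch + 1) * width]
                        for i in range(0, len(frames), frame_size))
        buf = io.BytesIO()
        with wave.open(buf, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(mono)
        outputs.append(buf.getvalue())
    return outputs
```

Separate tracks let each party be transcribed or scored on its own, without cross-talk from the other channel.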
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
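Time-coded transcripts make simple conversation analytics straightforward, for example total talk time per speaker. The JSON schema in this Python sketch (`segments` with `speaker`, `start`, `end`, `text`) is hypothetical; adapt the keys to the actual files.

```python
import json
from collections import defaultdict


def talk_time(transcript_json):
    """Total seconds of speech per speaker in one call transcript.

    Assumes a hypothetical schema:
    {"segments": [{"speaker": ..., "start": sec, "end": sec, "text": ...}, ...]}
    """
    totals = defaultdict(float)
    for seg in json.loads(transcript_json)["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


sample = json.dumps({"segments": [
    {"speaker": "agent", "start": 0.0, "end": 2.5, "text": "Bonjour"},
    {"speaker": "customer", "start": 2.5, "end": 6.0, "text": "Ma facture est fausse"},
    {"speaker": "agent", "start": 6.0, "end": 7.0, "text": "Merci"},
]})
print(talk_time(sample))  # {'agent': 3.5, 'customer': 3.5}
```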
Rich metadata is available for each participant and conversation:
French Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts and covering the economy, entertainment, news, informal language, numbers, and alphabet domains. Transcribed with text content and other attributes. The data was collected from an extensive, geographically diverse pool of speakers (406 speakers from France, Canada, Africa, and elsewhere), which helps models perform well on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use; all of our datasets are GDPR, CCPA, and PIPL compliant.