17 datasets found

769 Hours - French Speech Data by Mobile Phone
m.nexdata.ai
Updated Oct 22, 2023
Cite
Nexdata (2023). 769 Hours - French Speech Data by Mobile Phone [Dataset]. https://m.nexdata.ai/datasets/speechrecog/952
Dataset updated
Oct 22, 2023
Dataset authored and provided by
Nexdata
Area covered
French
Variables measured
Device, Format, Country, Speaker, Language, Accuracy rate, Content category, Recording device, Recording condition, Language(Region) Code, and 1 more
Description
French (France) scripted monologue smartphone speech dataset, collected from monologues based on given prompts and covering the general and human-machine interaction categories. Transcribed with text content. The dataset was collected from a large and geographically diverse pool of speakers (1,623 native speakers), enhancing model performance in real and complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and use; all our datasets comply with GDPR, CCPA, and PIPL.
80 Hours - French(Canada) Spontaneous Dialogue Smartphone speech dataset
m.nexdata.ai
nexdata.ai
Updated Jun 26, 2025
Cite
Nexdata (2025). 80 Hours - French(Canada) Spontaneous Dialogue Smartphone speech dataset [Dataset]. https://m.nexdata.ai/datasets/speechrecog/1302?source=Kaggle
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Nexdata
Area covered
Canada, French
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
Description
French (Canada) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (126 native speakers), enhancing model performance in real and complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and use; all our datasets comply with GDPR, CCPA, and PIPL.
Sites mobiles 5G
kaggle.com
Updated Mar 16, 2021
Cite
Mathurin Aché (2021). Sites mobiles 5G [Dataset]. https://www.kaggle.com/mathurinache/sites-mobiles-5g/discussion
Dataset updated
Mar 16, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mathurin Aché
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset results from a hybrid merge of the Arcep and ANFR datasets on 5G sites; each of the two organisations publishes only partial information, and they do not synchronise their releases. Source: https://www.data.gouv.fr/fr/datasets/fichier-complet-des-sites-mobiles-5g/
Mobile Phone Repair Shops in France - 4,598 Available (Free Sample)
poidata.io
csv
Updated Jun 4, 2025
Cite
Poidata.io (2025). Mobile Phone Repair Shops in France - 4,598 Available (Free Sample) [Dataset]. https://www.poidata.io/report/mobile-phone-repair-shop/france
Available download formats: csv
Dataset updated
Jun 4, 2025
Dataset provided by
Poidata.io
Area covered
France
Description
This dataset provides information on 4,598 mobile phone repair shops in France as of June 2025. It includes details such as email addresses (where publicly available), phone numbers (where publicly available), and geocoded addresses. Explore market trends, identify potential business partners, and gain valuable insights into the industry. Download a complimentary sample of 10 records to see what's included.
French Newspaper, Magazine, and Books OCR Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). French Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/french-newspaper-book-magazine-ocr-image-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
French
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the French Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the French language.
Dataset Content & Diversity:
Containing a total of 5000 images, this French OCR dataset offers an equal distribution across newspapers, books, and magazines. Within, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, call-outs, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single resource. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible French text.
Images have been captured under varying lighting conditions, both day and night, along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.
All these images were captured by native French people to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to capture all these images and maintain image quality. In this training dataset, images are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes device information, source type (newspaper, magazine, or book), and image type (portrait or landscape), among other fields. Each image is renamed to correspond to the metadata.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of French text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native French crowd community.
If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, we can annotate or label the images with bounding box or transcribe the text in the image to align with your specific requirements using our crowd community.
License:
This Image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the French language. Your journey to enhanced language understanding and processing starts here.
Veraset Movement | Europe | GPS Mobile Location Data | Reliable, Compliant,...
datarade.ai
.csv
Updated May 31, 2022
Cite
Veraset (2022). Veraset Movement | Europe | GPS Mobile Location Data | Reliable, Compliant, Precise Location Data [Dataset]. https://datarade.ai/data-products/veraset-movement-europe-gps-mobile-location-data-reli-veraset
Available download formats: .csv
Dataset updated
May 31, 2022
Dataset authored and provided by
Veraset
Area covered
Germany, Luxembourg, Hungary, Lithuania, Bulgaria, Estonia, Spain, Finland, Belgium, Italy
Description
Leverage the most reliable and compliant mobile device location/foot traffic dataset on the market.

Veraset Movement (Mobile Location Data) offers unparalleled insights into footfall traffic patterns across dozens of European countries.

Covering 45+ European countries, Veraset's Mobile Location Data draws on raw GPS data from tier-1 apps, SDKs, and aggregators of mobile devices to provide customers with accurate, up-to-the-minute information on human movement. Ideal for ad tech, planning, retail, and transportation logistics, Veraset's Movement data helps shape strategy and make impactful data-driven decisions.

Veraset's European Movement Panel includes the following countries:
- United Kingdom (GB)
- Germany (DE)
- France (FR)
- Spain (ES)
- Italy (IT)
- The Netherlands (NL)
- Switzerland (CH)
- Belgium (BE)
- Sweden (SE)
- Austria (AT)
- Denmark (DK)
- Finland (FI)
- Cyprus (CY)
- Poland (PL)
- Ireland (IE)
- Portugal (PT)
- Romania (RO)
- Hungary (HU)
- Czech Republic (CZ)
- Greece (GR)
- Bulgaria (BG)
- Lithuania (LT)
- Croatia (HR)
- Norway (NO)
- Latvia (LV)
- Luxembourg (LU)
- Slovakia (SK)
- Estonia (EE)
- Cayman Islands (KY)
- Slovenia (SI)
- Vatican City (VA)
- Turks and Caicos Islands (TC)
- Bermuda (BM)
- Malta (MT)
- Iceland (IS)
- Liechtenstein (LI)
- Monaco (MC)
- British Virgin Islands (VG)
- Anguilla (AI)
- Andorra (AD)
- Greenland (GL)
- San Marino (SM)
- Federated States of Micronesia (FM)
- Montserrat (MS)
- Pitcairn Islands (PN)

Common Use Cases of Veraset's Mobile Location Data:
- Advertising
- Ad Placement, Attribution, and Segmentation
- Audience Creation/Building
- Dynamic Ad Targeting
- Infrastructure Plans
- Route Optimization
- Public Transit Optimization
- Credit Card Loyalty
- Competitive Analysis
- Risk Assessment, Underwriting, and Policy Personalization
- Enrichment of Existing Datasets
- Trade Area Analysis
- Predictive Analytics and Trend Forecasting
Cell Phone Accessory Stores in France - 3,931 Available (Free Sample)
poidata.io
csv
Updated May 30, 2025
Cite
Poidata.io (2025). Cell Phone Accessory Stores in France - 3,931 Available (Free Sample) [Dataset]. https://www.poidata.io/report/cell-phone-accessory-store/france
Available download formats: csv
Dataset updated
May 30, 2025
Dataset provided by
Poidata.io
Area covered
France
Description
This dataset provides information on 3,931 cell phone accessory stores in France as of May 2025. It includes details such as email addresses (where publicly available), phone numbers (where publicly available), and geocoded addresses. Explore market trends, identify potential business partners, and gain valuable insights into the industry. Download a complimentary sample of 10 records to see what's included.
Observatoire des ondes - Relais mobiles 5G activés - Tours Métropole Val de...
toursmetropole.opendatasoft.com
data.tours-metropole.fr
csv, excel, geojson +1
Updated Feb 5, 2024
Cite
(2024). Observatoire des ondes - Relais mobiles 5G activés - Tours Métropole Val de Loire [Dataset]. https://toursmetropole.opendatasoft.com/explore/dataset/sites-mobiles-5g-france/api/
Available download formats: excel, json, csv, geojson
Dataset updated
Feb 5, 2024
License
Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
License information was derived automatically
Area covered
Tours, Centre-Val de Loire
Description
This dataset lists and locates the latest version of the 5G mobile antennas of the various operators, along with the frequency bands available at these sites in metropolitan France. Enrichment: administrative hierarchies added.
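The citation URL for this entry points at an Opendatasoft portal, so the listed formats can usually be fetched through Opendatasoft's Explore API. The sketch below only builds an export URL and makes no request. It assumes the portal exposes Explore API v2.1 under `/api/explore/v2.1`, which the listing does not confirm, and the helper name `export_url` is ours. The domain and dataset id are taken from the citation URL.

```python
from urllib.parse import urlencode


def export_url(domain, dataset_id, fmt="csv", **params):
    """Build an Explore API v2.1 export URL (assumed to be enabled on the portal)."""
    base = (
        f"https://{domain}/api/explore/v2.1/catalog/datasets/"
        f"{dataset_id}/exports/{fmt}"
    )
    # Optional query parameters (for example limit=10) are appended if given.
    return f"{base}?{urlencode(params)}" if params else base


# Domain and dataset id come from the citation URL of this entry.
url = export_url(
    "toursmetropole.opendatasoft.com",
    "sites-mobiles-5g-france",
    "geojson",
    limit=10,
)
print(url)
```

Building the URL separately from fetching it keeps the assumption about the API version in one place, so it is easy to adjust if the portal uses a different endpoint.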

Data from: Common Phone: A Multilingual Dataset for Robust Acoustic...

zenodo.org
explore.openaire.eu

application/gzip

Updated Jul 17, 2024

Cite

Philipp Klumpp; Tomás Arias-Vergara; Paula Andrea Pérez-Toro; Elmar Nöth; Juan Rafael Orozco-Arroyave (2024). Common Phone: A Multilingual Dataset for Robust Acoustic Modelling [Dataset]. http://doi.org/10.5281/zenodo.5846137

Available download formats: application/gzip

Unique identifier

https://doi.org/10.5281/zenodo.5846137

Dataset updated

Jul 17, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

Release Date: 17.01.22

Welcome to Common Phone 1.0

Legal Information

Common Phone is a subset of the Common Voice corpus collected by Mozilla Corporation. By using Common Phone, you agree to the Common Voice Legal Terms. Common Phone is maintained and distributed by speech researchers at the Pattern Recognition Lab of Friedrich-Alexander-University Erlangen-Nuremberg (FAU) under the CC0 license.

As with Common Voice, you must not attempt to identify speakers who contributed to Common Phone.

About Common Phone

This corpus aims to provide a basis for Machine Learning (ML) researchers and enthusiasts to train and test their models against a wide variety of speakers, hardware/software ecosystems and acoustic conditions to improve generalization and availability of ML in real-world speech applications.
The current version of Common Phone comprises 116.5 hours of speech samples, collected from 11,246 speakers in 6 languages:

Language	Speakers (`train` / `dev` / `test`)	Hours (`train` / `dev` / `test`)
English	4716 / 771 / 774	14.1 / 2.3 / 2.3
French	796 / 138 / 135	13.6 / 2.3 / 2.2
German	1176 / 202 / 206	14.5 / 2.5 / 2.6
Italian	1031 / 176 / 178	14.6 / 2.5 / 2.5
Spanish	508 / 88 / 91	16.5 / 3.0 / 3.1
Russian	190 / 34 / 36	12.7 / 2.6 / 2.8
Total	8417 / 1409 / 1420	85.8 / 15.2 / 15.5
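
As a quick consistency check on the table above, the per-language figures can be summed and compared with the Total row. The sketch below hard-codes the table's values; the variable and function names are ours, not part of Common Phone. The speaker counts add up exactly. The summed train hours come to 86.0 rather than the reported 85.8, presumably because each per-language value is rounded to one decimal place.

```python
# Values copied from the table above as (train, dev, test) tuples.
SPEAKERS = {
    "English": (4716, 771, 774),
    "French": (796, 138, 135),
    "German": (1176, 202, 206),
    "Italian": (1031, 176, 178),
    "Spanish": (508, 88, 91),
    "Russian": (190, 34, 36),
}
HOURS = {
    "English": (14.1, 2.3, 2.3),
    "French": (13.6, 2.3, 2.2),
    "German": (14.5, 2.5, 2.6),
    "Italian": (14.6, 2.5, 2.5),
    "Spanish": (16.5, 3.0, 3.1),
    "Russian": (12.7, 2.6, 2.8),
}
REPORTED_SPEAKERS = (8417, 1409, 1420)
REPORTED_HOURS = (85.8, 15.2, 15.5)


def column_sums(table):
    """Sum the train, dev and test columns across all languages."""
    return tuple(round(sum(column), 1) for column in zip(*table.values()))


speaker_totals = column_sums(SPEAKERS)
hour_totals = column_sums(HOURS)

print("speakers:", speaker_totals, "reported:", REPORTED_SPEAKERS)
print("hours:   ", hour_totals, "reported:", REPORTED_HOURS)
```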

Presented train, dev and test splits are not identical to those shipped with Common Voice. Speaker separation among splits was realized by only using those speakers that had provided age and gender information. This information can only be provided as a registered user on the website. When logged in, the session ID of contributed recordings is always linked to your user, thus we could easily link recordings to individual speakers. Keep in mind this would not be possible for unregistered users, as their session ID changes if they decide to contribute more than once.
During speaker selection, we considered that some speakers had contributed to more than one of the six Common Voice datasets (one for each language). In Common Phone, a speaker will only appear in one language.
The dataset is structured as follows:

Six top-level directories, one for each language.
Each language folder contains:
- [train|dev|test].csv files listing audio files, respective speaker ID and plain text transcript.
- meta.csv provides speaker information: age group, gender, language, accent (if available) and which of the three splits this speaker was assigned to. File names match corresponding audio file names except their extension.
- /grids/ contains phonetic transcription for every audio file in Praat TextGrid format.
- /mp3/ contains audio files in mp3, identical to those of Common Voice, e.g., sampling rates have been preserved and may vary for different files.
- /wav/ contains raw audio files at 16 bits/sample, 16 kHz, single channel. They were created from the original mp3 audio and are provided for convenience; keep in mind that their source had undergone MP3 compression.
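
The layout described above can be navigated with a few path helpers. This is a minimal sketch, not tooling shipped with Common Phone: the root path, the language folder name (`fr`), the file stem, and the `.TextGrid` extension are assumptions for illustration. Only the `train.csv` / `dev.csv` / `test.csv`, `meta.csv`, `grids/`, `mp3/`, and `wav/` names come from the description.

```python
from pathlib import Path

SPLITS = ("train", "dev", "test")


def language_files(root, language):
    """Return the split CSVs and meta.csv expected in one language folder."""
    base = Path(root) / language
    files = {split: base / f"{split}.csv" for split in SPLITS}
    files["meta"] = base / "meta.csv"
    return files


def sample_paths(root, language, stem):
    """Map one file stem to its mp3, wav and TextGrid counterparts.

    The description notes that file names match except for their
    extension; the '.TextGrid' suffix is an assumption.
    """
    base = Path(root) / language
    return {
        "mp3": base / "mp3" / f"{stem}.mp3",
        "wav": base / "wav" / f"{stem}.wav",
        "grid": base / "grids" / f"{stem}.TextGrid",
    }


# Hypothetical root folder, language folder and file stem.
paths = sample_paths("CommonPhone", "fr", "sample_0001")
for kind, path in paths.items():
    print(kind, path.as_posix())
```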

Where does the phonetic annotation come from?

Phonetic annotation was computed via BAS Web Services. We used the regular Pipeline (G2P-MAUS) without ASR to create an alignment of text transcripts with audio signals. We chose International Phonetic Alphabet (IPA) output symbols as they work well even in a multi-lingual setup. Common Phone annotation comprises 101 phonetic symbols, including silence.

Why Common Phone?

Large number of speakers and varying acoustic conditions to improve robustness of ML models
Time-aligned IPA phonetic transcription for every audio sample
Gender-balanced and age-group-matched (equal number of female/male speakers in every age group)
Support for six different languages to leverage multi-lingual approaches
Original MP3 files plus standard WAVE files

Is there any publication available?

Yes, a paper describing Common Phone in detail is currently under revision for LREC 2022. You can access a pre-print version on arXiv entitled “Common Phone: A Multilingual Dataset for Robust Acoustic Modelling”.

French Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). French Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/french-product-image-ocr-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
French
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the French Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the French language.
Dataset Content & Diversity:
Containing a total of 2000 images, this French OCR dataset offers diverse distribution across different types of front images of Products. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to build a robust text recognition model we allow limited (less than five) unique images from a single resource. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of space contains visible French text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All these images were captured by native French people to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to capture all these images and maintain image quality. In this training dataset, images are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata like image orientation, country, language, and device information. Each image is renamed to correspond to the metadata.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of French text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native French crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, we can annotate or label the images with bounding box or transcribe the text in the image to align with your specific project requirements using our crowd community.
License:
This Image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the French language. Your journey to enhanced language understanding and processing starts here.
GPS Raw data France
ckan.mobidatalab.eu
csv
Updated Nov 16, 2023
Cite
Singlespot (2023). GPS Raw data France [Dataset]. https://ckan.mobidatalab.eu/dataset/gps_raw_data_france
Available download formats: csv
Dataset updated
Nov 16, 2023
Dataset provided by
Singlespot
Area covered
France
Description
This dataset contains GPS tracks of mobile phone users.
520 Hours - French Speaking English Speech Data by Mobile Phone
m.nexdata.ai
nexdata.ai
Updated May 5, 2025
Cite
Nexdata (2025). 520 Hours - French Speaking English Speech Data by Mobile Phone [Dataset]. https://m.nexdata.ai/datasets/speechrecog/989?source=Kaggle
Dataset updated
May 5, 2025
Dataset authored and provided by
Nexdata
Area covered
French
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
English (France) scripted monologue smartphone speech dataset, collected from monologues based on given scripts, covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (1,089 people in total), enhancing model performance in real and complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and use; all our datasets comply with GDPR, CCPA, and PIPL.
Sites mobiles 5G - France
data.82amenagement.fr
data.smartidf.services
+2more
csv, excel, geojson +1
Updated Apr 27, 2025
Cite
(2025). Sites mobiles 5G - France [Dataset]. https://data.82amenagement.fr/explore/dataset/sites-mobiles-5g-france/
Available download formats: geojson, csv, excel, json
Dataset updated
Apr 27, 2025
License
Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
License information was derived automatically
Area covered
France
Description
This dataset lists and locates the latest version of the 5G mobile antennas of the various operators, along with the frequency bands available at these sites in metropolitan France. Enrichment: administrative hierarchies added.
Mobile Phones Market Size Volume in France, 2023
reportlinker.com
Updated Apr 5, 2024
Cite
ReportLinker (2024). Mobile Phones Market Size Volume in France, 2023 [Dataset]. https://www.reportlinker.com/dataset/1203bca66faa59ac78eadf8a3f1059e22b9b4738
Dataset updated
Apr 5, 2024
Dataset authored and provided by
ReportLinker
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
France
Description
Mobile Phones Market Size Volume in France, 2023. Discover more data with ReportLinker!
Sites mobiles 2G, 3G, 4G - France
data.smartidf.services
ods.backoffice.smartidf.services
+2more
csv, excel, geojson +1
Updated Mar 27, 2025
Cite
(2025). Sites mobiles 2G, 3G, 4G - France [Dataset]. https://data.smartidf.services/explore/dataset/buildingref-france-arcep-mobile-site-2g3g4g/
Available download formats: csv, json, excel, geojson
Dataset updated
Mar 27, 2025
License
Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
License information was derived automatically
Area covered
France
Description
This dataset lists and locates the latest version of the mobile antennas of the four operators, along with the availability of each technology (2G, 3G, 4G) at these sites in metropolitan France and the overseas territories. Note: errors have been detected in the site identifiers (ids of the form E+XX). Enrichment: administrative hierarchies added; overseas sites attached to the administrative hierarchy; operator name added for overseas sites.
French Call Center Data for Telecom AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). French Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-french-france
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
French
Dataset funded by
FutureBeeAI
Description
Introduction
This French Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for French-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native French speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
•Participant Diversity:
•Speakers: 60 native French speakers from our verified contributor pool.
•Regions: Representing multiple provinces across France to ensure coverage of various accents and dialects.
•Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.
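
The stated audio format (stereo WAV, 16-bit, 8 kHz or 16 kHz) can be checked from a file's header before training. The sketch below uses Python's standard `wave` module. The function names and the synthetic silent file are ours, for illustration only, and are not part of FutureBeeAI's tooling.

```python
import wave

EXPECTED_CHANNELS = 2        # stereo (dual-channel)
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit depth
EXPECTED_RATES = {8000, 16000}


def matches_stated_format(path):
    """Return True if the WAV header matches the format stated above."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getframerate() in EXPECTED_RATES
        )


def write_silence(path, channels, rate, seconds=1):
    """Write a silent 16-bit WAV file, used here only to exercise the check."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)
```

Calling `matches_stated_format` on every delivered file is a cheap way to catch mono or resampled recordings before they reach a training pipeline.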

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
•Inbound Calls:
•Phone Number Porting
•Network Connectivity Issues
•Billing and Payments
•Technical Support
•Service Activation
•International Roaming Enquiry
•Refund Requests and Billing Adjustments
•Emergency Service Access, and others
•Outbound Calls:
•Welcome Calls & Onboarding
•Payment Reminders
•Customer Satisfaction Surveys
•Technical Updates
•Service Usage Reviews
•Network Complaint Status Calls, and more
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, coughs)
•High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.
231.9 Hours - French Scripted Monologue Smartphone speech dataset
m.nexdata.ai
Updated Apr 12, 2024
Cite
Nexdata (2024). 231.9 Hours - French Scripted Monologue Smartphone speech dataset [Dataset]. https://m.nexdata.ai/datasets/speechrecog/114
Dataset updated
Apr 12, 2024
Dataset provided by
nexdata technology inc
Authors
Nexdata
Area covered
French
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
French scripted monologue smartphone speech dataset, collected from monologues based on given scripts, covering the economy, entertainment, news, informal language, numbers, and alphabet domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (406 speakers from France, Canada, Africa, and elsewhere), enhancing model performance in real and complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and use; all our datasets comply with GDPR, CCPA, and PIPL.