17 datasets found
  1. Grocery_store

    • kaggle.com
    Updated Jan 18, 2023
    Cite
    Jose Guadalupe Martinez Vega (2023). Grocery_store [Dataset]. https://www.kaggle.com/datasets/martinezjosegpe/grocery-store/code
    Dataset updated
    Jan 18, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jose Guadalupe Martinez Vega
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    This is a raw dataset from my family's grocery store in Mexico. The biggest file is the raw database, which has over 100 tables, the largest with over 5 million rows. The data starts in 2014, when we installed the sales software, and runs through Oct/2022, which is when I pulled the data to explore and practice with it. It includes sales, item descriptions, IDs, dates, etc. You can do whatever you want with it, from weekly, monthly, and yearly sales to finding the best-selling product on the weekend or on a Tuesday.

    The Excel file is cleaned data that I pulled from the raw data; it includes some charts and filtered and sorted information. Some table and column names might be in Spanish. Hopefully that is not a big problem for you to explore the data!
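
    The weekday question mentioned in the description can be sketched with the Python standard library alone. The rows below and the field layout (date, item, quantity) are invented for illustration; the real tables use their own, partly Spanish, names.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sales rows: (sale date, item name, quantity sold).
SALES = [
    (date(2022, 10, 4), "tortillas", 12),   # Tuesday
    (date(2022, 10, 4), "leche", 5),        # Tuesday
    (date(2022, 10, 8), "refresco", 9),     # Saturday
    (date(2022, 10, 11), "leche", 10),      # Tuesday
    (date(2022, 10, 15), "tortillas", 7),   # Saturday
]


def top_item_by_weekday(rows):
    """Return {weekday name: (item, total quantity)} for each weekday's best seller."""
    totals = defaultdict(Counter)
    for day, item, quantity in rows:
        totals[WEEKDAYS[day.weekday()]][item] += quantity
    return {name: counts.most_common(1)[0] for name, counts in totals.items()}


best = top_item_by_weekday(SALES)
print(best["Tuesday"])   # ('leche', 15)
```

    The same grouping works for weekly or monthly totals by swapping the weekday key for `day.isocalendar()[:2]` or `(day.year, day.month)`.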

  2. Mexico Geodemographic Information Dataset

    • app.mobito.io
    Updated Feb 23, 2023
    + more versions
    Cite
    (2023). Mexico Geodemographic Information Dataset [Dataset]. https://app.mobito.io/data-product/mexico-geodemographic-information-dataset
    Dataset updated
    Feb 23, 2023
    Area covered
    Mexico
    Description

    This dataset offers valuable insights into the demographic profile of a specific population, with data on factors such as age, income, and gender distribution, as well as number of homes and spending habits categorized into major expenditure categories such as food, transportation, and healthcare. The data is geocoded using geohash7 (152.9m x 152.4m), providing a more accurate representation of the population distribution. This information is a valuable resource for companies, researchers, and policymakers looking to gain a deeper understanding of the economic and social landscape of a community. Utilizing this data, they can make informed decisions related to resource allocation, planning, and policy development, and tailor initiatives to effectively address the challenges and opportunities facing the population. The dataset can be provided by country, state, municipality, colony, zone, polygon, etc.
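
    For readers unfamiliar with geohash7 keys, the following is a minimal sketch of the standard geohash encoding, not the provider's own code. A 7-character hash carries 35 bits (18 for longitude, 17 for latitude). The function name and the sample coordinates are chosen for illustration only.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Approximate coordinates of central Mexico City, for illustration.
cell = geohash_encode(19.4326, -99.1332, precision=7)
print(cell)
```

    Records that share the same 7-character key fall in the same grid cell, so grouping by the key aggregates values per cell.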

  3. Mexico, Evaluation of PROGRESA

    • dataverse.harvard.edu
    Updated Jul 9, 2012
    Cite
    Harvard Dataverse (2012). Mexico, Evaluation of PROGRESA [Dataset]. http://doi.org/10.7910/DVN/05BMJY
    Dataset updated
    Jul 9, 2012
    Dataset provided by
    Harvard Dataverse
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/05BMJY

    Area covered
    Mexico
    Description

    This dataset is available on the Oportunidades website at http://evaluacion.oportunidades.gob.mx:8010/en/index.php. Registration is required to access the data. To register, please visit http://evaluacion.oportunidades.gob.mx:8010/en/index.php and click Sign up on the right side of the page. You may also email (evaluacion@oportunidades.gob.mx) or call (01-800-500-50-50) during regular business hours, local (Mexico) time. NOTE: The 1999 quantitative evaluation of rural areas was done by the International Food Policy Research Institute (IFPRI) and the results of that evaluation were published in 2000 at http://evaluacion.oportunidades.gob.mx:8010/en/docs/eval_docs_2000.php. The later evaluations were transferred to the National Institute of Public Health (INSP), Social Anthropology Research and Higher Studies Center (CIESAS), and Centro de Investigación y Docencia Económicas (CIDE). To view a particular evaluation document from 1999 through 2007, select the year in the menu on the left at http://evaluacion.oportunidades.gob.mx:8010/en/index.php. The federal government of Mexico introduced the Programa de Educación, Salud y Alimentación (the Education, Health, and Nutrition Program), known by its Spanish acronym, PROGRESA, to measure the impacts of the program on the covered population, using diverse methodological approaches. In 1999, IFPRI at the request of PROGRESA conducted a qualitative evaluation of its major rural anti-poverty program. The experimental design used for the evaluation of PROGRESA takes advantage of the sequential expansion of the program in order to come up with a set of localities that serve the role of controls. Specifically, the sample consists of repeated observations (panel data) collected for 24,000 households from 506 localities in the seven states of Guerrero, Hidalgo, Michoacan, Puebla, Queretaro, San Luis Potosi and Veracruz.
    Of the 506 localities, 320 localities were assigned to the treatment group and 186 localities were assigned as controls. The 320 treatment localities were randomly selected using probabilities proportional to size from a universe of 4,546 localities that were covered by phase II of the program in the 7 states mentioned above. Using the same method, the 186 control localities were selected from a universe of 1,850 localities in these 7 states that were to be covered by PROGRESA in later phases. In November 1997 PROGRESA conducted a survey of the socio-economic conditions of rural Mexican households (Encuesta de Caracteristicas Socioeconomicas de los Hogares or ENCASEH) in the evaluation communities to determine which households would be eligible for benefits. Then based on PROGRESA’s beneficiary selection methods, households were classified as eligible and non-eligible for participation in the program in both treatment and control communities. The first evaluation survey (Encuesta Evaluation de los Hogares or ENCEL) took place in March 1998 before the initiation of benefits distribution in May 1998. In combination these two surveys provide the baseline observations available for all households before the initiation of the distribution of cash benefits in the treatment villages. The rest of the evaluation surveys were conducted after beneficiary households started receiving benefits from PROGRESA. One round of surveys took place in October/November 1998 (ENCEL98O), which was well after most households received some benefits as part of their participation in the program. The next two waves took place in June 1999 (ENCEL98M) and November 1999 (ENCEL99N). A number of core questions about the demographic composition of households and their socio-economic status were applied in each round of the survey. These core questions were accompanied by specific questionnaires, focused on collecting information critical to a thorough evaluation of the impact of the program.
    The topics of these modules included family background, assets brought to marriage, schooling indicators, health status and utilization, parental attitudes and aspirations towards children’s schooling, consumption of food and non-food items, the allocation of time of household members in various activities, and self-employment activities. The preceding surveys were supplemented by school and clinic surveys, community questionnaires, data on student achievement test scores, and other school and clinic administrative data. The evaluation surveys (ENCEL) collected by PROGRESA did not allow for an evaluation of the nutritional component of the program. To evaluate the nutritional component of PROGRESA, separate surveys of the same families were carried out by the National Institute of Public Health (INSP) in Cuernavaca. These surveys collected anthropometric measurements (weight and height) of children and blood samples to test for anemia and other deficiencies.
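
    The description says localities were drawn "using probabilities proportional to size" but does not state the exact procedure. One common way to do this is systematic PPS sampling, sketched below under that assumption. The locality sizes are randomly generated stand-ins, not PROGRESA data, and the function name is hypothetical.

```python
import random


def pps_systematic_sample(sizes, n, rng):
    """Pick n indices with probability proportional to size (systematic PPS).

    A unit larger than the sampling interval can be picked more than once.
    """
    total = sum(sizes)
    step = total / n                      # sampling interval
    start = rng.random() * step           # random start in [0, step)
    targets = [start + k * step for k in range(n)]
    chosen = []
    cumulative = 0
    t = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Select this unit for every target that lands inside its size range.
        while t < n and targets[t] < cumulative:
            chosen.append(index)
            t += 1
    return chosen


rng = random.Random(0)
# Invented locality sizes (e.g. household counts); none exceeds the interval.
locality_sizes = [rng.randint(50, 300) for _ in range(40)]
sample = pps_systematic_sample(locality_sizes, 5, rng)
print(sample)
```

    Larger localities cover a longer stretch of the cumulative total, so they are proportionally more likely to contain one of the evenly spaced targets.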

  4. Mexican Spanish Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Mexican Spanish Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mexican Spanish Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Mexican Spanish speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    Speakers: 60 native Mexican Spanish speakers from our verified contributor pool.
    Regions: Representing multiple provinces across Mexico to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.
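
    The audio properties listed above (stereo, 16-bit, 8 kHz or 16 kHz) can be checked per file with Python's standard `wave` module. This is a minimal sketch: the helper name `describe_wav` and the file name are hypothetical, and a one-second silent file is generated so the example runs without the real recordings.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return channel count, bit depth, sample rate and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": rate,
            "seconds": frames / rate,
        }


# Build one second of silent stereo 16-bit audio at 8 kHz as a stand-in file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_call.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)          # 2 bytes per sample = 16-bit
    out.setframerate(8000)
    out.writeframes(b"\x00" * (8000 * 2 * 2))

info = describe_wav(demo_path)
print(info)
```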

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refund Requests and Billing Adjustments
    Emergency Service Access, and others
    Outbound Calls:
    Welcome Calls & Onboarding
    Payment Reminders
    Customer Satisfaction Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, coughs)
    High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
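
    The word error rate quoted above is conventionally computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that standard calculation; the provider's own scoring procedure is not described in the listing, and the two sentences are invented examples.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "quiero cambiar mi plan de datos"
hypothesis = "quiero cambiar el plan datos"
wer = word_error_rate(reference, hypothesis)
print(f"{wer:.3f}")   # 0.333 (one substitution and one deletion over six words)
```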

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.

  5. Mexico Interest Rate

    • tradingeconomics.com
    • fr.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Jun 26, 2025
    Cite
    TRADING ECONOMICS (2025). Mexico Interest Rate [Dataset]. https://tradingeconomics.com/mexico/interest-rate
    Available download formats: excel, json, csv, xml
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Oct 14, 2005 - Jun 26, 2025
    Area covered
    Mexico
    Description

    The benchmark interest rate in Mexico was last recorded at 8 percent. This dataset provides - Mexico Interest Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.

  6. Mexican Spanish General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mexican Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mexican Spanish communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Mexican accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mexican Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Mexican Spanish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Mexico to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
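
    As a rough planning aid, the uncompressed size of the audio follows from the figures above: seconds × sample rate × channels × bytes per sample. The sketch below applies that arithmetic to the stated 30 hours of 16-bit stereo audio at 16 kHz. It ignores WAV headers and any packaging, so the actual download size may differ.

```python
def pcm_bytes(hours, sample_rate_hz, channels, bit_depth):
    """Uncompressed PCM payload size in bytes (WAV headers not included)."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * channels * (bit_depth // 8)


size = pcm_bytes(hours=30, sample_rate_hz=16_000, channels=2, bit_depth=16)
print(f"{size / 1e9:.2f} GB")   # 6.91 GB
```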

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Spanish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Mexican Spanish.
    Voice Assistants: Build smart assistants capable of understanding natural Mexican conversations.

  7. Mexico Inflation Rate

    • tradingeconomics.com
    • fr.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Jun 9, 2025
    Cite
    TRADING ECONOMICS (2025). Mexico Inflation Rate [Dataset]. https://tradingeconomics.com/mexico/inflation-cpi
    Available download formats: xml, json, csv, excel
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 31, 1974 - Jun 30, 2025
    Area covered
    Mexico
    Description

    Inflation Rate in Mexico decreased to 4.32 percent in June from 4.42 percent in May of 2025. This dataset provides - Mexico Inflation Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.

  8. Mexican Peso Data

    • tradingeconomics.com
    • tr.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Cite
    TRADING ECONOMICS, Mexican Peso Data [Dataset]. https://tradingeconomics.com/mexico/currency
    Available download formats: csv, excel, json, xml
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Apr 17, 1972 - Jul 24, 2025
    Area covered
    Mexico
    Description

    The USD/MXN exchange rate rose to 18.5643 on July 24, 2025, up 0.15% from the previous session. Over the past month, the Mexican Peso has strengthened 1.83%, but it is down 0.64% over the last 12 months. Mexican Peso - values, historical data, forecasts and news - updated in July 2025.

  9. Promoting water consumption using behavioral economics insights [Dataset] -...

    • b2find.eudat.eu
    Updated Feb 11, 2023
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Feb 11, 2023
    Description

    Mexico has one of the largest overweight and obesity epidemics in the world, and in response several actions aiming to reduce it have already been put in place. These include a specific action program for schools that seeks to turn school environments into supportive environments in which children make healthier food choices. The influence of the environment (the so-called “choice architecture”) on people’s perceptions and decisions is studied by economists with the aim of supporting individuals in making healthier decisions, using tools known as “nudges”. However, “nudges” are not commonly integrated into anti-obesity strategies. We designed an intervention to find out whether such a small, liberty-preserving intervention could increase the effectiveness of a water-promotion campaign, compared with the common approach of an educational talk. The intervention was carried out in three schools in Mexico City and the State of Mexico. The body mass index, standardized by Z-scores, was used as the indicator of campaign success. Although our results do not show considerable differences between the approaches, mainly due to problems within the sample and a follow-up that is still too short, they provide insights suggesting that including “nudges” in a health-promotion campaign may indeed have a positive impact.

  10. Mexican Spanish Wake Words & Voice Commands Speech Data

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mexican Spanish Wake Words & Voice Commands Speech Data [Dataset]. https://www.futurebeeai.com/dataset/wake-words-and-commands-dataset/wake-words-and-commands-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Mexican Spanish Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.

    Speech Data

    This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

    Wake words alone
    Wake words followed by command phrases

    Participant Diversity

    Speakers: 50 native Mexican Spanish speakers from the FutureBeeAI community
    Regions: Participants from various Mexico provinces, ensuring broad coverage of accents and dialects
    Demographics: Ages 18–70; 60% male and 40% female participants

    Recording Details

    Type: Scripted wake words and command phrases
    Duration: 1 to 15 seconds per clip
    Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz

    Dataset Diversity

    Wake Word Types
    Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
    Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
    Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more

    Command Types by Use Case

    Automobile: Play music, check directions, voice search, provide feedback, and more
    Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
    Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.

    Recording Environments

    No background noise
    Background traffic noise
    People talking in the background

    Speaking Pace

    Normal speed
    Fast speed

    This diversity ensures robust training for real-world voice assistant applications.

    Metadata

    Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

    Participant Metadata: Unique ID, age, gender, region, accent, dialect
    Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format

    Use Cases & Applications

    Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
    Smart Home Devices: Enable responsive voice control in smart appliances
    Automotive Voice Control: Power voice-based commands for navigation, entertainment, and system control

  11. Mexican Spanish Call Center Data for BFSI AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mexican Spanish Call Center Data for BFSI AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/bfsi-call-center-conversation-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mexican Spanish Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Mexican Spanish speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.

    Participant Diversity:
    Speakers: 60 native Mexican Spanish speakers from our verified contributor pool.
    Regions: Representing multiple provinces across Mexico to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.

    Inbound Calls:
    Debit Card Block Request
    Transaction Disputes
    Loan Enquiries
    Credit Card Billing Issues
    Account Closure & Claims
    Policy Renewals & Cancellations
    Retirement & Tax Planning
    Investment Risk Queries, and more
    Outbound Calls:
    Loan & Credit Card Offers
    Customer Surveys
    EMI Reminders
    Policy Upgrades
    Insurance Follow-ups
    Investment Opportunity Calls
    Retirement Planning Reviews, and more

    This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, background noise)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making financial domain model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender,

  12. Survey on the Effects of COVID-19 on the Wellbeing of Mexico City Households...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 16, 2024
    + more versions
    Cite
    Perez-Hernandez, Victor (2024). Survey on the Effects of COVID-19 on the Wellbeing of Mexico City Households (ENCOVID-19 CDMX – DECEMBER 2020) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6972789
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    López Escobar, Emilio
    Perez-Hernandez, Victor
    Teruel Belismelis, Graciela
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mexico City, Mexico City, Mexico
    Description

    Amid the COVID-19 outbreak, the ENCOVID-19 CDMX provides information on the well-being of Mexico City households in four main domains: labor, income, mental health, and food insecurity. It offers timely information to understand the social consequences of the pandemic and the lockdown measures. It is a cross-sectional telephone survey that, in addition to the four main domains and a set of COVID19-related questions, includes key indicators to capture the impact of the pandemic on issues like education, social programs, and crime. This is the second dataset of the project, corresponding to December 2020, collected eight months after the lockdown began in Mexico. Data collection was performed from November 29 to December 10, 2020.

  13. Mexican Spanish Scripted Monologue Speech Data for Telecom

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Mexican Spanish Scripted Monologue Speech Data for Telecom [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/telecom-scripted-speech-monologues-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Presenting the Mexican Spanish Scripted Monologue Speech Dataset for the Telecom Domain, a purpose-built dataset created to accelerate the development of Spanish speech recognition and voice AI models specifically tailored for the telecommunications industry.

    Speech Data

    This dataset includes over 6,000 high-quality scripted prompt recordings in Mexican Spanish, representing real-world telecom customer service scenarios. It’s designed to support the training of speech-based AI systems used in call centers, virtual agents, and voice-powered support tools.

    Participant Diversity
    Speakers: 60 native Mexican Spanish speakers
    Geographic Distribution: Carefully selected from multiple regions across Mexico to capture a wide spectrum of dialects and speaking styles
    Demographics: Balanced representation of males and females (60:40 ratio), aged 18 to 70 years
    Recording Specifications
    Type: Scripted monologue prompts focused on telecom industry use cases
    Duration: Each audio clip ranges from 5 to 30 seconds
    Format: WAV files in mono, 16-bit depth, with sample rates of 8 kHz and 16 kHz
    Environment: Clean, echo-free, and noise-controlled settings to ensure optimal audio clarity

    Topic Coverage

    The dataset reflects a wide variety of common telecom customer interactions, including:

    Customer onboarding and service inquiries
    Billing and payment questions
    Data plans and product information
    Technical support requests
    Network coverage discussions
    Regulatory compliance and policy information
    Upgrades, renewals, and service plan changes
    Domain-specific scripted interactions tailored to real-world telecom use cases

    Contextual Depth

    To maximize contextual richness, prompts include:

    Localized Names: Common Mexican names in various formats
    Addresses: Region-specific address structures for realism
    Dates & Times: Spoken date and time references in typical telecom scenarios (e.g., billing cycles, service activation times)
    Telecom Terminology: Keywords related to mobile data, network, SIM, devices, plans, etc.
    Numbers & Rates: Usage statistics, pricing info, recharge values, and billing figures
    Service Providers: References to telecom companies and third-party service entities

    Transcription

    Each audio file is paired with an accurate, verbatim transcription for precise model training:

    Content: Transcriptions are direct representations of each recorded prompt
    Format: Plain text (.TXT), with filenames matching their corresponding audio files
    Verification: Every transcription is manually verified by native Mexican Spanish linguists to ensure consistency and accuracy

    Metadata

    Detailed metadata is included to

  14. h

    Promoting water consumption using behavioral economics insights [Dataset]

    • heidata.uni-heidelberg.de
    bin, pdf
    Updated Apr 5, 2017
    Cite
    Salvador Camacho; Christiane Schwieren; Andreas Ruppel (2017). Promoting water consumption using behavioral economics insights [Dataset] [Dataset]. http://doi.org/10.11588/DATA/10099
    Explore at:
    Available download formats: bin (71931), pdf (156287)
    Dataset updated
    Apr 5, 2017
    Dataset provided by
    heiDATA
    Authors
    Salvador Camacho; Christiane Schwieren; Andreas Ruppel
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/10099

    Time period covered
    Feb 2016 - Jun 2016
    Area covered
    Mexico, Mexico City and Mexico State
    Description

    Mexico has one of the largest overweight and obesity epidemics in the world, and in response several actions aiming to reduce it have already been set in place. Some of these actions include a specific action program for schools that seeks to turn school environments into supportive environments in which children make healthier food choices. The influence of the environment (the so-called “choice architecture”) on people’s perceptions and decisions is studied by economists with the aim of supporting individuals in making healthier decisions, using tools known as “nudges”. However, “nudges” are not commonly integrated into anti-obesity strategies. We designed an intervention to find out whether such a small, liberty-preserving intervention could increase the effectiveness of a water-promotion campaign when compared to the common approach of an educational talk. The intervention was developed in three schools in Mexico City and the State of Mexico. The body mass index, standardized by Z-scores, was used as the indicator of campaign success. Although our results do not show considerable differences between the approaches, mainly due to problems within the sample and a follow-up that was still too short, they provide insights suggesting that including “nudges” in a health-promotion campaign may indeed have a positive impact.

  15. F

    Mexican Spanish Scripted Monologue Speech Data for Delivery & Logistics

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Mexican Spanish Scripted Monologue Speech Data for Delivery & Logistics [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/delivery-scripted-speech-monologues-spanish-mexico
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Mexican Spanish Scripted Monologue Speech Dataset for the Delivery & Logistics Domain is a meticulously curated resource developed to support Spanish language speech recognition technologies, with a focus on real-world delivery and logistics applications.

    Speech Data

    This dataset includes 6,000+ high-quality scripted monologue recordings in Mexican Spanish, crafted to simulate practical scenarios in the delivery and logistics industry. These prompts are ideal for building robust, domain-specific conversational AI and customer support systems.

    Participant Diversity
    Speakers: 60 native Spanish speakers
    Regional Representation: Covers diverse dialects and accents from multiple regions of Mexico
    Demographics: Participants aged 18–70, with a 60:40 male-to-female ratio
    Recording Specifications
    Nature of Recordings: Scripted prompts and monologues
    Average Duration: 5–30 seconds per clip
    Format: WAV files, mono channel, 16-bit depth, 8 kHz and 16 kHz sample rates
    Environment: Noise-free, echo-free, quiet recording settings

    Topic & Scenario Coverage

    The dataset captures a wide variety of realistic delivery and logistics situations, including:

    Customer service dialogues
    Order processing and status inquiries
    Shipping, delivery, and tracking updates
    Returns, refunds, and complaint handling
    Technical assistance for delivery issues
    Regulatory questions and operational policies
    General advisory and domain-specific statements

    Linguistic Features

    To simulate authentic conversations, prompts include:

    Names: Regional male and female names in natural formats
    Addresses: Diverse location references including street names and regions
    Dates & Times: Common references for delivery slots, pickups, and ETA
    Order Numbers: Tracking IDs, invoice numbers, and order references
    Quantities & Weights: Units related to shipments and packaging
    Logistics Providers: Mentions of real or fictional courier and logistics services

    Transcription

    Each audio file is paired with a verbatim transcription, enhancing usability for training and validation:

    Content: Exact match of the audio prompt
    Format: Plain text (.TXT) with filenames aligned to audio files
    Quality Assurance: All transcripts are reviewed by native Spanish linguists for precision and consistency

    Metadata

    Comprehensive metadata accompanies every audio file and participant profile, supporting flexible filtering and model adaptation:

    Participant Metadata: Unique speaker ID, age, gender, region, and dialect

  16. F

    Real Estate Scripted Monologue Speech Data: Spanish (Mexico)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Real Estate Scripted Monologue Speech Data: Spanish (Mexico) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/realestate-scripted-speech-monologues-spanish-mexico
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish Scripted Monologue Speech Dataset for the Real Estate Domain. This meticulously curated dataset is designed to advance the development of Spanish language speech recognition models, particularly for the Real Estate industry.

    Speech Data

    This training dataset comprises over 6,000 high-quality scripted prompt recordings in Mexican Spanish. These recordings cover various topics and scenarios relevant to the Real Estate domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 native Spanish speakers from different regions of Mexico.
    Regions: Ensures a balanced representation of Mexican Spanish accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Recording Nature: Audio recordings of scripted prompts/monologues.
    Audio Duration: Average duration of 5 to 30 seconds per recording.
    Formats: WAV format with mono channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Recordings are conducted in quiet settings without background noise and echo.
    Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios to ensure comprehensive coverage of the Real Estate sector. Topics include:
    Customer Inquiries
    Negotiations
    Financial Transactions
    Legal and Regulatory Issues
    Relocation Services
    Agent Services
    Domain-Specific Statements
    Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in Real Estate interactions:
    Names: Region-specific names of males and females in various formats.
    Addresses: Region-specific addresses in different spoken formats, including street names, neighborhoods, and cities.
    Dates & Times: Inclusion of date and time in various real estate contexts, such as viewing appointments and move-in dates.
    Property Details: Specific details about properties, including sizes, features, and amenities.
    Financial Figures: Various amounts related to property prices, rents, deposits, and mortgage rates.
    Legal Terms: Common legal and contractual terms used in real estate transactions.

    Each scripted prompt is crafted to reflect real-life scenarios encountered in the Real Estate domain, ensuring applicability in training robust natural language processing and speech recognition models.

    Transcription Data

    In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.

    Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
    Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.

  17. f

    Table_1_An Analysis of Current Sustainability of Mexican Cities and Their...

    • frontiersin.figshare.com
    docx
    Updated Jun 4, 2023
    Cite
    Francisco Estrada; Julián A. Velasco; Amparo Martinez-Arroyo; Oscar Calderón-Bustamante (2023). Table_1_An Analysis of Current Sustainability of Mexican Cities and Their Exposure to Climate Change.DOCX [Dataset]. http://doi.org/10.3389/fenvs.2020.00025.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Francisco Estrada; Julián A. Velasco; Amparo Martinez-Arroyo; Oscar Calderón-Bustamante
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mexico
    Description

    The increasing demand for goods and services in cities around the world due to a rapidly growing urban population is pushing the socioecological systems that support them to their limits. The complexity of urban socioeconomic and environmental systems and their interactions generates a challenging multidimensional decision problem. In response, governments around the world are currently generating a variety of measurements that aim to portray the main factors related to the level of sustainability that a city shows. While the objective of these efforts is to help in the process of urban policy making, these measures are often hard to interpret and do not lend themselves to discovering underlying characteristics that may be common among a group of cities. Moreover, these measures are typically focused on describing the current state and omit future challenges such as climate change, which may significantly affect any evaluation of urban sustainability. Recently, the Institute of Ecology and Climate Change (INECC) of Mexico produced a dataset of 36 sustainability-related variables for over 100 cities that has the objective of helping federal and state-level governments define sustainable urban strategies. Here we use multivariate statistical techniques to (1) decrease the dimensionality of the dataset and find indices that could be more useful to decision makers; (2) find commonalities among cities included in the dataset in order to help in designing urban strategies for cities with similar characteristics; (3) rank cities in terms of their sustainability and characteristics; and (4) compare the sustainability ranking to estimates of how much the current climate in each of these cities is expected to change during this century, which would add further challenges to maintaining or improving urban sustainability.


Cite
Jose Guadalupe Martinez Vega (2023). Grocery_store [Dataset]. https://www.kaggle.com/datasets/martinezjosegpe/grocery-store/code

Grocery_store

Raw data from a grocery store in Mexico, over 100 tables and data from 2014-2022

Explore at:
102 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 18, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jose Guadalupe Martinez Vega
License

Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically

Description

This is a raw dataset from my family's grocery store in Mexico. The biggest file is the raw database with over 100 tables, the largest of which has over 5 million rows. Information in this dataset starts in 2014, when we installed the sales software. The latest data is from Oct/2022, which is when I pulled the data to explore and practice with it. It includes sales, item descriptions, IDs, dates, etc. With it you can do whatever you want, from weekly, monthly, and yearly sales to finding the best-selling product during the weekend or on a Tuesday.

The Excel file is cleaned data that I pulled from the raw data; it includes some charts and filtered and sorted information. Some tables and column names might be in Spanish; hopefully that is not a big problem for you to explore the data!
