Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This is a raw dataset from my family's grocery store in Mexico. The biggest file is the raw database, which has over 100 tables, the largest with over 5 million rows. The data starts in 2014, when we installed the sales software, and runs through Oct/2022, when I pulled it to explore and practice with. It includes sales, item descriptions, IDs, dates, etc. You can do whatever you want with it, from computing weekly, monthly, and yearly sales to finding the best-selling product on weekends or on a Tuesday.
The Excel file contains cleaned data that I pulled from the raw data; it includes some charts and filtered and sorted information. Some table and column names might be in Spanish; hopefully that is not a big problem for you to explore the data!
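As a rough illustration of the "best-selling product on a Tuesday" question above, the sketch below totals units sold per product for one day of the week. The column names (`fecha`, `producto`, `cantidad`) and the sample rows are invented for the example; the real tables will almost certainly be named and shaped differently.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical sample rows; the real column names and values are not known.
SAMPLE = """fecha,producto,cantidad
2022-10-04,tortillas,12
2022-10-04,leche,5
2022-10-05,leche,9
2022-10-11,tortillas,7
2022-10-11,leche,20
"""

def units_by_product(rows, weekday):
    """Total units per product on one weekday (0 = Monday ... 6 = Sunday)."""
    totals = Counter()
    for row in rows:
        if date.fromisoformat(row["fecha"]).weekday() == weekday:
            totals[row["producto"]] += int(row["cantidad"])
    return totals

def best_seller(rows, weekday):
    """Return (product, units) for the top product on that weekday, or None."""
    ranked = units_by_product(rows, weekday).most_common(1)
    return ranked[0] if ranked else None

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
tuesday_top = best_seller(rows, weekday=1)  # 1 = Tuesday
print(tuesday_top)
```

The same function answers the weekend question if it is called for weekdays 5 and 6 and the two counters are added together.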
This dataset offers valuable insights into the demographic profile of a specific population, with data on factors such as age, income, and gender distribution, as well as number of homes and spending habits categorized into major expenditure categories such as food, transportation, and healthcare. The data is geocoded using geohash7 (152.9m x 152.4m), providing a more accurate representation of the population distribution. This information is a valuable resource for companies, researchers, and policymakers looking to gain a deeper understanding of the economic and social landscape of a community. Utilizing this data, they can make informed decisions related to resource allocation, planning, and policy development, and tailor initiatives to effectively address the challenges and opportunities facing the population. The dataset can be provided by country, state, municipality, colony, zone, polygon, etc.
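To make the geohash7 geocoding above more concrete, here is a minimal sketch of the standard public geohash encoding and of the cell size it implies at 7 characters. It is not taken from the dataset's own tooling; the function names are made up for the example, and the coordinates used are a commonly cited test point, not a location from the data.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

def cell_size_degrees(precision=7):
    """Return (lat_height, lon_width) of one geohash cell, in degrees."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit
    lat_bits = total_bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits

code = geohash_encode(57.64911, 10.40744)
height_deg, width_deg = cell_size_degrees(7)
print(code, height_deg, width_deg)
```

At 7 characters a cell spans about 0.00137 degrees in each direction, which is roughly 153 m at the equator and in line with the figure quoted above. The east-west width in meters shrinks as you move away from the equator, so the quoted size is best read as an approximate upper bound.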
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/05BMJY
This dataset is available on the Oportunidades website at http://evaluacion.oportunidades.gob.mx:8010/en/index.php. Registration is required to access the data. To register, please visit http://evaluacion.oportunidades.gob.mx:8010/en/index.php and click Sign up on the right side of the page. You may also email (evaluacion@oportunidades.gob.mx) or call (01-800-500-50-50) during regular business hours local (Mexico) time. NOTE: The 1999 quantitative evaluation of rural areas was done by the International Food Policy Research Institute (IFPRI) and the results of that evaluation were published in 2000 at http://evaluacion.oportunidades.gob.mx:8010/en/docs/eval_docs_2000.php. The later evaluations were transferred to the National Institute of Public Health (INSP), Social Anthropology Research and Higher Studies Center (CIESAS), and Centro de Investigación y Docencia Económicas (CIDE). To view a particular evaluation document from 1999 and until 2007, select the year in the menu on the left at http://evaluacion.oportunidades.gob.mx:8010/en/index.php. The federal government of Mexico introduced the Programa de Educación, Salud y Alimentación (the Education, Health, and Nutrition Program), known by its Spanish acronym, PROGRESA to measure the impacts of the program on the covered population, using diverse methodological approaches. In 1999, IFPRI at the request of PROGRESA conducted a qualitative evaluation of its major rural anti-poverty program. The experimental design used for the evaluation of PROGRESA takes advantage of the sequential expansion of the program in order to come up with a set of localities that serve the role of controls. Specifically, the sample consists of repeated observations (panel data) collected for 24,000 households from 506 localities in the seven states of Guerrero, Hidalgo, Michoacan, Puebla, Queretaro, San Luis Potosi and Veracruz.
Of the 506 localities, 320 localities were assigned to the treatment group and 186 localities were assigned as controls. The 320 treatment localities were randomly selected using probabilities proportional to size from a universe of 4,546 localities that were covered by phase II of the program in the 7 states mentioned above. Using the same method, the 186 control localities were selected from a universe of 1,850 localities in these 7 states that were to be covered by PROGRESA in later phases. In November 1997 PROGRESA conducted a survey of the socio-economic conditions of rural Mexican households (Encuesta de Caracteristicas Socioeconomicas de los Hogares or ENCASEH) in the evaluation communities to determine which households would be eligible for benefits. Then based on PROGRESA’s beneficiary selection methods, households were classified as eligible and non-eligible for participation in the program in both treatment and control communities. The first evaluation survey (Encuesta de Evaluación de los Hogares or ENCEL) took place in March 1998 before the initiation of benefits distribution in May 1998. In combination these two surveys provide the baseline observations available for all households before the initiation of the distribution of cash benefits in the treatment villages. The rest of the evaluation surveys were conducted after beneficiary households started receiving benefits from PROGRESA. One round of surveys took place in October/November 1998 (ENCEL98O), which was well after most households received some benefits as part of their participation in the program. The next two waves took place in June 1999 (ENCEL98M) and November 1999 (ENCEL99N). A number of core questions about the demographic composition of households and their socio-economic status were applied in each round of the survey. These core questions were accompanied by specific questionnaires, focused on collecting information critical to a thorough evaluation of the impact of the program.
The topics of these modules included family background, assets brought to marriage, schooling indicators, health status and utilization, parental attitudes and aspirations towards children’s schooling, consumption of food and non-food items, the allocation of time of household members in various activities, and self-employment activities. The preceding surveys were supplemented by school and clinic surveys, community questionnaires, data on student achievement test scores, and other school and clinic administrative data. The evaluation surveys (ENCEL) collected by PROGRESA did not allow for an evaluation of the nutritional component of the program. To evaluate the nutritional component of PROGRESA, separate surveys of the same families were carried out by the National Institute of Public Health (INSP) in Cuernavaca. These surveys included the collection of anthropometric data (weight and height) for children and of blood samples to test for anemia and other deficiencies.
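The description above says treatment and control localities were drawn "using probabilities proportional to size" but does not say which procedure was used. As a sketch only, the following shows systematic PPS sampling, one common variant; the locality sizes are invented and the function name is hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(sizes, n, rng):
    """Pick n unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end on a line of length
    sum(sizes), then take n equally spaced points from a random start.
    A unit larger than the sampling step can be picked more than once.
    """
    total = sum(sizes)
    step = total / n
    start = rng.random() * step  # random start in [0, step)
    cumulative = list(accumulate(sizes))
    picks = []
    for k in range(n):
        point = start + k * step
        index = bisect_right(cumulative, point)
        picks.append(min(index, len(sizes) - 1))  # guard against rounding at the end
    return picks

# Made-up locality sizes (e.g. number of households); not from the dataset.
sizes = [50, 200, 30, 400, 120, 80, 300, 20]
picks = pps_systematic(sizes, n=3, rng=random.Random(0))
print(picks)
```

With these sizes the total is 1,200 and the step is 400, so the locality of size 400 is selected in every draw, while the locality of size 20 is selected in only about 1 draw in 20. That is the sense in which larger localities have proportionally higher selection probabilities.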
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Mexican Spanish Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native Mexican Spanish speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Rich metadata is available for each participant and conversation:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in Mexico was last recorded at 8 percent. This dataset provides - Mexico Interest Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Mexican Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mexican Spanish communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Mexican accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mexican Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple Spanish speech and language AI applications:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in Mexico decreased to 4.32 percent in June from 4.42 percent in May of 2025. This dataset provides - Mexico Inflation Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The USD/MXN exchange rate rose to 18.5643 on July 24, 2025, up 0.15% from the previous session. Over the past month, the Mexican Peso has strengthened 1.83%, but it's down by 0.64% over the last 12 months. Mexican Peso - values, historical data, forecasts and news - updated in July 2025.
Mexico has one of the largest overweight and obesity epidemics in the world, and in response several actions aiming to reduce it have already been set in place. These include a specific action program for schools that seeks to turn school environments into supportive environments in which children make healthier food choices. The influence of the environment (the so-called “choice architecture”) on people’s perceptions and decisions is studied by economists with the aim of supporting individuals in making healthier decisions, using tools known as “nudges”. However, “nudges” are not commonly integrated into anti-obesity strategies. We designed an intervention to find out whether such a small, liberty-preserving intervention could increase the effectiveness of a water-promotion campaign, compared to the common approach of an educational talk. The intervention was carried out in three schools in Mexico City and the State of Mexico. The body mass index, standardized by Z-scores, was used as the indicator of campaign success. Although our results do not show considerable differences between the approaches, mainly due to problems within the sample and a follow-up that is still too short, they provide insights suggesting that including “nudges” in a health-promotion campaign may indeed have a positive impact.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Mexican Spanish Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.
This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:
This diversity ensures robust training for real-world voice assistant applications.
Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Mexican Spanish Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.
The dataset contains 30 hours of dual-channel call center recordings between native Mexican Spanish speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.
This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, making financial domain model training faster and more accurate.
Rich metadata is available for each participant and conversation:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Amid the COVID-19 outbreak, the ENCOVID-19 CDMX provides information on the well-being of Mexico City households in four main domains: labor, income, mental health, and food insecurity. It offers timely information to understand the social consequences of the pandemic and the lockdown measures. It is a cross-sectional telephone survey that, in addition to the four main domains and a set of COVID19-related questions, includes key indicators to capture the impact of the pandemic on issues like education, social programs, and crime. This is the second dataset of the project, corresponding to December 2020, collected eight months after the lockdown began in Mexico. Data collection was performed from November 29 to December 10, 2020.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Presenting the Mexican Spanish Scripted Monologue Speech Dataset for the Telecom Domain, a purpose-built dataset created to accelerate the development of Spanish speech recognition and voice AI models specifically tailored for the telecommunications industry.
This dataset includes over 6,000 high-quality scripted prompt recordings in Mexican Spanish, representing real-world telecom customer service scenarios. It’s designed to support the training of speech-based AI systems used in call centers, virtual agents, and voice-powered support tools.
The dataset reflects a wide variety of common telecom customer interactions, including:
To maximize contextual richness, prompts include:
Each audio file is paired with an accurate, verbatim transcription for precise model training:
Detailed metadata is included to
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/10099
Mexico has one of the largest overweight and obesity epidemics in the world, and in response several actions aiming to reduce it have already been set in place. These include a specific action program for schools that seeks to turn school environments into supportive environments in which children make healthier food choices. The influence of the environment (the so-called “choice architecture”) on people’s perceptions and decisions is studied by economists with the aim of supporting individuals in making healthier decisions, using tools known as “nudges”. However, “nudges” are not commonly integrated into anti-obesity strategies. We designed an intervention to find out whether such a small, liberty-preserving intervention could increase the effectiveness of a water-promotion campaign, compared to the common approach of an educational talk. The intervention was carried out in three schools in Mexico City and the State of Mexico. The body mass index, standardized by Z-scores, was used as the indicator of campaign success. Although our results do not show considerable differences between the approaches, mainly due to problems within the sample and a follow-up that is still too short, they provide insights suggesting that including “nudges” in a health-promotion campaign may indeed have a positive impact.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Mexican Spanish Scripted Monologue Speech Dataset for the Delivery & Logistics Domain is a meticulously curated resource developed to support Spanish language speech recognition technologies, with a focus on real-world delivery and logistics applications.
This dataset includes 6,000+ high-quality scripted monologue recordings in Mexican Spanish, crafted to simulate practical scenarios in the delivery and logistics industry. These prompts are ideal for building robust, domain-specific conversational AI and customer support systems.
The dataset captures a wide variety of realistic delivery and logistics situations, including:
To simulate authentic conversations, prompts include:
Each audio file is paired with a verbatim transcription, enhancing usability for training and validation:
Comprehensive metadata accompanies every audio file and participant profile, supporting flexible filtering and model adaptation:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Mexican Spanish Scripted Monologue Speech Dataset for the Real Estate Domain. This meticulously curated dataset is designed to advance the development of Spanish language speech recognition models, particularly for the Real Estate industry.
This training dataset comprises over 6,000 high-quality scripted prompt recordings in Mexican Spanish. These recordings cover various topics and scenarios relevant to the Real Estate domain, designed to build robust and accurate customer service speech technology.
Each scripted prompt is crafted to reflect real-life scenarios encountered in the Real Estate domain, ensuring applicability in training robust natural language processing and speech recognition models.
In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The increasing demand for goods and services in cities around the world due to a rapidly growing urban population is pushing the socioecological systems that support them to their limits. The complexity of urban socioeconomic and environmental systems and their interactions generates a challenging multidimensional decision problem. In response, governments around the world are currently generating a variety of measurements that aim to portray the main factors related to the level of sustainability that a city shows. While the objective of these efforts is to help in the process of urban policy making, these measures are often hard to interpret and do not lend themselves to discovering underlying characteristics that may be common among a group of cities. Moreover, these measures are typically focused on describing the current state and omit future challenges such as climate change, which may significantly affect any evaluation of urban sustainability. Recently, the Institute of Ecology and Climate Change (INECC) of Mexico produced a dataset of 36 sustainability-related variables for over 100 cities, with the objective of helping federal and state-level governments define sustainable urban strategies. Here we use multivariate statistical techniques to (1) decrease the dimensionality of the dataset and find indices that could be more useful to decision makers; (2) find commonalities among the cities included in the dataset in order to help in designing urban strategies for cities with similar characteristics; (3) rank cities in terms of their sustainability and characteristics; and (4) compare the sustainability ranking to estimates of how much the current climate in each of these cities is expected to change during this century, which would add further challenges to maintaining or improving urban sustainability.