Our Spanish language datasets are carefully compiled and annotated by language and linguistic experts; you can find them available for licensing:
Key Features (approximate numbers):
Our Spanish monolingual reliably offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.
The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is annually reviewed and updated by our in-house team of language experts. Offers significant coverage of the language, providing a large volume of translated words of excellent quality.
Spanish sentences retrieved from the corpus are ideal for NLP model training, presenting approximately 20 million words. The sentences provide a great coverage of Spanish-speaking countries and are accordingly tagged to a particular country or dialect.
This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.
Curated word-level audio data for the Spanish language, which covers all varieties of world Spanish, providing rich dialectal diversity in the Spanish language.
This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.
Use Cases:
We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).
If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Oxford.Languages@oup.com to start the conversation.
Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.
Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.
More than 250 million tweets in Spanish from 331 Spanish-speaking cities in Latin America, Spain and the United States were compiled from Twitter. In this data set, a column is provided with the 5000 most frequent words and one with their corresponding frequencies (the number of times the word was produced in that city) for each of the 331 cities. The reported data correspond to the years 2009 to 2016.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
This US Spanish Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish -speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents ideal for building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
The dataset features 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics from inquiries to investment advice offering deep domain coverage for AI model development.
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
Such domain-rich variety ensures model generalization across common real estate support conversations.
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.
Detailed metadata accompanies each participant and conversation:
This enables smart filtering, dialect-focused model training, and structured dataset exploration.
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
Cite as
Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.
General Description
This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.
Data Collection Method
Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.
Dataset Content
ID: A unique identifier for each tweet.
text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.
polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).
favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.
retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.
user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.
user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.
user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.
user_followers_count: The current number of followers the account has. It is a non-negative integer.
user_friends_count: The number of users that the account is following. It is a non-negative integer.
user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.
user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.
user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.
user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
Potential Use Cases
This dataset is aimed at academic researchers and practitioners with interests in:
Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.
Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.
Exploring correlations between user engagement metrics and sentiment in discussions about AI.
Data Format and File Type
The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.
License
The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General Description
This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.
Data Collection Method
Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.
Dataset Content
ID: A unique identifier for each tweet.
text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.
polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).
favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.
retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.
user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.
user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.
user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.
user_followers_count: The current number of followers the account has. It is a non-negative integer.
user_friends_count: The number of users that the account is following. It is a non-negative integer.
user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.
user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.
user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.
user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
Cite as
Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.
Potential Use Cases
This dataset is aimed at academic researchers and practitioners with interests in:
Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.
Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.
Exploring correlations between user engagement metrics and sentiment in discussions about AI.
Data Format and File Type
The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.
License
The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
This Colombian Spanish Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native Colombian Spanish speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Rich metadata is available for each participant and conversation:
This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
The conversations in this dataset capture the diverse language styles and expressions prevalent in Spanish Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Spanish speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Spanish Healthcare interactions.
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
Each of these conversations contains various aspects of conversation flow like:
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 39 cities in the Morris County, NJ by Hispanic American Indian and Alaska Native (AIAN) population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Non-Hispanic population of Durham County by race. It includes the distribution of the Non-Hispanic population of Durham County across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of Durham County across relevant racial categories.
Key observations
Of the Non-Hispanic population in Durham County, the largest racial group is White alone with a population of 138,134 (49.49% of the total Non-Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Durham County Population by Race & Ethnicity. You can refer the same here
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 5 cities in the Coffee County, AL by Hispanic Some Other Race (SOR) population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 17 cities in the Story County, IA by Hispanic White population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 278 cities in the Vermont by Hispanic Some Other Race (SOR) population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Spanish Braille Alphabet Dataset
This dataset will includes over 10,000 images of the Spanish Braille alphabet, featuring lowercase and uppercase letters, numbers, and punctuation marks. While the images may not be of the highest quality, they are valuable for basic Braille recognition tasks and can serve as a foundation for models designed for accessibility, text-to-Braille conversion, and educational purposes. This dataset is an essential resource for projects focused on improving accessibility for the visually impaired in Spanish-speaking regions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 75 cities in the Berks County, PA by Hispanic White population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 7 cities in the Roger Mills County, OK by Hispanic Some Other Race (SOR) population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 17 cities in the Story County, IA by Hispanic Some Other Race (SOR) population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 7 cities in the Sumter County, AL by Hispanic Asian population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
This Mexican Spanish Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Spanish -speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native Mexican Spanish speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real-time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 16 cities in the Cleveland County, NC by Hispanic American Indian and Alaska Native (AIAN) population, as estimated by the United States Census Bureau. It also highlights population changes in each cities over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Our Spanish language datasets are carefully compiled and annotated by language and linguistic experts; you can find them available for licensing:
Key Features (approximate numbers):
Our Spanish monolingual reliably offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.
The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is annually reviewed and updated by our in-house team of language experts. Offers significant coverage of the language, providing a large volume of translated words of excellent quality.
Spanish sentences retrieved from the corpus are ideal for NLP model training, presenting approximately 20 million words. The sentences provide a great coverage of Spanish-speaking countries and are accordingly tagged to a particular country or dialect.
This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.
Curated word-level audio data for the Spanish language, which covers all varieties of world Spanish, providing rich dialectal diversity in the Spanish language.
This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.
Use Cases:
We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).
If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Oxford.Languages@oup.com to start the conversation.
Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.
Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.