5 datasets found

US Spanish General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). US Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-us-spanish
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US Spanish communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic US accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
  •Speakers: 60 verified native US Spanish speakers from FutureBeeAI’s contributor community.
  •Regions: Speakers from various US states, to ensure dialectal diversity and demographic balance.
  •Demographics: A gender split of 60% male and 40% female, with participant ages ranging from 18 to 70 years.
•Recording Details:
  •Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
  •Duration: Each conversation ranges from 15 to 60 minutes.
  •Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
  •Environment: Quiet, echo-free settings with no background noise.
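The stated recording format (stereo, 16-bit, 16 kHz WAV) can be checked against a file's header before training. The following is a minimal sketch using Python's standard `wave` module; the function names and the silent demo file are illustrative assumptions, not part of the dataset or its tooling.

```python
import os
import tempfile
import wave


def read_wav_header(path):
    """Return channel count, bit depth, sample rate, and duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def matches_stated_format(header):
    """Check a header against the stereo, 16-bit, 16 kHz format in the description."""
    return (
        header["channels"] == 2
        and header["bit_depth"] == 16
        and header["sample_rate"] == 16000
    )


def write_silent_wav(path, seconds=1):
    """Write a silent stereo 16-bit 16 kHz file, used here only to exercise the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        # 2 channels * 2 bytes per sample * 16000 frames per second
        wav.writeframes(b"\x00" * (2 * 2 * 16000 * seconds))


# Hypothetical demo file standing in for a real recording from the dataset.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silent_wav(demo_path)
demo_header = read_wav_header(demo_path)
```

The `duration_s` field could likewise be compared with the stated 15 to 60 minute range per conversation.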

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through a double QA pass, with an average WER below 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
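For readers unfamiliar with the WER figure quoted above, word error rate is conventionally the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below shows that conventional definition; how FutureBeeAI computes its reported figure is not stated in the listing, and the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions) over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A value below 0.05 corresponds to the "below 5%" threshold mentioned in the highlights.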
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple Spanish speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for US Spanish.
•Voice Assistants: Build smart assistants capable of understanding natural US conversations.

London, KY Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic...
neilsberg.com
csv, json
Updated Feb 21, 2025
Cite
Neilsberg Research (2025). London, KY Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/london-ky-population-by-race/
Available download formats: json, csv
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
London, Kentucky (London, KY)
Variables measured
Non-Hispanic Asian Population, Non-Hispanic Black Population, Non-Hispanic White Population, Non-Hispanic Some other race Population, Non-Hispanic Two or more races Population, Non-Hispanic American Indian and Alaska Native Population, Non-Hispanic Native Hawaiian and Other Pacific Islander Population, Non-Hispanic Asian Population as Percent of Total Non-Hispanic Population, Non-Hispanic Black Population as Percent of Total Non-Hispanic Population, Non-Hispanic White Population as Percent of Total Non-Hispanic Population, and 4 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Non-Hispanic population and (b) population as a percentage of the total Non-Hispanic population, we first analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and fall within the Non-Hispanic classification. For further information regarding these estimates, please contact us by email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Non-Hispanic population of London by race. It includes the distribution of the Non-Hispanic population of London across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of London across relevant racial categories.

Key observations

Of the Non-Hispanic population in London, the largest racial group is White alone with a population of 7,120 (95.80% of the total Non-Hispanic population).

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race: This column displays the racial categories (for Non-Hispanic) for London.

Population: This column shows the population of each racial category (for Non-Hispanic) in London.

% of Total Population: This column displays the percentage distribution of each race as a proportion of London's total Non-Hispanic population. Please note that the sum of all percentages may not equal 100% due to rounding of values.
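The rounding note on the percentage column can be illustrated with a short sketch. The counts below are hypothetical placeholders, not values from this dataset, and the function name is an assumption; the point is only that shares rounded to two decimals need not add up exactly.

```python
def percent_of_total(counts):
    """Return each category's share of the total, as a percentage rounded to two decimals."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


# Hypothetical counts chosen so that rounding is visible; not taken from the dataset.
example_counts = {"Group A": 1, "Group B": 1, "Group C": 1}
example_shares = percent_of_total(example_counts)
example_sum = sum(example_shares.values())
```

Here each share rounds to 33.33, so the rounded shares sum to roughly 99.99 instead of 100.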

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to check the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for London Population by Race & Ethnicity.
American English General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
  •Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
  •Regions: Speakers from various US states, to ensure dialectal diversity and demographic balance.
  •Demographics: A gender split of 60% male and 40% female, with participant ages ranging from 18 to 70 years.
•Recording Details:
  •Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
  •Duration: Each conversation ranges from 15 to 60 minutes.
  •Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
  •Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through a double QA pass, with an average WER below 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple English speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for US English.
•Voice Assistants: Build smart assistants capable of understanding natural American conversations.

Climate Ready Boston Social Vulnerability
data.boston.gov
Updated Sep 21, 2017
Cite
Boston Maps (2017). Climate Ready Boston Social Vulnerability [Dataset]. https://data.boston.gov/dataset/climate-ready-boston-social-vulnerability
Available download formats: geojson, csv, kml, zip, html, ArcGIS GeoServices REST API
Dataset updated
Sep 21, 2017
Dataset provided by
BostonMaps
Authors
Boston Maps
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Area covered
Boston
Description
Social vulnerability is defined as the disproportionate susceptibility of some social groups to the impacts of hazards, including death, injury, loss, or disruption of livelihood. In this dataset from Climate Ready Boston, groups identified as being more vulnerable are older adults, children, people of color, people with limited English proficiency, people with low or no incomes, people with disabilities, and people with medical illnesses.

Source:

The analysis and definitions used in Climate Ready Boston (2016) are based on "A framework to understand the relationship between social factors that reduce resilience in cities: Application to the City of Boston." Published 2015 in the International Journal of Disaster Risk Reduction by Atyia Martin, Northeastern University.

Population Definitions:

Older Adults:
Older adults (those over age 65) have physical vulnerabilities in a climate event; they suffer from higher rates of medical illness than the rest of the population and can have some functional limitations in an evacuation scenario, as well as when preparing for and recovering from a disaster. Furthermore, older adults are physically more vulnerable to the impacts of extreme heat. Beyond the physical risk, older adults are more likely to be socially isolated. Without an appropriate support network, an initially small risk could be exacerbated if an older adult is not able to get help.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for population over 65 years of age.
Attribute label: OlderAdult

Children:
Families with children require additional resources in a climate event. When school is cancelled, parents need alternative childcare options, which can mean missing work. Children are especially vulnerable to extreme heat and stress following a natural disaster.
Data source: 2010 American Community Survey 5-year Estimates (ACS) data by census tract for population under 5 years of age.
Attribute label: TotChild

People of Color:
People of color make up a majority (53 percent) of Boston’s population. People of color are more likely to fall into multiple vulnerable groups as well. People of color statistically have lower levels of income and higher levels of poverty than the population at large. People of color, many of whom also have limited English proficiency, may not have ready access in their primary language to information about the dangers of extreme heat or about cooling center resources. This risk to extreme heat can be compounded by the fact that people of color often live in more densely populated urban areas that are at higher risk for heat exposure due to the urban heat island effect.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract: Black, Native American, Asian, Island, Other, Multi, Non-white Hispanics.
Attribute label: POC2

Limited English Proficiency:
Without adequate English skills, residents can miss crucial information on how to prepare for hazards. Cultural practices for information sharing, for example, may focus on word-of-mouth communication. In a flood event, residents can also face challenges communicating with emergency response personnel. If residents are more socially isolated, they may be less likely to hear about upcoming events. Finally, immigrants, especially ones who are undocumented, may be reluctant to use government services out of fear of deportation or general distrust of the government or emergency personnel.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract, defined as speaks English only or speaks English “very well”.
Attribute label: LEP

Low to no Income:
A lack of financial resources impacts a household’s ability to prepare for a disaster event and to support friends and neighborhoods. For example, residents without televisions, computers, or data-driven mobile phones may face challenges getting news about hazards or recovery resources. Renters may have trouble finding and paying deposits for replacement housing if their residence is impacted by flooding. Homeowners may be less able to afford insurance that will cover flood damage. Having low or no income can create difficulty evacuating in a disaster event because of a higher reliance on public transportation. If unable to evacuate, residents may be more at risk without supplies to stay in their homes for an extended period of time. Low- and no-income residents can also be more vulnerable to hot weather if running air conditioning or fans puts utility costs out of reach.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for low-to-no-income populations. The data represents a calculated field that combines people who were 100% below the poverty level and those who were 100–149% of the poverty level.
Attribute label: Low_to_No

People with Disabilities:
People with disabilities are among the most vulnerable in an emergency; they sustain disproportionate rates of illness, injury, and death in disaster events. People with disabilities can find it difficult to adequately prepare for a disaster event, including moving to a safer place. They are more likely to be left behind or abandoned during evacuations. Rescue and relief resources, such as emergency transportation or shelters, may not be universally accessible. Research has revealed a historic pattern of discrimination against people with disabilities in times of resource scarcity, like after a major storm and flood.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for total civilian non-institutionalized population, including: hearing difficulty, vision difficulty, cognitive difficulty, ambulatory difficulty, self-care difficulty, and independent living difficulty.
Attribute label: TotDis

Medical Illness:
Symptoms of existing medical illnesses are often exacerbated by hot temperatures. For example, heat can trigger asthma attacks or increase already high blood pressure due to the stress of high temperatures put on the body. Climate events can interrupt access to normal sources of healthcare and even life-sustaining medication. Special planning is required for people experiencing medical illness. For example, people dependent on dialysis will have different evacuation and care needs than other Boston residents in a climate event.
Data source: Medical illness is a proxy measure which is based on EASI data accessed through Simply Map. Health data at the local level in Massachusetts is not available beyond zip codes. EASI modeled the health statistics for the U.S. population based upon age, sex, and race probabilities using U.S. Census Bureau data. The probabilities are modeled against the census and current year and five year forecasts. Medical illness is the sum of asthma in children, asthma in adults, heart disease, emphysema, bronchitis, cancer, diabetes, kidney disease, and liver disease. A limitation is that these numbers may be over-counted as the result of people potentially having more than one medical illness. Therefore, the analysis may have greater numbers of people with medical illness within census tracts than actually present. Overall, the analysis was based on the relationship between social factors.
Attribute label: MedIllnes

Other attribute definitions:
GEOID10: Geographic identifier: State Code (25), County Code (025), 2010 Census Tract
AREA_SQFT: Tract area (in square feet)
AREA_ACRES: Tract area (in acres)
POP100_RE: Tract population count
HU100_RE: Tract housing unit count
Name: Boston Neighborhood
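
The GEOID10 attribute concatenates its components into one string. As a minimal sketch, assuming the standard 11-digit census tract layout (2-digit state, 3-digit county, 6-digit tract), it can be split as follows; the function name and the sample value are illustrative assumptions, not rows from this dataset.

```python
def split_geoid10(geoid):
    """Split an 11-digit census tract GEOID into state, county, and tract codes."""
    geoid = str(geoid)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit tract GEOID, got %r" % geoid)
    return {
        "state": geoid[:2],    # 25 in this dataset
        "county": geoid[2:5],  # 025 in this dataset
        "tract": geoid[5:],    # 6-digit 2010 census tract code
    }


# Illustrative value only: state 25 and county 025 with a placeholder tract code.
example_parts = split_geoid10("25025000100")
```

Keeping the codes as strings preserves leading zeros, which would be lost if the identifier were parsed as an integer.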
Enslaved People in the African American National Biography, 1508-1865
search.dataone.org
dataverse.harvard.edu
Updated Nov 19, 2023
Cite
Niven, Steven J. (2023). Enslaved People in the African American National Biography, 1508-1865 [Dataset]. http://doi.org/10.7910/DVN/FIEYGJ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/FIEYGJ
Dataset updated
Nov 19, 2023
Dataset provided by
Harvard Dataverse
Authors
Niven, Steven J.
Time period covered
Jan 1, 1508 - Jan 1, 1865
Description
The "Enslaved People in the African American National Biography, 1508-1865" dataset builds on the complete print and online collection of the African American National Biography (AANB), edited by Henry Louis Gates, Jr. and Evelyn Brooks Higginbotham. The full collection contains over 6,000 biographical entries of named historical individuals, including 1,304 for subjects born before 1865 and the abolition of slavery in the United States. In making a subset of biographical entries from the multivolume work, the goal was to extract life details from those biographies into an easy-to-view database form that details whether a subject was enslaved for some or all of their lives and to provide the main biographical details of each subject for contextual analysis and comparison. 52 fields covering location data; gender; names, alternate names and suffixes; dates and places of birth and death; and up to 8 occupations were included. We also added 13 unique fields that provide biographical details on each subject: Free born in North America; Free before 13th Amendment; Ever Enslaved; How was freedom attained; Other/uncertain status; African born; Parent information; Runaways and rebels; Education/literacy; Religion; Slave narrative or memoir author; Notes; and Images.