29 datasets found

The Great American Coffee Taste Test Dataset
kaggle.com
Updated May 20, 2024
Cite
Umer Haddii (2024). The Great American Coffee Taste Test Dataset [Dataset]. https://www.kaggle.com/datasets/umerhaddii/the-great-american-coffee-taste-test-dataset
Dataset updated
May 20, 2024
Dataset provided by
Kaggle
Authors
Umer Haddii
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

World champion barista James Hoffmann and Cometeer partnered to conduct a first-of-its-kind coffee taste test. Cometeer shipped 5000 coffee kits across America. Kits contained four different coffees - pre-extracted and flash frozen. Tasters melted and diluted the coffee capsules for a largely identical tasting experience. Tasting and ratings were conducted blind [1]. After survey responses were collected (provided data), some attributes of the coffee were revealed.

In October 2023, World champion barista James Hoffmann and coffee company Cometeer held the "Great American Coffee Taste Test" on YouTube, during which viewers were asked to fill out a survey about 4 coffees they ordered from Cometeer for the tasting. Data blogger Robert McKeon Aloe analyzed the data the following month.

Content

Geography: US

Time-period: 2023

Unit of Analysis: The Great American Coffee Taste Test

Variables

submission_id = Submission ID

age = What is your age?

cups = How many cups of coffee do you typically drink per day?

where_drink = Where do you typically drink coffee?

brew = How do you brew coffee at home?

brew_other = How else do you brew coffee at home?

purchase = On the go, where do you typically purchase coffee?

purchase_other = Where else do you purchase coffee?

favorite = What is your favorite coffee drink?

favorite_specify = Please specify what your favorite coffee drink is

additions = Do you usually add anything to your coffee?

additions_other = What else do you add to your coffee?

dairy = What kind of dairy do you add?

sweetener = What kind of sugar or sweetener do you add?

style = Before today's tasting, which of the following best described what kind of coffee you like?

strength = How strong do you like your coffee?

roast_level = What roast level of coffee do you prefer?

caffeine = How much caffeine do you like in your coffee?

expertise = Lastly, how would you rate your own coffee expertise?

coffee_a_bitterness = Coffee A - Bitterness

coffee_a_acidity = Coffee A - Acidity

coffee_a_personal_preference = Coffee A - Personal Preference

coffee_a_notes = Coffee A - Notes

coffee_b_bitterness = Coffee B - Bitterness

coffee_b_acidity = Coffee B - Acidity

coffee_b_personal_preference = Coffee B - Personal Preference

coffee_b_notes = Coffee B - Notes

coffee_c_bitterness = Coffee C - Bitterness

coffee_c_acidity = Coffee C - Acidity

coffee_c_personal_preference = Coffee C - Personal Preference

coffee_c_notes = Coffee C - Notes

coffee_d_bitterness = Coffee D - Bitterness

coffee_d_acidity = Coffee D - Acidity

coffee_d_personal_preference = Coffee D - Personal Preference

coffee_d_notes = Coffee D - Notes

prefer_abc = Between Coffee A, Coffee B, and Coffee C which did you prefer?

prefer_ad = Between Coffee A and Coffee D, which did you prefer?

prefer_overall = Lastly, what was your favorite overall coffee?

wfh = Do you work from home or in person?

total_spend = In total, how much money do you typically spend on coffee in a month?

why_drink = Why do you drink coffee?

why_drink_other = Other reason for drinking coffee

taste = Do you like the taste of coffee?

know_source = Do you know where your coffee comes from?

most_paid = What is the most you've ever paid for a cup of coffee?

most_willing = What is the most you'd ever be willing to pay for a cup of coffee?

value_cafe = Do you feel like you’re getting good value for your money when you buy coffee at a cafe?

spent_equipment = Approximately how much have you spent on coffee equipment in the past 5 years?

value_equipment = Do you feel like you’re getting good value for your money when you buy coffee at a cafe?

gender = Gender

gender_specify = Gender (please specify)

education_level = Education Level

ethnicity_race = Ethnicity/Race

ethnicity_race_specify = Ethnicity/Race (please specify)

employment_status = Employment Status

number_children = Number of Children

political_affiliation = Political Affiliation
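
As a rough illustration of how the rating variables listed above might be used, the sketch below averages the four `coffee_*_personal_preference` columns with Python's standard `csv` module. The helper name `mean_preference`, the two sample rows, and the treatment of empty strings or "NA" as missing answers are assumptions made for this example, not details confirmed by the dataset page.

```python
import csv
import io
from statistics import mean

COFFEES = ("a", "b", "c", "d")


def mean_preference(csv_file):
    """Return the mean personal-preference score per coffee, skipping blanks."""
    scores = {c: [] for c in COFFEES}
    for row in csv.DictReader(csv_file):
        for c in COFFEES:
            raw = (row.get(f"coffee_{c}_personal_preference") or "").strip()
            # Assumption: missing answers appear as empty strings or "NA".
            if raw and raw != "NA":
                scores[c].append(float(raw))
    # Coffees with no usable ratings map to None instead of raising.
    return {c: (mean(v) if v else None) for c, v in scores.items()}


# Tiny made-up sample for illustration only (not real survey responses).
sample = io.StringIO(
    "submission_id,coffee_a_personal_preference,coffee_b_personal_preference,"
    "coffee_c_personal_preference,coffee_d_personal_preference\n"
    "x1,4,2,3,5\n"
    "x2,2,NA,3,4\n"
)
result = mean_preference(sample)
print(result)
```

To run it on the real file, pass an open file handle instead of the in-memory sample, for example `open("coffee_survey.csv", newline="", encoding="utf-8")`, assuming the download uses that filename.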

Acknowledgement

Datasource: The data was collected through a survey called The Great American Coffee Taste Test, held by James Hoffmann.

Inspiration: [Great American Coffee...
United States Consumer Spending
tradingeconomics.com
tr.tradingeconomics.com
+13more
csv, excel, json, xml
Updated Mar 7, 2024
Cite
TRADING ECONOMICS (2024). United States Consumer Spending [Dataset]. https://tradingeconomics.com/united-states/consumer-spending
Available download formats: xml, json, excel, csv
Dataset updated
Mar 7, 2024
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 31, 1947 - Jun 30, 2025
Area covered
United States
Description
Consumer Spending in the United States increased to 16445.70 USD Billion in the second quarter of 2025 from 16345.80 USD Billion in the first quarter of 2025. This dataset provides the latest reported value for - United States Consumer Spending - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Replication Data and Code for: Why ‘Buy American’ is a bad idea but...
dataone.org
borealisdata.ca
Updated Dec 28, 2023
Cite
Larch, Mario; Lechthaler, Wolfgang (2023). Replication Data and Code for: Why ‘Buy American’ is a bad idea but politicians still like it [Dataset]. http://doi.org/10.5683/SP3/JKVBKG
Unique identifier
https://doi.org/10.5683/SP3/JKVBKG
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Larch, Mario; Lechthaler, Wolfgang
Description
The data and programs replicate tables and figures from "Why ‘Buy American’ is a bad idea but politicians still like it", by Larch and Lechthaler. Please see the ReadMe file for additional details.
Consumer Expenditure Survey (CE)
dataverse.harvard.edu
Updated May 30, 2013
Cite
Anthony Damico (2013). Consumer Expenditure Survey (CE) [Dataset]. http://doi.org/10.7910/DVN/UTNJAH
Unique identifier
https://doi.org/10.7910/DVN/UTNJAH
Dataset updated
May 30, 2013
Dataset provided by
Harvard Dataverse
Authors
Anthony Damico
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
analyze the consumer expenditure survey (ce) with r

the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories. how else are you gonna know that the average american household spent $34 (±2) on bacon, $826 (±17) on cellular phones, and $13 (±2) on digital e-readers in 2011? an integral component of the market basket calculation in the consumer price index, this survey recently became available as public-use microdata and they're slowly releasing historical files back to 1996. hooray!

for a taste of what's possible with ce data, look at the quick tables listed on their main page - these tables contain approximately a bazillion different expenditure categories broken down by demographic groups. guess what? i just learned that americans living in households with $5,000 to $9,999 of annual income spent an average of $283 (±90) on pets, toys, hobbies, and playground equipment (pdf page 3). you can often get close to your statistic of interest from these web tables. but say you wanted to look at domestic pet expenditure among only households with children between 12 and 17 years old. another one of the thirteen web tables - the consumer unit composition table - shows a few different breakouts of households with kids, but none matching that exact population of interest. the bureau of labor statistics (bls) (the survey's designers) and the census bureau (the survey's administrators) have provided plenty of the major statistics and breakouts for you, but they're not psychic. if you want to comb through this data for specific expenditure categories broken out by a you-defined segment of the united states' population, then let a little r into your life. fun starts now.

fair warning: only analyze the consumer expenditure survey if you are nerd to the core. the microdata ship with two different survey types (interview and diary), each containing five or six quarterly table formats that need to be stacked, merged, and manipulated prior to a methodologically-correct analysis. the scripts in this repository contain examples to prepare 'em all, just be advised that magnificent data like this will never be no-assembly-required. the folks at bls have posted an excellent summary of what's available - read it before anything else. after that, read the getting started guide. don't skim. a few of the descriptions below refer to sas programs provided by the bureau of labor statistics. you'll find these in the C:\My Directory\CES\2011\docs directory after you run the download program.

this new github repository contains three scripts:

2010-2011 - download all microdata.R
•loop through every year and download every file hosted on the bls's ce ftp site
•import each of the comma-separated value files into r with read.csv
•depending on user-settings, save each table as an r data file (.rda) or stata-readable file (.dta)

2011 fmly intrvw - analysis examples.R
•load the r data files (.rda) necessary to create the 'fmly' table shown in the ce macros program documentation.doc file
•construct that 'fmly' table, using five quarters of interviews (q1 2011 thru q1 2012)
•initiate a replicate-weighted survey design object
•perform some lovely li'l analysis examples
•replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using unimputed variables
•replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using unimputed variables
•create an rsqlite database (to minimize ram usage) containing the five imputed variable files, after identifying which variables were imputed based on pdf page 3 of the user's guide to income imputation
•initiate a replicate-weighted, database-backed, multiply-imputed survey design object
•perform a few additional analyses that highlight the modified syntax required for multiply-imputed survey designs
•replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using imputed variables
•replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using imputed variables
•replicate the %proc_reg() and %proc_logistic() macros found in "ce macros.sas" and provide some examples of regressions and logistic regressions using both unimputed and imputed variables

replicate integrated mean and se.R
•match each step in the bls-provided sas program "integrated mean and se.sas" but with r instead of sas
•create an rsqlite database when the expenditure table gets too large for older computers to handle in ram
•export a table "2011 integrated mean and se.csv" that exactly matches the contents of the sas-produced "2011 integrated mean and se.lst" text file

click here to view these three scripts for...
Gambling Data America
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Gambling Data America [Dataset]. https://listtodata.com/gambling-data-america
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Dataset authored and provided by
List to Data
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
United States
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
Gambling Data America can help you connect with gamblers. The database includes information about who gambles, the types of gambling they participate in, and the money involved. You can use our valid contacts to expand your betting business in targeted locations. Gambling Data America also provides insights that can help you optimize your betting activities, improve profitability, and stay competitive in the active gambling market. The dataset shows what American gamblers like, which lets you decide where and how to bet wisely and makes it an important tool for targeted marketing. You can use player details and preferences to customize your marketing and reach the right people with your promotions, making them more effective. America Gambling Data is a huge collection of contacts for American people, and anyone can take advantage of this phone number data by buying the contact list. The data we provide is real, and almost 95% of it is current and active. Our expert team works regularly to put all the information together, and with their effort and our experience we are creating the best gambling database for our clients. America Gambling Data has all kinds of information that you need for your business and will surely help it grow. With this library, you can create your own contact directory and build contact lists by profession or by state. It is a complete package that can be a great asset for anyone doing marketing.

Coffee Taste Test

kaggle.com

Updated Jun 12, 2024

Cite

Joakim Arvidsson (2024). Coffee Taste Test [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/coffee-taste-test

Dataset updated

Jun 12, 2024

Dataset provided by

Kaggle

Authors

Joakim Arvidsson

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

The Great American Coffee Taste Test

In October 2023, "world champion barista" James Hoffmann and coffee company Cometeer held the "Great American Coffee Taste Test" on YouTube, during which viewers were asked to fill out a survey about 4 coffees they ordered from Cometeer for the tasting. Data blogger Robert McKeon Aloe analyzed the data the following month.

Do you think participants in this survey are representative of Americans in general?

Data Dictionary

`coffee_survey.csv`

variable	class	description
submission_id	character	Submission ID
age	character	What is your age?
cups	character	How many cups of coffee do you typically drink per day?
where_drink	character	Where do you typically drink coffee?
brew	character	How do you brew coffee at home?
brew_other	character	How else do you brew coffee at home?
purchase	character	On the go, where do you typically purchase coffee?
purchase_other	character	Where else do you purchase coffee?
favorite	character	What is your favorite coffee drink?
favorite_specify	character	Please specify what your favorite coffee drink is
additions	character	Do you usually add anything to your coffee?
additions_other	character	What else do you add to your coffee?
dairy	character	What kind of dairy do you add?
sweetener	character	What kind of sugar or sweetener do you add?
style	character	Before today's tasting, which of the following best described what kind of coffee you like?
strength	character	How strong do you like your coffee?
roast_level	character	What roast level of coffee do you prefer?
caffeine	character	How much caffeine do you like in your coffee?
expertise	numeric	Lastly, how would you rate your own coffee expertise?
coffee_a_bitterness	numeric	Coffee A - Bitterness
coffee_a_acidity	numeric	Coffee A - Acidity
coffee_a_personal_preference	numeric	Coffee A - Personal Preference
coffee_a_notes	character	Coffee A - Notes
coffee_b_bitterness	numeric	Coffee B - Bitterness
coffee_b_acidity	numeric	Coffee B - Acidity
coffee_b_personal_preference	numeric	Coffee B - Personal Preference
coffee_b_notes	character	Coffee B - Notes
coffee_c_bitterness	numeric	Coffee C - Bitterness
coffee_c_acidity	numeric	Coffee C - Acidity
coffee_c_personal_preference	numeric	Coffee C - Personal Preference
coffee_c_notes	character	Coffee C - Notes
coffee_d_bitterness	numeric	Coffee D - Bitterness
coffee_d_acidity	numeric	Coffee D - Acidity
coffee_d_personal_preference	numeric	Coffee D - Personal Preference
coffee_d_notes	character	Coffee D - Notes
prefer_abc	character	Between Coffee A, Coffee B, and Coffee C which did you prefer?
prefer_ad	character	Between Coffee A and Coffee D, which did you prefer?
prefer_overall	character	Lastly, what was your favorite overall coffee?
wfh	character	Do you work from home or in person?
total_spend	character	In total, how much money do you typically spend on coffee in a month?
why_drink	character	Why do you drink coffee?
why_drink_other	character	Other reason for drinking coffee
taste	character	Do you like the taste of coffee?
know_source	character	Do you know where your coffee comes from?
most_paid	character	What is the most you've ever paid for a cup of coffee?
most_willing	character	What is the most you'd ever be willing to pay for a cup of coffee?
value_cafe	character	Do you feel like you’re getting good value for your money when you buy coffee at a cafe?
spent_equipment	character	Approximately how much have you spent on coffee equipment in the past 5 years?
value_equipment	character	Do you feel like you’re getting good value for your mo...

American English Call Center Data for BFSI AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). American English Call Center Data for BFSI AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/bfsi-call-center-conversation-english-usa
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) sector is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. Featuring over 30 hours of real-world, unscripted audio, it offers authentic customer-agent interactions across a range of BFSI services to train robust and domain-aware ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, financial technology teams, and NLP researchers to build high-accuracy, production-ready models across BFSI customer service scenarios.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic financial support settings, these conversations span diverse BFSI topics from loan enquiries and card disputes to insurance claims and investment options, providing deep contextual coverage for model training and evaluation.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Representing multiple states across the United States of America to ensure coverage of various accents and dialects.
•Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.
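
The audio properties stated in the recording details (stereo, 16-bit, 8 kHz or 16 kHz) can be checked on any downloaded recording with Python's standard `wave` module. The sketch below is only an illustration: `describe_wav` is a hypothetical helper, and the one-second clip it inspects is synthetic silence generated in memory, not a file from the dataset.

```python
import io
import wave


def describe_wav(fileobj):
    """Read the header of a WAV file and return its basic audio properties."""
    with wave.open(fileobj, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Build one second of synthetic stereo, 16-bit, 8 kHz silence in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00" * (8000 * 2 * 2))  # frames * channels * bytes
buf.seek(0)

info = describe_wav(buf)
print(info)
```

For a real recording, pass a file path or an open binary file handle to `describe_wav` instead of the in-memory buffer.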

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world BFSI voice coverage.
•Inbound Calls:
•Debit Card Block Request
•Transaction Disputes
•Loan Enquiries
•Credit Card Billing Issues
•Account Closure & Claims
•Policy Renewals & Cancellations
•Retirement & Tax Planning
•Investment Risk Queries, and more
•Outbound Calls:
•Loan & Credit Card Offers
•Customer Surveys
•EMI Reminders
•Policy Upgrades
•Insurance Follow-ups
•Investment Opportunity Calls
•Retirement Planning Reviews, and more
This variety ensures models trained on the dataset are equipped to handle complex financial dialogues with contextual accuracy.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, background noise)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making financial domain model training faster and more accurate.
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender,
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)
zenodo.org
data.niaid.nih.gov
zip
Updated Oct 19, 2024
+ more versions
Cite
Steven R. Livingstone; Frank A. Russo (2024). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [Dataset]. http://doi.org/10.5281/zenodo.1188976
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1188976
Dataset updated
Oct 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Steven R. Livingstone; Frank A. Russo
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.

The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo who leads the SMART Lab.

Citing the RAVDESS

The RAVDESS is released under a Creative Commons Attribution license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS1 paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS1 paper would also be appreciated.

Academic paper citation

Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

Personal use citation

Include a link to this Zenodo page - https://zenodo.org/record/1188976

Commercial Licenses

Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license page of fees, or contact us at ravdess@gmail.com.

Contact Information

If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

Example Videos

Watch a sample of the RAVDESS speech and song videos.

Emotion Classification Users

If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

Construction and Validation

Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.

The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

Contents

Audio-only files

Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.

Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

Audio-Visual and Video-only files

Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:

Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.

Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

File Summary

In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
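
The file counts quoted above can be re-derived from the per-actor trial numbers. The short sketch below only restates that arithmetic; the constant names are chosen for this example.

```python
ACTORS = 24
SONG_ACTORS = 23       # there are no song files for Actor_18
SPEECH_TRIALS = 60     # speech trials per actor
SONG_TRIALS = 44       # song trials per actor
VIDEO_MODALITIES = 2   # audio-video (AV) and video-only (VO)

audio_speech = SPEECH_TRIALS * ACTORS
audio_song = SONG_TRIALS * SONG_ACTORS
video_speech = SPEECH_TRIALS * VIDEO_MODALITIES * ACTORS
video_song = SONG_TRIALS * VIDEO_MODALITIES * SONG_ACTORS

total = video_speech + video_song + audio_speech + audio_song
print(audio_speech, audio_song, video_speech, video_song, total)
```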

File naming convention

Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:

Filename identifiers

Modality (01 = full-AV, 02 = video-only, 03 = audio-only).

Vocal channel (01 = speech, 02 = song).

Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).

Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.

Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").

Repetition (01 = 1st repetition, 02 = 2nd repetition).

Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

Filename example: 02-01-06-01-02-01-12.mp4

Video-only (02)

Speech (01)

Fearful (06)

Normal intensity (01)

Statement "dogs" (02)

1st Repetition (01)

12th Actor (12)

Female, as the actor ID number is even.
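
The naming convention above maps directly onto a small lookup-based parser. The following is a minimal sketch, not an official RAVDESS tool: the function name `parse_ravdess_filename` and the shape of the returned dictionary are assumptions made for this example, while the code tables are copied from the identifier list above.

```python
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


def parse_ravdess_filename(filename):
    """Decode a 7-part RAVDESS filename into its stimulus characteristics."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension (.mp4, .wav)
    parts = stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 identifiers, got {len(parts)}: {filename}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_id = int(actor)
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        "actor_sex": "female" if actor_id % 2 == 0 else "male",
    }


info = parse_ravdess_filename("02-01-06-01-02-01-12.mp4")
print(info)
```

An unknown code raises `KeyError`, which is a deliberate choice here so that malformed filenames are not silently accepted.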

License information

The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

Related Data sets

RAVDESS Facial Landmark Tracking data set [Zenodo project page].
US Retail Sales
tradingeconomics.com
zh.tradingeconomics.com
+13more
csv, excel, json, xml
Updated Aug 15, 2025
Cite
TRADING ECONOMICS (2025). US Retail Sales [Dataset]. https://tradingeconomics.com/united-states/retail-sales
Available download formats: csv, xml, excel, json
Dataset updated
Aug 15, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 29, 1992 - Aug 31, 2025
Area covered
United States
Description
Retail Sales in the United States increased 0.60 percent in August of 2025 over the previous month. This dataset provides - U.S. December Retail Sales Increased More Than Forecast - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Data Broker Service Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Oct 16, 2024
Cite
Dataintelo (2024). Data Broker Service Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-broker-service-market
Available download formats: pptx, pdf, csv
Dataset updated
Oct 16, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Broker Service Market Outlook

The global data broker service market size is projected to grow from USD 250 billion in 2023 to an estimated USD 450 billion by 2032, reflecting a compound annual growth rate (CAGR) of 6.7%. This substantial growth can be attributed to increasing digitalization, the exponential rise of data-driven decision-making across industries, and the growing realization of the value derived from data analytics. As businesses continue to recognize the potential of leveraging consumer, business, financial, and health data, the demand for data brokerage services is poised to expand significantly.
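
The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1. The sketch below assumes the projection spans the nine years from 2023 to 2032; the function name `cagr` is chosen for this example and is not taken from the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the market outlook: USD 250 billion (2023) to USD 450 billion (2032).
rate = cagr(250, 450, 2032 - 2023)
print(f"{rate:.1%}")
```

Under that nine-year assumption the result rounds to the 6.7% stated in the report.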

One of the primary growth factors for the data broker service market is the increasing importance of data in driving business strategies and operations. Companies are increasingly relying on consumer and market data to gain insights into market trends, consumer behavior, and competitive landscapes. This surge in data utilization across sectors such as retail, healthcare, and finance is propelling the demand for data brokerage services that can provide accurate and comprehensive data sets. The proliferation of digital platforms and the Internet of Things (IoT) has further amplified the volume of data generated, thus boosting the need for efficient data brokerage services.

Moreover, advancements in artificial intelligence (AI) and machine learning (ML) technologies are significantly contributing to the market's growth. These technologies enable enhanced data analysis, predictive analytics, and real-time decision-making, making data brokerage services more valuable. Businesses are increasingly investing in AI and ML to analyze large datasets more efficiently and extract actionable insights. Data brokers, in turn, are leveraging these technologies to offer more sophisticated and tailored data solutions, thus attracting a broader customer base.

Privacy regulations and data protection laws are also playing a crucial role in shaping the data broker service market. While these regulations pose challenges, they also create opportunities for compliant data brokers to differentiate themselves in the market. Companies are more inclined to partner with data brokers that demonstrate robust data governance practices and adhere to regulatory requirements. This trend is driving the market towards more ethical and transparent data brokerage practices, increasing the trust and credibility of data brokers among businesses and consumers alike.

The regional outlook for the data broker service market highlights North America as a dominant player, primarily due to the high adoption of data-driven strategies among businesses and the presence of major data brokerage firms. Europe follows closely, driven by stringent data protection regulations like GDPR, which necessitate secure and compliant data handling. The Asia Pacific region is expected to witness the fastest growth, fueled by the rapid digital transformation in countries like China and India and the increasing use of data analytics in various industries. Latin America and the Middle East & Africa regions are also showing promising growth, supported by the rising awareness of data's strategic value and increasing investments in data analytics infrastructure.

Data Type Analysis

The data broker service market by data type comprises consumer data, business data, financial data, health data, and other categories. Consumer data is one of the most significant segments within this market. This type of data includes information on consumer behavior, preferences, purchasing patterns, and demographics. Businesses leverage consumer data to tailor their marketing strategies, enhance customer experiences, and drive sales growth. The increasing use of digital platforms for shopping, social interaction, and information consumption is continually generating vast amounts of consumer data, thereby fueling the demand for consumer data brokerage services.

Business data, encompassing company profiles, industry trends, and competitive intelligence, is another vital segment. Organizations require business data to strategize market entry, expansion, and competitive positioning. Data brokers play a crucial role in aggregating and providing actionable business insights that help companies navigate complex market dynamics. The rise of global trade, the need for cross-border business intelligence, and the growing importance of data-driven decision-making in corporate strategies are driving the demand for business data brokerage services.

Financial data is crucial for sectors like banking, fina
United States Michigan Consumer Sentiment
tradingeconomics.com
es.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Oct 10, 2025
Cite
TRADING ECONOMICS (2025). United States Michigan Consumer Sentiment [Dataset]. https://tradingeconomics.com/united-states/consumer-confidence
Explore at:
Available download formats: csv, xml, json, excel
Dataset updated
Oct 10, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 30, 1952 - Oct 31, 2025
Area covered
United States
Description
Consumer Confidence in the United States decreased to 55 points in October from 55.10 points in September of 2025. This dataset provides the latest reported value for - United States Consumer Sentiment - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
National Research Bureau, Estimated Shopping Mall Sales By State, USA, 2005
geocommons.com
Updated May 27, 2008
Cite
National Research Bureau (2008). National Research Bureau, Estimated Shopping Mall Sales By State, USA, 2005 [Dataset]. http://geocommons.com/search.html
Explore at:
Dataset updated
May 27, 2008
Dataset provided by
data
National Research Bureau
Description
This dataset highlights and geocodes estimated shopping mall sales by state. The data was provided by the National Research Bureau (2005) and was collected through http://www.statemaster.com/graph/lif_sho_mal_est_sal-lifestyle-shopping-malls-estimated-sales
Location Data | South America | Real-Time API Polygon-Based GPS Stream
datarade.ai
+ more versions
Cite
Irys, Location Data | South America | Real-Time API Polygon-Based GPS Stream [Dataset]. https://datarade.ai/data-products/irys-geospatial-data-insights-south-america-real-time-irys
Explore at:
Available download formats: .json, .csv, .xls, .sql
Dataset authored and provided by
Irys
Area covered
Brazil
Description
This location data product focuses on real-time GPS pings collected across South America. Using the Irys Location API, users can access polygon-based movement patterns from anonymized mobile devices across urban and rural areas.

Events include timestamps, precise geocoordinates, country codes, and device identifiers. Query the dataset using polygon filters and receive structured outputs via API or cloud endpoints. Supported formats include JSON, CSV, and Parquet.

With historical backfill and fresh updates every day, the dataset is ideal for retailers, advertisers, city planners, and researchers analyzing behavior and trends across the Americas. It supports use cases like retail site selection, mobility forecasting, and geofencing for public safety.

All data is delivered with a maximum lag of three days and complies with GDPR and CCPA regulations.
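
The polygon filtering described above can be illustrated locally. This is a minimal sketch, not the Irys Location API: the event fields (`device_id`, `timestamp`, `lon`, `lat`, `country`) and the sample coordinates are hypothetical, and the test treats longitude and latitude as plane coordinates, which is only a reasonable approximation for small areas.

```python
Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: cast a ray east and count the polygon edges it crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_events(events: list[dict], polygon: list[Point]) -> list[dict]:
    """Keep only the events whose coordinates fall inside the polygon."""
    return [e for e in events if point_in_polygon((e["lon"], e["lat"]), polygon)]


# Hypothetical rectangle and made-up events, for illustration only.
polygon = [(-46.70, -23.60), (-46.60, -23.60), (-46.60, -23.50), (-46.70, -23.50)]
events = [
    {"device_id": "a", "timestamp": "2024-01-01T12:00:00Z",
     "lon": -46.65, "lat": -23.55, "country": "BR"},
    {"device_id": "b", "timestamp": "2024-01-01T12:05:00Z",
     "lon": -43.20, "lat": -22.90, "country": "BR"},
]
matched = filter_events(events, polygon)
print([e["device_id"] for e in matched])
```

Only the first event lies inside the rectangle, so it is the only one kept. A production query would run server-side against the provider's own filter syntax, which is not documented in this listing.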
Factori Audience | 1.2B unique mobile users in APAC, EU, North America and...
app.mobito.io
Updated Dec 24, 2022
Cite
(2022). Factori Audience | 1.2B unique mobile users in APAC, EU, North America and MENA [Dataset]. https://app.mobito.io/data-product/audience-data
Explore at:
Dataset updated
Dec 24, 2022
Area covered
North America, South America, Oceania, Africa, Asia, Europe
Description
We collect, validate, model, and segment raw data signals from more than 900 sources globally to deliver thousands of mobile audience segments. We then combine that data with other public and private data sources to derive interests, intent, and behavioral attributes. Our proprietary algorithms then clean, enrich, unify, and aggregate these data sets for use in our products. We have categorized our audience data into consumable categories such as interest, demographics, behavior, geography, etc.

Audience Data Categories: The data categories below include consumer behavioral data and consumer profiles (available for the US and Australia).

Brand Shoppers: Methodology: This category has been created based on the high intent of users in terms of their visits to Brand outlets in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time.

Place Category Visitors: Methodology: This category has been created based on the high intent of users visiting specific places of interest in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time.

Demographics: This category has been created based on deterministic data that we receive from apps based on the declared gender and age data. Marital Status, Education, Party affiliation, and State residency are available in the US.

Geo-Behavioural: This category has been created based on the high intent of users in terms of the frequency of their visits to specific granular places of interest in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time.

Interests: This segment is created based on users' interest in a specific subject while browsing the internet, when the visited website category is clearly focused on a specific subject such as cars, cooking, traveling, etc. We use a deterministic model to assign a proper profile and the time for which that information is valid. The recency of data can range from 14 to 30 days, depending on the topic.

Intent: Factori receives data from many partners to deliver high-quality pieces of information about users’ shopping intent. We collect data from sources connected to the eCommerce sector, and we also receive data connected to online transactions from affiliate networks, to deliver the most accurate segments with purchase intentions, such as laptops, mobile phones, or cars. The recency of data can range from 7 to 14 days, depending on the product category.

Events: This category was created based on the high interest of users in terms of content related to specific global events - sports, culture, and gaming. Among the event segments, we also distinguish categories related to the interest in certain lifestyle choices and behaviors. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time.

App Usage: The mobile category is a branch of the taxonomy that is dedicated only to data based on mobile advertising IDs. It is based on the categorization of the mobile apps that the user has installed on the device.

Auto Ownership: Consumer Profiles - Available for the US and Australia. This audience has been created based on users declaring that they own a certain brand of automobile and other automotive attributes via a survey or registration. These audiences are currently available in the USA.

Motorcycle Ownership: Consumer Profiles - Available for the US and Australia. This audience has been created based on users declaring that they own a certain brand of motorcycle and other motorcycle-based attributes via a survey or registration. These audiences are currently available for the USA.

Household: Consumer Profiles - Available for the US and Australia. This audience has been created based on users declaring their marital status, parental status, and the overall number of children via a survey or registration. These audiences are currently available in the USA.

Financial: Consumer Profiles - Available for the US and Australia. This audience has been created based on users' behavior in different financial services like property ownership, mortgage, investing behavior, and wealth, and on users declaring their estimated net worth via a survey or registration.

Purchase/Spending Behavior: Consumer Profiles - Available for the US and Australia. This audience has been created based on users' spending behavior in different business verticals available in the USA.

Clusters: Consumer Profiles - Available for the US and Australia. Clusters are groups of consumers who exhibit similar demographic, lifestyle, and media consumption characteristics, empowering marketers to understand the unique attributes that comprise their most profitable consumer segments. Armed with this rich data, data scientists can drive analytics and modeling to power their brand’s unique marketing initiatives.

B2B Audiences: Consumer Profiles - Available for the US and Australia. This audience has been created based on users declaring their employee credentials, designations, and the companies they work in, further specifying business verticals, revenue breakdowns, and headquarters locations.

Customizable Audiences Data Segment: Brands can choose the appropriate pre-made audience segments or ask our data experts about creating a custom segment that is precisely tailored to their brief, in order to reach their target customers and boost the campaign's effectiveness.

Location Query Granularity: Minimum area: HEX 8. Maximum area: QuadKey 17/City.
American English Call Center Data for Retail & E-Commerce AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). American English Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-english-usa
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US English Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
•Participant Diversity:
•Speakers: 60 native US English speakers from our verified contributor pool.
•Regions: Representing multiple states across the United States of America to ensure coverage of various accents and dialects.
•Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.
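
The audio format stated in the recording details can be checked per file with Python's standard `wave` module. This is a minimal sketch: the names `WavInfo`, `read_wav_info`, and `matches_listing` are ours, and the expected values (two channels, 16-bit samples, 8 kHz or 16 kHz) are taken from the listing, not from any published validation tool.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    bit_depth: int
    sample_rate: int
    duration_seconds: float


def read_wav_info(path: str) -> WavInfo:
    """Read the header fields of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(
            channels=wav.getnchannels(),
            bit_depth=wav.getsampwidth() * 8,  # sample width is reported in bytes
            sample_rate=rate,
            duration_seconds=wav.getnframes() / rate,
        )


def matches_listing(info: WavInfo) -> bool:
    """True when a file matches the format described in the listing."""
    return (
        info.channels == 2
        and info.bit_depth == 16
        and info.sample_rate in (8000, 16000)
    )
```

Calling `matches_listing(read_wav_info(path))` on each delivered file is one way to catch files that were resampled or down-mixed before training.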

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
•Inbound Calls:
•Product Inquiries
•Order Cancellations
•Refund & Exchange Requests
•Subscription Queries, and more
•Outbound Calls:
•Order Confirmations
•Upselling & Promotions
•Account Updates
•Loyalty Program Offers
•Customer Verifications, and others
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, cough)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making model training faster and more accurate.
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.
•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.
This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
Usage and Applications
This dataset is ideal for a range of voice AI and NLP applications:
•Automatic Speech Recognition (ASR): Fine-tune English speech-to-text systems.

United States Total Light Vehicle Sales
tradingeconomics.com
it.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated May 2, 2025
Cite
TRADING ECONOMICS (2025). United States Total Light Vehicle Sales [Dataset]. https://tradingeconomics.com/united-states/total-vehicle-sales
Explore at:
Available download formats: excel, xml, csv, json
Dataset updated
May 2, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1976 - Sep 30, 2025
Area covered
United States
Description
Total Vehicle Sales in the United States increased to 16.40 Million in September from 16.10 Million in August of 2025. This dataset provides the latest reported value for - United States Total Vehicle Sales - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
United States Imports
tradingeconomics.com
fr.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Sep 4, 2025
Cite
TRADING ECONOMICS (2025). United States Imports [Dataset]. https://tradingeconomics.com/united-states/imports
Explore at:
Available download formats: json, xml, excel, csv
Dataset updated
Sep 4, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1950 - Jul 31, 2025
Area covered
United States
Description
Imports in the United States increased to 358.80 USD Billion in July from 338.80 USD Billion in June of 2025. This dataset provides the latest reported value for - United States Imports - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
United States Existing Home Sales
tradingeconomics.com
ru.tradingeconomics.com
+12 more
csv, excel, json, xml
Updated Sep 25, 2025
Cite
TRADING ECONOMICS (2025). United States Existing Home Sales [Dataset]. https://tradingeconomics.com/united-states/existing-home-sales
Explore at:
Available download formats: csv, json, xml, excel
Dataset updated
Sep 25, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1968 - Aug 31, 2025
Area covered
United States
Description
Existing Home Sales in the United States decreased to 4000 Thousand in August from 4010 Thousand in July of 2025. This dataset provides the latest reported value for - United States Existing Home Sales - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Database Monitoring Tool Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Database Monitoring Tool Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/database-monitoring-tool-market
Explore at:
Available download formats: pdf, csv, pptx
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Database Monitoring Tool Market Outlook

The global database monitoring tool market size was valued at approximately USD 1.5 billion in 2023 and is projected to reach around USD 3.2 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.5% from 2024 to 2032. The primary growth factors driving this market include the increasing complexity of database systems, the need for enhanced database performance and security, and the rising adoption of advanced monitoring tools for efficient IT operations.

One of the key growth drivers for the database monitoring tool market is the exponential growth in data generation and database complexity. With the proliferation of IoT devices, social media, e-commerce platforms, and enterprise applications, the volume of data generated by organizations has surged dramatically. This has necessitated the need for robust database monitoring tools to ensure optimal performance, minimize downtime, and secure sensitive information. The increasing complexity of database architectures, including multi-cloud deployments and hybrid environments, further underscores the demand for sophisticated monitoring solutions.

Another significant factor contributing to market growth is the rising emphasis on database security. Cybersecurity threats have become increasingly sophisticated, with databases being prime targets for malicious attacks. Database monitoring tools play a crucial role in identifying potential vulnerabilities, detecting anomalies, and ensuring compliance with data protection regulations. As organizations strive to safeguard their data assets, the adoption of advanced monitoring solutions is expected to rise, driving market expansion over the forecast period.

Additionally, the growing adoption of cloud-based database monitoring tools is anticipated to propel market growth. Cloud computing offers several advantages, including scalability, flexibility, and cost-efficiency, which are increasingly being leveraged by enterprises to manage their databases. Cloud-based monitoring solutions provide real-time insights, automated alerts, and predictive analytics, enabling organizations to proactively address performance issues and optimize their database operations. The shift towards cloud-based deployments is likely to further augment the demand for database monitoring tools.

Regionally, North America dominates the database monitoring tool market, driven by the presence of a large number of technology-driven enterprises, high adoption of advanced IT solutions, and stringent data security regulations. Europe follows closely, with significant investments in IT infrastructure and data protection initiatives. The Asia Pacific region is expected to exhibit the highest growth rate during the forecast period, owing to the rapid digital transformation, increasing cloud adoption, and expanding IT sector in countries like China, India, and Japan. Latin America and the Middle East & Africa are also witnessing steady growth, driven by the increasing focus on data management and security.

In the retail industry, the implementation of Transaction Monitoring for Retail is becoming increasingly vital. Retailers are handling vast amounts of transaction data, from point-of-sale systems to online shopping platforms, which necessitates robust monitoring to ensure data integrity and security. Transaction monitoring helps in identifying fraudulent activities, optimizing transaction processes, and enhancing customer experience by ensuring seamless operations. As the retail sector continues to embrace digital transformation, the need for effective transaction monitoring solutions is paramount to manage the growing complexity of retail databases and to maintain competitive advantage.

Component Analysis

The market for database monitoring tools can be segmented into software and services. The software segment encompasses various monitoring solutions that provide real-time insights, analytics, and automated alerts for database performance and security. These solutions are designed to monitor database health, track performance metrics, and detect anomalies, enabling organizations to optimize their database operations. With the increasing complexity of database environments, the demand for advanced software solutions is on the rise, driving the growth of this segment.
United States New Home Sales
tradingeconomics.com
it.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Sep 24, 2025
Cite
TRADING ECONOMICS (2025). United States New Home Sales [Dataset]. https://tradingeconomics.com/united-states/new-home-sales
Explore at:
Available download formats: csv, json, excel, xml
Dataset updated
Sep 24, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1963 - Aug 31, 2025
Area covered
United States
Description
New Home Sales in the United States increased to 800 Thousand units in August from 664 Thousand units in July of 2025. This dataset provides the latest reported value for - United States New Home Sales - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.

Cite

Umer Haddii (2024). The Great American Coffee Taste Test Dataset [Dataset]. https://www.kaggle.com/datasets/umerhaddii/the-great-american-coffee-taste-test-dataset

The Great American Coffee Taste Test Dataset

James Hoffmann and Cometeer survey America's coffee taste preferences

Explore at:

Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

May 20, 2024

Dataset provided by

Kaggle

Authors

Umer Haddii

License

http://opendatacommons.org/licenses/dbcl/1.0/

Description

Context

World champion barista James Hoffmann and Cometeer partnered to conduct a first-of-its-kind coffee taste test. Cometeer shipped 5000 coffee kits across America. Kits contained four different coffees - pre-extracted and flash frozen. Tasters melted and diluted the coffee capsules for a largely identical tasting experience. Tasting and ratings were conducted blind [1]. After survey responses were collected (provided data), some attributes of the coffee were revealed.

In October 2023, world champion barista James Hoffmann and coffee company Cometeer held the "Great American Coffee Taste Test" on YouTube, during which viewers were asked to fill out a survey about four coffees they had ordered from Cometeer for the tasting. Data blogger Robert McKeon Aloe analyzed the data the following month.

Content

Geography: US

Time-period: 2023

Unit of Analysis: The Great American Coffee Taste Test

Variables

submission_id = Submission ID
age = What is your age?
cups = How many cups of coffee do you typically drink per day?
where_drink = Where do you typically drink coffee?
brew = How do you brew coffee at home?
brew_other = How else do you brew coffee at home?
purchase = On the go, where do you typically purchase coffee?
purchase_other = Where else do you purchase coffee?
favorite = What is your favorite coffee drink?
favorite_specify = Please specify what your favorite coffee drink is
additions = Do you usually add anything to your coffee?
additions_other = What else do you add to your coffee?
dairy = What kind of dairy do you add?
sweetener = What kind of sugar or sweetener do you add?
style = Before today's tasting, which of the following best described what kind of coffee you like?
strength = How strong do you like your coffee?
roast_level = What roast level of coffee do you prefer?
caffeine = How much caffeine do you like in your coffee?
expertise = Lastly, how would you rate your own coffee expertise?
coffee_a_bitterness = Coffee A - Bitterness
coffee_a_acidity = Coffee A - Acidity
coffee_a_personal_preference = Coffee A - Personal Preference
coffee_a_notes = Coffee A - Notes
coffee_b_bitterness = Coffee B - Bitterness
coffee_b_acidity = Coffee B - Acidity
coffee_b_personal_preference = Coffee B - Personal Preference
coffee_b_notes = Coffee B - Notes
coffee_c_bitterness = Coffee C - Bitterness
coffee_c_acidity = Coffee C - Acidity
coffee_c_personal_preference = Coffee C - Personal Preference
coffee_c_notes = Coffee C - Notes
coffee_d_bitterness = Coffee D - Bitterness
coffee_d_acidity = Coffee D - Acidity
coffee_d_personal_preference = Coffee D - Personal Preference
coffee_d_notes = Coffee D - Notes
prefer_abc = Between Coffee A, Coffee B, and Coffee C which did you prefer?
prefer_ad = Between Coffee A and Coffee D, which did you prefer?
prefer_overall = Lastly, what was your favorite overall coffee?
wfh = Do you work from home or in person?
total_spend = In total, how much money do you typically spend on coffee in a month?
why_drink = Why do you drink coffee?
why_drink_other = Other reason for drinking coffee
taste = Do you like the taste of coffee?
know_source = Do you know where your coffee comes from?
most_paid = What is the most you've ever paid for a cup of coffee?
most_willing = What is the most you'd ever be willing to pay for a cup of coffee?
value_cafe = Do you feel like you’re getting good value for your money when you buy coffee at a cafe?
spent_equipment = Approximately how much have you spent on coffee equipment in the past 5 years?
value_equipment = Do you feel like you’re getting good value for your money with regards to your coffee equipment?
gender = Gender
gender_specify = Gender (please specify)
education_level = Education Level
ethnicity_race = Ethnicity/Race
ethnicity_race_specify = Ethnicity/Race (please specify)
employment_status = Employment Status
number_children = Number of Children
political_affiliation = Political Affiliation
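
As an illustration of how the variables above might be used, the sketch below tallies the answers in one column using only the standard library. It rests on assumptions: the file is a comma-separated file with a header row of these variable names, missing answers are empty cells, and the sample rows and answer labels are made up for the demonstration.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def tally_column(csv_file: TextIO, column: str) -> Counter:
    """Count the non-empty answers in one survey column."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    counts: Counter = Counter()
    for row in reader:
        answer = (row[column] or "").strip()
        if answer:  # skip respondents who left the question blank
            counts[answer] += 1
    return counts


# Made-up sample rows; the real answer labels may differ.
sample = io.StringIO(
    "submission_id,prefer_overall\n"
    "s1,Coffee A\n"
    "s2,Coffee D\n"
    "s3,\n"
    "s4,Coffee D\n"
)
counts = tally_column(sample, "prefer_overall")
print(counts.most_common())
```

With the downloaded file, the same function could be called as `tally_column(open("coffee_survey.csv", newline="", encoding="utf-8"), "prefer_overall")`, assuming that file name and encoding.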

Acknowledgement

Datasource: The data was collected through a survey called The Great American Coffee Taste Test, held by James Hoffmann.

Inspiration: [Great American Coffee...
