24 datasets found
  1. The Great American Coffee Taste Test Dataset

    • kaggle.com
    Updated May 20, 2024
    + more versions
    Cite
    Umer Haddii (2024). The Great American Coffee Taste Test Dataset [Dataset]. https://www.kaggle.com/datasets/umerhaddii/the-great-american-coffee-taste-test-dataset
    Dataset updated
    May 20, 2024
    Dataset provided by
    Kaggle
    Authors
    Umer Haddii
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    World champion barista James Hoffmann and Cometeer partnered to conduct a first-of-its-kind coffee taste test. Cometeer shipped 5000 coffee kits across America. Kits contained four different coffees - pre-extracted and flash frozen. Tasters melted and diluted the coffee capsules for a largely identical tasting experience. Tasting and ratings were conducted blind [1]. After survey responses were collected (provided data), some attributes of the coffee were revealed.

    In October 2023, world champion barista James Hoffmann and coffee company Cometeer held the "Great American Coffee Taste Test" on YouTube, during which viewers were asked to fill out a survey about 4 coffees they ordered from Cometeer for the tasting. Data blogger Robert McKeon Aloe analyzed the data the following month.

    Content

    Geography: US

    Time-period: 2023

    Unit of Analysis: The Great American Coffee Taste Test

    Variables

    • submission_id = Submission ID
    • age = What is your age?
    • cups = How many cups of coffee do you typically drink per day?
    • where_drink = Where do you typically drink coffee?
    • brew = How do you brew coffee at home?
    • brew_other = How else do you brew coffee at home?
    • purchase = On the go, where do you typically purchase coffee?
    • purchase_other = Where else do you purchase coffee?
    • favorite = What is your favorite coffee drink?
    • favorite_specify = Please specify what your favorite coffee drink is
    • additions = Do you usually add anything to your coffee?
    • additions_other = What else do you add to your coffee?
    • dairy = What kind of dairy do you add?
    • sweetener = What kind of sugar or sweetener do you add?
    • style = Before today's tasting, which of the following best described what kind of coffee you like?
    • strength = How strong do you like your coffee?
    • roast_level = What roast level of coffee do you prefer?
    • caffeine = How much caffeine do you like in your coffee?
    • expertise = Lastly, how would you rate your own coffee expertise?
    • coffee_a_bitterness = Coffee A - Bitterness
    • coffee_a_acidity = Coffee A - Acidity
    • coffee_a_personal_preference = Coffee A - Personal Preference
    • coffee_a_notes = Coffee A - Notes
    • coffee_b_bitterness = Coffee B - Bitterness
    • coffee_b_acidity = Coffee B - Acidity
    • coffee_b_personal_preference = Coffee B - Personal Preference
    • coffee_b_notes = Coffee B - Notes
    • coffee_c_bitterness = Coffee C - Bitterness
    • coffee_c_acidity = Coffee C - Acidity
    • coffee_c_personal_preference = Coffee C - Personal Preference
    • coffee_c_notes = Coffee C - Notes
    • coffee_d_bitterness = Coffee D - Bitterness
    • coffee_d_acidity = Coffee D - Acidity
    • coffee_d_personal_preference = Coffee D - Personal Preference
    • coffee_d_notes = Coffee D - Notes
    • prefer_abc = Between Coffee A, Coffee B, and Coffee C which did you prefer?
    • prefer_ad = Between Coffee A and Coffee D, which did you prefer?
    • prefer_overall = Lastly, what was your favorite overall coffee?
    • wfh = Do you work from home or in person?
    • total_spend = In total, how much money do you typically spend on coffee in a month?
    • why_drink = Why do you drink coffee?
    • why_drink_other = Other reason for drinking coffee
    • taste = Do you like the taste of coffee?
    • know_source = Do you know where your coffee comes from?
    • most_paid = What is the most you've ever paid for a cup of coffee?
    • most_willing = What is the most you'd ever be willing to pay for a cup of coffee?
    • value_cafe = Do you feel like you’re getting good value for your money when you buy coffee at a cafe?
    • spent_equipment = Approximately how much have you spent on coffee equipment in the past 5 years?
    • value_equipment = Do you feel like you’re getting good value for your money when you buy coffee at a cafe?
    • gender = Gender
    • gender_specify = Gender (please specify)
    • education_level = Education Level
    • ethnicity_race = Ethnicity/Race
    • ethnicity_race_specify = Ethnicity/Race (please specify)
    • employment_status = Employment Status
    • number_children = Number of Children
    • political_affiliation = Political Affiliation
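
    As a minimal sketch of how the variables above might be used, the following Python snippet averages the personal-preference rating for a given coffee. The inline rows are made-up placeholders rather than real survey responses, and `mean_preference` is a hypothetical helper; only the column names come from the list above.

```python
import csv
import io
from statistics import mean

# Made-up sample rows for illustration only; real data would come from the
# dataset's CSV file. Empty cells stand for unanswered questions.
SAMPLE = """submission_id,coffee_a_personal_preference,coffee_d_personal_preference,prefer_ad
r1,4,2,Coffee A
r2,,5,Coffee D
r3,3,4,Coffee D
"""

def mean_preference(rows, coffee):
    """Average the personal-preference rating for one coffee, skipping blanks."""
    column = f"coffee_{coffee}_personal_preference"
    values = [float(row[column]) for row in rows if row.get(column)]
    return mean(values) if values else None

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for coffee in ("a", "d"):
    print(coffee, mean_preference(rows, coffee))
```

    Blank answers are skipped rather than treated as zero, so a coffee that a respondent did not rate does not pull the average down.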

    Acknowledgement

    Datasource: The data was collected through a survey called The Great American Coffee Taste Test, held by James Hoffmann.

    Inspiration: [Great American Coffee...

  2. Coffee consumption in the U.S. 2013/14-2024/2025

    • statista.com
    Updated Nov 29, 2025
    Cite
    Statista (2025). Coffee consumption in the U.S. 2013/14-2024/2025 [Dataset]. https://www.statista.com/statistics/804271/domestic-coffee-consumption-in-the-us/
    Dataset updated
    Nov 29, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    North America, United States
    Description

    Coffee consumption in the United States amounted to over ** million 60-kilogram bags in the 2024/2025 fiscal year. This is a slight increase from the total U.S. coffee consumption in the previous fiscal year.

    Coffee production

    The coffee plant has its origins in Ethiopia and is now grown all over the world. Most of the world’s coffee is cultivated in South America, followed by Asia and Oceania. In 2024, over *** million 60-kilogram bags of coffee were produced in South America. The majority of South America’s coffee production is attributed to Brazil. In the 2023/2024 fiscal year, global coffee production reached *** million 60-kilogram bags.

    Coffee brewing in the United States

    Americans love their coffee and have dozens of different methods and gadgets for brewing and preparing coffee. A 2025 survey of U.S. consumers found that the most commonly used coffee preparation methods were drip coffee makers and single-cup brewers. However, drip coffee makers have become less popular over time. In 2012, *** percent of coffee drinkers used drip coffee makers, while in 2025 this share had dropped to *** percent.

  3. Coffee Shop Daily Revenue Prediction Dataset

    • kaggle.com
    zip
    Updated Feb 7, 2025
    Cite
    Himel Sarder (2025). Coffee Shop Daily Revenue Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/himelsarder/coffee-shop-daily-revenue-prediction-dataset
    Available download formats: zip (30259 bytes)
    Dataset updated
    Feb 7, 2025
    Authors
    Himel Sarder
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Overview

    This dataset contains 2,000 rows of data from coffee shops, offering detailed insights into factors that influence daily revenue. It includes key operational and environmental variables that provide a comprehensive view of how business activities and external conditions affect sales performance. Designed for use in predictive analytics and business optimization, this dataset is a valuable resource for anyone looking to understand the relationship between customer behavior, operational decisions, and revenue generation in the food and beverage industry.

    Columns & Variables

    The dataset features a variety of columns that capture the operational details of coffee shops, including customer activity, store operations, and external factors such as marketing spend and location foot traffic.

    1. Number of Customers Per Day

      • The total number of customers visiting the coffee shop on any given day.
      • Range: 50 - 500 customers.
    2. Average Order Value ($)

      • The average dollar amount spent by each customer during their visit.
      • Range: $2.50 - $10.00.
    3. Operating Hours Per Day

      • The total number of hours the coffee shop is open for business each day.
      • Range: 6 - 18 hours.
    4. Number of Employees

      • The number of employees working on a given day. This can influence service speed, customer satisfaction, and ultimately, sales.
      • Range: 2 - 15 employees.
    5. Marketing Spend Per Day ($)

      • The amount of money spent on marketing campaigns or promotions on any given day.
      • Range: $10 - $500 per day.
    6. Location Foot Traffic (people/hour)

      • The number of people passing by the coffee shop per hour, a variable indicative of the shop's location and its potential to attract customers.
      • Range: 50 - 1000 people per hour.

    Target Variable

    • Daily Revenue ($)
      • This is the dependent variable representing the total revenue generated by the coffee shop each day.
      • It is calculated as a combination of customer visits, average spending, and other operational factors like marketing spend and staff availability.
      • Range: $200 - $10,000 per day.

    Data Distribution & Insights

    The dataset spans a wide variety of operational scenarios, from small neighborhood coffee shops with limited traffic to larger, high-traffic locations with extensive marketing budgets. This variety allows for exploring different predictive modeling strategies. Key insights that can be derived from the data include:

    • The effect of marketing spend on daily revenue.
    • The correlation between customer count and daily sales.
    • The relationship between staffing levels and revenue generation.
    • The influence of foot traffic and operating hours on customer behavior.

    Use Cases & Applications

    The dataset offers a wide range of applications, especially in predictive analytics, business optimization, and forecasting:

    • Predictive Modeling: Use machine learning models such as regression, decision trees, or neural networks to predict daily revenue based on operational data.
    • Business Strategy Development: Analyze how changes in marketing spend, staff numbers, or operating hours can optimize revenue and improve efficiency.
    • Customer Insights: Identify patterns in customer behavior related to shop operations and external factors like foot traffic and marketing campaigns.
    • Resource Allocation: Determine optimal staffing levels and marketing budgets based on predicted sales, improving overall profitability.
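
    To illustrate the simplest of the regression approaches mentioned above, here is a minimal sketch that fits a one-variable least-squares line relating customer count to daily revenue. The data points are made up for the example and are not rows from the dataset, and `fit_line` is a hypothetical helper; a real analysis would likely use all six input columns and a dedicated library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up (customers per day, daily revenue) pairs, not taken from the dataset.
customers = [100, 200, 300, 400]
revenue = [550.0, 1050.0, 1550.0, 2050.0]

intercept, slope = fit_line(customers, revenue)
predicted = intercept + slope * 250  # forecast for a day with 250 customers
print(f"revenue = {intercept:.1f} + {slope:.2f} * customers")
print(f"predicted revenue for 250 customers: {predicted:.1f}")
```

    The slope can be read as the estimated revenue gained per additional customer, which is the kind of quantity the staffing and marketing questions above depend on.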

    Real-World Applications in the Food & Beverage Industry

    For coffee shop owners, managers, and analysts in the food and beverage industry, this dataset provides an essential tool for refining daily operations and boosting profitability. Insights gained from this data can help:

    • Optimize Marketing Campaigns: Evaluate the effectiveness of daily or seasonal marketing campaigns on revenue.
    • Staff Scheduling: Predict busy days and ensure that the right number of employees are scheduled to maximize efficiency.
    • Revenue Forecasting: Provide accurate revenue projections that can assist with financial planning and decision-making.
    • Operational Efficiency: Discover the most profitable operating hours and adjust business hours accordingly.

    This dataset is also ideal for aspiring data scientists and machine learning practitioners looking to apply their skills to real-world business problems in the food and beverage sector.

    Conclusion

    The Coffee Shop Revenue Prediction Dataset is a versatile and comprehensive resource for understanding the dynamics of daily sales performance in coffee shops. With a focus on key operational factors, it is perfect for building predictive models, ...

  4. Coffee Tastings [Survey Analysis]

    • kaggle.com
    zip
    Updated Nov 20, 2023
    Cite
    Sujay Kapadnis (2023). Coffee Tastings [Survey Analysis] [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/lets-do-some-coffee-tasting
    Available download formats: zip (444226 bytes)
    Dataset updated
    Nov 20, 2023
    Authors
    Sujay Kapadnis
    Description

    Last month, British YouTuber (and former World Barista Champion) James Hoffmann virtually hosted the Great American Coffee Taste Test, during which thousands of people simultaneously blind-tasted the same four coffees. Hoffmann has published a video summarizing the results, as well as a spreadsheet of anonymized survey responses from 4,000+ participants. It includes tasters’ demographics, general coffee drinking habits and preferences, assessments of the four coffees, and more. CR: https://bit.ly/gacttCSV+

  5. Coffee Taste Test

    • kaggle.com
    zip
    Updated Jun 12, 2024
    Cite
    Joakim Arvidsson (2024). Coffee Taste Test [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/coffee-taste-test
    Available download formats: zip (401958 bytes)
    Dataset updated
    Jun 12, 2024
    Authors
    Joakim Arvidsson
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Great American Coffee Taste Test

    In October 2023, "world champion barista" James Hoffmann and coffee company Cometeer held the "Great American Coffee Taste Test" on YouTube, during which viewers were asked to fill out a survey about 4 coffees they ordered from Cometeer for the tasting. Data blogger Robert McKeon Aloe analyzed the data the following month.

    Do you think participants in this survey are representative of Americans in general?

    Data Dictionary

    coffee_survey.csv

    variable | class | description
    submission_id | character | Submission ID
    age | character | What is your age?
    cups | character | How many cups of coffee do you typically drink per day?
    where_drink | character | Where do you typically drink coffee?
    brew | character | How do you brew coffee at home?
    brew_other | character | How else do you brew coffee at home?
    purchase | character | On the go, where do you typically purchase coffee?
    purchase_other | character | Where else do you purchase coffee?
    favorite | character | What is your favorite coffee drink?
    favorite_specify | character | Please specify what your favorite coffee drink is
    additions | character | Do you usually add anything to your coffee?
    additions_other | character | What else do you add to your coffee?
    dairy | character | What kind of dairy do you add?
    sweetener | character | What kind of sugar or sweetener do you add?
    style | character | Before today's tasting, which of the following best described what kind of coffee you like?
    strength | character | How strong do you like your coffee?
    roast_level | character | What roast level of coffee do you prefer?
    caffeine | character | How much caffeine do you like in your coffee?
    expertise | numeric | Lastly, how would you rate your own coffee expertise?
    coffee_a_bitterness | numeric | Coffee A - Bitterness
    coffee_a_acidity | numeric | Coffee A - Acidity
    coffee_a_personal_preference | numeric | Coffee A - Personal Preference
    coffee_a_notes | character | Coffee A - Notes
    coffee_b_bitterness | numeric | Coffee B - Bitterness
    coffee_b_acidity | numeric | Coffee B - Acidity
    coffee_b_personal_preference | numeric | Coffee B - Personal Preference
    coffee_b_notes | character | Coffee B - Notes
    coffee_c_bitterness | numeric | Coffee C - Bitterness
    coffee_c_acidity | numeric | Coffee C - Acidity
    coffee_c_personal_preference | numeric | Coffee C - Personal Preference
    coffee_c_notes | character | Coffee C - Notes
    coffee_d_bitterness | numeric | Coffee D - Bitterness
    coffee_d_acidity | numeric | Coffee D - Acidity
    coffee_d_personal_preference | numeric | Coffee D - Personal Preference
    coffee_d_notes | character | Coffee D - Notes
    prefer_abc | character | Between Coffee A, Coffee B, and Coffee C which did you prefer?
    prefer_ad | character | Between Coffee A and Coffee D, which did you prefer?
    prefer_overall | character | Lastly, what was your favorite overall coffee?
    wfh | character | Do you work from home or in person?
    total_spend | character | In total, how much money do you typically spend on coffee in a month?
    why_drink | character | Why do you drink coffee?
    why_drink_other | character | Other reason for drinking coffee
    taste | character | Do you like the taste of coffee?
    know_source | character | Do you know where your coffee comes from?
    most_paid | character | What is the most you've ever paid for a cup of coffee?
    most_willing | character | What is the most you'd ever be willing to pay for a cup of coffee?
    value_cafe | character | Do you feel like you’re getting good value for your money when you buy coffee at a cafe?
    spent_equipment | character | Approximately how much have you spent on coffee equipment in the past 5 years?
    value_equipment | character | Do you feel like you’re getting good value for your mo...
  6. Data_Sheet_1_Coffee consumption decreases the connectivity of the posterior...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 28, 2023
    Cite
    Maria Picó-Pérez; Ricardo Magalhães; Madalena Esteves; Rita Vieira; Teresa C. Castanho; Liliana Amorim; Mafalda Sousa; Ana Coelho; Pedro S. Moreira; Rodrigo A. Cunha; Nuno Sousa (2023). Data_Sheet_1_Coffee consumption decreases the connectivity of the posterior Default Mode Network (DMN) at rest.docx [Dataset]. http://doi.org/10.3389/fnbeh.2023.1176382.s001
    Available download formats: docx
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Maria Picó-Pérez; Ricardo Magalhães; Madalena Esteves; Rita Vieira; Teresa C. Castanho; Liliana Amorim; Mafalda Sousa; Ana Coelho; Pedro S. Moreira; Rodrigo A. Cunha; Nuno Sousa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Habitual coffee consumers justify their life choices by arguing that they become more alert and increase motor and cognitive performance and efficiency; however, these subjective impressions still do not have a neurobiological correlation. Using functional connectivity approaches to study resting-state fMRI data in a group of habitual coffee drinkers, we herein show that coffee consumption decreased connectivity of the posterior default mode network (DMN) and between the somatosensory/motor networks and the prefrontal cortex, while the connectivity in nodes of the higher visual and the right executive control network (RECN) is increased after drinking coffee; data also show that caffeine intake only replicated the impact of coffee on the posterior DMN, thus disentangling the neurochemical effects of caffeine from the experience of having a coffee.

  7. Association between coffee drinking and telomere length in the Prostate,...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 1, 2023
    Cite
    Bella Steiner; Leah M. Ferrucci; Lisa Mirabello; Qing Lan; Wei Hu; Linda M. Liao; Sharon A. Savage; Immaculata De Vivo; Richard B. Hayes; Preetha Rajaraman; Wen-Yi Huang; Neal D. Freedman; Erikka Loftfield (2023). Association between coffee drinking and telomere length in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial [Dataset]. http://doi.org/10.1371/journal.pone.0226972
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Bella Steiner; Leah M. Ferrucci; Lisa Mirabello; Qing Lan; Wei Hu; Linda M. Liao; Sharon A. Savage; Immaculata De Vivo; Richard B. Hayes; Preetha Rajaraman; Wen-Yi Huang; Neal D. Freedman; Erikka Loftfield
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mounting evidence indicates that coffee, a commonly consumed beverage worldwide, is inversely associated with various chronic diseases and overall mortality. Few studies have evaluated the effect of coffee drinking on telomere length, a biomarker of chromosomal integrity, and results have been inconsistent. Understanding this association may provide mechanistic insight into associations of coffee with health. The aim of our study was to test the hypothesis that heavier coffee intake is associated with greater likelihood of having above-median telomere length. We evaluated the cross-sectional association between coffee intake and relative telomere length using data from 1,638 controls from four previously conducted case-control studies nested in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Coffee intake was assessed using a food frequency questionnaire, and relative telomere length was measured from buffy-coat, blood, or buccal cells. We used unconditional logistic regression models to generate multivariable-adjusted, study-specific odds ratios for the association between coffee intake and relative telomere length. We then conducted a random-effects meta-analysis to determine summary odds ratios. We found that neither summary continuous (OR = 1.01, 95% CI = 0.99–1.03) nor categorical (OR

  8. Coffee Sales Dataset

    • kaggle.com
    Updated Sep 23, 2025
    Cite
    Anas Sarfraz (2025). Coffee Sales Dataset [Dataset]. https://www.kaggle.com/datasets/anassarfraz13/coffee-sales-dataset/code
    Dataset updated
    Sep 23, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anas Sarfraz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains around 3,530 coffee sales transactions recorded in a cafe. It gives info about both customer and transaction details like time of purchase, method of payment, beverage type, and amount spent.

    Key Features:

    hour of the day : The hour when the purchase occurred

    cash type : Payment method used

    money : The amount of money spent on the purchase

    coffee name : The type of coffee or drink purchased

    Time of Day : Morning, Afternoon, Evening, etc

    Weekday : The day of the week on which the transaction occurred

    Month name : month when the transaction was recorded

    sorting of day or month : weekdays and months sorting

    Date : date of the transaction

    Time: Exact time of purchase
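
    As a rough sketch of how these features could be combined, the snippet below totals the amount spent per hour and per drink. The three transactions are made up for illustration, and the dictionary keys simply mirror the feature names listed above; the actual column headers in the file may be spelled differently.

```python
from collections import defaultdict

# Made-up transactions; the keys mirror the feature names listed above, but
# the dataset's actual CSV headers may be spelled differently.
transactions = [
    {"hour of the day": 9, "coffee name": "Latte", "money": 3.5},
    {"hour of the day": 9, "coffee name": "Espresso", "money": 2.0},
    {"hour of the day": 14, "coffee name": "Latte", "money": 3.5},
]

def total_by(records, key):
    """Sum the 'money' field for each distinct value of `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["money"]
    return dict(totals)

print(total_by(transactions, "hour of the day"))
print(total_by(transactions, "coffee name"))
```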

  9. DataSheet1_The causal association between smoking initiation, alcohol and...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jun 21, 2023
    Cite
    Zhaoying Jiang; Renke He; Haiyan Wu; Jiaen Yu; Kejing Zhu; Qinyu Luo; Xueying Liu; Jiexue Pan; Hefeng Huang (2023). DataSheet1_The causal association between smoking initiation, alcohol and coffee consumption, and women’s reproductive health: A two-sample Mendelian randomization analysis.ZIP [Dataset]. http://doi.org/10.3389/fgene.2023.1098616.s001
    Available download formats: zip
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Zhaoying Jiang; Renke He; Haiyan Wu; Jiaen Yu; Kejing Zhu; Qinyu Luo; Xueying Liu; Jiexue Pan; Hefeng Huang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: A number of epidemiological studies have demonstrated that smoking initiation and alcohol and coffee consumption were closely related to women’s reproductive health. However, there was still insufficient evidence supporting their direct causality effect.

    Methods: We utilized two-sample Mendelian randomization (TSMR) analysis with summary datasets from genome-wide association study (GWAS) to investigate the causal relationship between smoking initiation, alcohol and coffee consumption, and women’s reproductive health-related traits. Exposure genetic instruments were used as variants significantly related to traits. The inverse-variance weighted (IVW) method was used as the main analysis approach, and we also performed MR-PRESSO, MR-Egger, weighted median, and weighted mode to supplement the sensitivity test. Then, the horizontal pleiotropy was detected by using MRE intercept and MR-PRESSO methods, and the heterogeneity was assessed using Cochran’s Q statistics.

    Results: We found evidence that smoking women showed a significant inverse causal association with the sex hormone-binding globulin (SHBG) levels (corrected β = −0.033, p = 9.05E-06) and age at menopause (corrected β = −0.477, p = 6.60E-09) and a potential positive correlation with the total testosterone (TT) levels (corrected β = 0.033, p = 1.01E-02). In addition, there was suggestive evidence for the alcohol drinking effect on the elevated TT levels (corrected β = 0.117, p = 5.93E-03) and earlier age at menopause (corrected β = −0.502, p = 4.14E-02) among women, while coffee consumption might decrease the female SHBG levels (corrected β = −0.034, p = 1.33E-03).

    Conclusion: Our findings suggested that smoking in women significantly decreased their SHBG concentration, promoted earlier menopause, and possibly reduced the TT levels. Alcohol drinking had a potential effect on female higher TT levels and earlier menopause, while coffee consumption might lead to lower female SHBG levels.

  10. Food_new Dataset

    • universe.roboflow.com
    zip
    Updated Jul 16, 2024
    Cite
    Allergen30 (2024). Food_new Dataset [Dataset]. https://universe.roboflow.com/allergen30/food_new-uuulf/dataset/2
    Available download formats: zip
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Allergen30
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Food Bounding Boxes
    Description

    Allergen30

    About Allergen30

    Allergen30 is created by Mayank Mishra, Nikunj Bansal, Tanmay Sarkar and Tanupriya Choudhury with a goal of building a robust detection model that can assist people in avoiding possible allergic reactions.

    It contains more than 6,000 images of 30 commonly used food items which can cause an adverse reaction within a human body. This dataset is one of the first research attempts in training a deep learning based computer vision model to detect the presence of such food items from images. It also serves as a benchmark for evaluating the efficacy of object detection methods in learning the otherwise difficult visual cues related to food items.

    Description of class labels

    There are multiple food items pertaining to specific food intolerances which can trigger an allergic reaction. Such food intolerances primarily include Lactose, Histamine, Gluten, Salicylate, Caffeine and Ovomucoid intolerance. (Image: Food intolerance, https://github.com/mmayank74567/mmayank74567.github.io/blob/master/images/FoodIntol.png?raw=true)

    The following table contains the description relating to the 30 class labels in our dataset.

    S. No. | Allergen | Food label | Description
    1 | Ovomucoid | egg | Images of egg with yolk (e.g. sunny side up eggs)
    2 | Ovomucoid | whole_egg_boiled | Images of soft and hard boiled eggs
    3 | Lactose/Histamine | milk | Images of milk in a glass
    4 | Lactose | icecream | Images of icecream scoops
    5 | Lactose | cheese | Images of swiss cheese
    6 | Lactose/Caffeine | milk_based_beverage | Images of tea/coffee with milk in a cup/glass
    7 | Lactose/Caffeine | chocolate | Images of chocolate bars
    8 | Caffeine | non_milk_based_beverage | Images of soft drinks and tea/coffee without milk in a cup/glass
    9 | Histamine | cooked_meat | Images of cooked meat
    10 | Histamine | raw_meat | Images of raw meat
    11 | Histamine | alcohol | Images of alcohol bottles
    12 | Histamine | alcohol_glass | Images of wine glasses with alcohol
    13 | Histamine | spinach | Images of spinach bundle
    14 | Histamine | avocado | Images of avocado sliced in half
    15 | Histamine | eggplant | Images of eggplant
    16 | Salicylate | blueberry | Images of blueberry
    17 | Salicylate | blackberry | Images of blackberry
    18 | Salicylate | strawberry | Images of strawberry
    19 | Salicylate | pineapple | Images of pineapple
    20 | Salicylate | capsicum | Images of bell pepper
    21 | Salicylate | mushroom | Images of mushrooms
    22 | Salicylate | dates | Images of dates
    23 | Salicylate | almonds | Images of almonds
    24 | Salicylate | pistachios | Images of pistachios
    25 | Salicylate | tomato | Images of tomato and tomato slices
    26 | Gluten | roti | Images of roti
    27 | Gluten | pasta | Images of one serving of penne pasta
    28 | Gluten | bread | Images of bread slices
    29 | Gluten | bread_loaf | Images of bread loaf
    30 | Gluten | pizza | Images of pizza and pizza slices

    Data collection

    We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. The images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. When downloading images from search engines, many images were irrelevant to the purpose, especially the ones with a lot of text in them. We deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.

    Fair use

    This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

    **Citatio

  11. The ORBIT (Object Recognition for Blind Image Training)-India Dataset

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    Updated Apr 24, 2025
    Cite
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones (2025). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. http://doi.org/10.5281/zenodo.12608444
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

    Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

    The image dataset is stored in the ‘Dataset’ folder, organized into one folder per data collector (P1, P2, ...P12). Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is true if the object is not present in the image, and the ‘pii_present_issue’ key is true if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
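
    The annotation layout described above can be read with a few lines of Python. This is a minimal sketch, not code shipped with the dataset: the sample JSON is a hypothetical excerpt (the second frame is flagged only for illustration), and the helper names `usable_frames` and `parse_frame_name` are assumptions. It also assumes object labels never contain the `--` separator.

```python
import json

# Hypothetical excerpt of one per-video annotation file, following the key
# layout described above (frame file name -> issue flags).
sample_annotation = json.loads("""
{
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg":
    {"object_not_present_issue": false, "pii_present_issue": false},
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg":
    {"object_not_present_issue": true, "pii_present_issue": false}
}
""")


def usable_frames(annotation: dict) -> list[str]:
    """Return the frame names in which the object is present."""
    return sorted(
        name
        for name, flags in annotation.items()
        if not flags["object_not_present_issue"]
    )


def parse_frame_name(name: str) -> dict:
    """Split a frame name on the '--' separators seen in the example keys."""
    collector, label, surface, video, frame = name.split("--")
    return {
        "collector": collector,
        "label": label,
        "surface": surface,  # 'clean' or 'clutter'
        "video": video,
        "frame": frame.removesuffix(".jpeg"),
    }


frames = usable_frames(sample_annotation)
print(frames)
print(parse_frame_name(frames[0]))
```

    Filtering on `object_not_present_issue` before training is one plausible use of the flags; whether to also drop frames with `pii_present_issue` set depends on the application, since the description states that PII has already been blurred.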

    This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

    REFERENCES:

    1. Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

    2. microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

    3. Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641

  12. Data from: USDA National Nutrient Database for Standard Reference Dataset...

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +3 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR) [Dataset]. https://catalog.data.gov/dataset/usda-national-nutrient-database-for-standard-reference-dataset-for-what-we-eat-in-america--37895
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Area covered
    United States
    Description

    The dataset, Survey-SR, provides the nutrient data for assessing dietary intakes from the national survey What We Eat In America, National Health and Nutrition Examination Survey (WWEIA, NHANES). Historically, USDA databases have been used for national nutrition monitoring (1). Currently, the Food and Nutrient Database for Dietary Studies (FNDDS) (2) is used by Food Surveys Research Group, ARS, to process dietary intake data from WWEIA, NHANES. Nutrient values for FNDDS are based on Survey-SR. Survey-SR was referred to as the "Primary Data Set" in older publications. Early versions of the dataset were composed mainly of commodity-type items such as wheat flour, sugar, milk, etc. However, with increased consumption of commercial processed and restaurant foods and changes in how national nutrition monitoring data are used (1), many commercial processed and restaurant items have been added to Survey-SR.

    The current version, Survey-SR 2013-2014, is mainly based on the USDA National Nutrient Database for Standard Reference (SR) 28 (2) and contains sixty-six nutrients each for 3,404 foods. These nutrient data will be used for assessing intake data from WWEIA, NHANES 2013-2014. Nutrient profiles were added for 265 new foods and updated for about 500 foods from the version used for the previous survey (WWEIA, NHANES 2011-12). New foods added include mainly commercially processed foods such as several gluten-free products, milk substitutes, sauces and condiments such as sriracha, pesto and wasabi, Greek yogurt, breakfast cereals, low-sodium meat products, whole grain pastas and baked products, and several beverages including bottled tea and coffee, coconut water, malt beverages, hard cider, fruit-flavored drinks, fortified fruit juices and fruit and/or vegetable smoothies. Several school lunch pizzas and chicken products, fast-food sandwiches, and new beef cuts were also added, as they are now reported more frequently by survey respondents.

    Nutrient profiles were updated for several commonly consumed foods such as cheddar, mozzarella and American cheese, ground beef, butter, and catsup. The changes in nutrient values may be due to reformulations in products, changes in the market shares of brands, or more accurate data. Examples of more accurate data include analytical data, market share data, and data from a nationally representative sample.

    Resources in this dataset:
    • Resource Title: USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES 2013-14 (Survey SR 2013-14). File Name: SurveySR_2013_14 (1).zip. Resource Description: Access database downloaded on November 16, 2017. US Department of Agriculture, Agricultural Research Service, Nutrient Data Laboratory. USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR), October 2015.
    • Resource Title: Data Dictionary. File Name: SurveySR_DD.pdf

  13. Office Hydration Monitoring (OHM) Dataset

    • zenodo.org
    • research.science.eus
    • +2 more
    zip
    Updated May 10, 2021
    Cite
    Oihane Gómez-Carmona; Diego Casado-Mansilla (2021). Office Hydration Monitoring (OHM) Dataset [Dataset]. http://doi.org/10.5281/zenodo.4681206
    Explore at:
    Available download formats: zip
    Dataset updated
    May 10, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Oihane Gómez-Carmona; Diego Casado-Mansilla
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a public collection of labelled data for classifying office employees' hydration patterns (e.g., drink water, tea or coffee) using a wearable sensor placed on liquid containers.

    It contains 1000 recorded sequences of time series data performed by 10 different subjects. These instances include 25 variations of different interactions that could be made with liquid containers.
    Each of the 25 variations was repeated 4 times for each volunteer (6 male, 4 female, all right-handed).

    Those interactions are grouped into three main classes:

    (1) drinking from a bottle (240 instances);
    (2) drinking from a glass/cup (240 instances);
    (3) other kinds of interactions (e.g., inspect or shake the glass/cup or the bottle) (520 instances).
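
    The counts above are mutually consistent, which a short check makes explicit. This is only a sketch of the arithmetic; the split of the 25 variations across the three classes is inferred from the instance counts under the assumption that every variation was recorded equally often, and is not stated in the description.

```python
subjects = 10      # volunteers
variations = 25    # distinct interactions
repetitions = 4    # repeats of each variation per volunteer

total_sequences = subjects * variations * repetitions

class_counts = {"bottle": 240, "glass_cup": 240, "other": 520}

# Each variation contributes subjects * repetitions = 40 sequences, so the
# class sizes imply how many of the 25 variations fall into each class
# (an inference, assuming equal recording counts per variation).
per_variation = subjects * repetitions
variations_per_class = {
    name: count // per_variation for name, count in class_counts.items()
}

print(total_sequences)        # 1000
print(variations_per_class)   # {'bottle': 6, 'glass_cup': 6, 'other': 13}
```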


    This dataset was created with the idea of having a semi-controlled activity dataset that resembles real-world scenarios. Therefore, the interaction to be recorded was intentionally described very vaguely to each volunteer, and no detailed instructions were given to guide their movements. Moreover, each volunteer used their own liquid containers.

    Data was captured with an MPU6886 6-axis IMU sensor, which combines a 3-axis gravity accelerometer and a 3-axis gyroscope. Each txt file contains one recorded trial and includes the acceleration (m/s^2), rotation speed (rad/s), and rotation angles for X, Y and Z.

    With respect to the glass, mug or bottle, the placement of the sensor was not fixed. Only the component of the signal perpendicular to the plane (y) pointed in the same direction in every case (i.e., volunteers could rotate the water container with the sensor attached, and the initial orientation was not fixed). Thus, this induces a high variance in the recorded data, as the reference system for the accelerometer and gyroscope signals can vary.

    A post-processing stage was carried out to filter the signal and remove the stationary state of the recording (i.e., when the container is on the table).

    We gratefully acknowledge the support of the Basque Government's Department of Education for the predoctoral funding of one of the authors and the Deustek Research Group.

  14. Statistical analysis and dataset for: Acute exposure to caffeine improves...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Statistical analysis and dataset for: Acute exposure to caffeine improves foraging in an invasive ant [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8413980?locale=bg
    Explore at:
    Available download formats: unknown (8990)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Linked to the journal article published in bioRxiv (https://doi.org/10.1101/2023.10.10.561519).

    Abstract

    Invasive alien species are a major and growing problem, devastating ecosystems and costing billions of euros in damage and control efforts. Argentine ants, Linepithema humile, are particularly concerning, with control efforts often falling short likely due to a lack of sufficient bait consumption. Using neuroactives to manipulate ant navigation and learning could increase recruitment and consumption, ultimately leading to more efficient control strategies. Caffeine is naturally occurring, cheap, and has been found to cause motivational and cognitive improvements in bees. Here, we subject L. humile to a wide range of caffeine concentrations and a complex but ecologically relevant task: an open landscape foraging experiment. Without caffeine, we find no effect of consecutive foraging visits on the time the ants take to reach a reward, suggesting a failure to learn the reward's location. However, low (25ppm) to intermediate (250ppm) concentrations of caffeine lead to a decrease of up to 38% in the time taken to find the reward during each consecutive visit, implying that caffeine boosts learning. Interestingly, such improvements are lost at high (2000ppm) doses. In contrast, caffeine appears to have no impact on the ants' homing behaviour, as the time required to reach the nest was similar across treatments. The effect of caffeine is thus not only dose-dependent, but also differentially targets neurologically distinct navigational mechanisms. Adding moderate levels of caffeine to baits could be a simple way to improve ant's ability to learn its location, potentially leading to increased recruitment to, and consumption of, the toxicant.

    • sample_videos.zip: A subset of the videos used for data extraction. The complete collection of videos is not publicly accessible primarily due to their considerable size (105.35GB). Requests for access to the entire video set are encouraged.
    • OpLan_D1_metadata.csv: Manually collected metadata pertaining to experimental conditions, subjects, and treatments.
    • OpLan_D2_DLC_coordinates.zip: Cartesian coordinates obtained from DeepLabCut for each of the videos analysed.
    • OpLan_C1_reproject_coordinates.py: Python code used to standardise the ants' coordinates by ensuring the same corner of the A4 platform was used as the origin of the cartesian referential of all videos. The known dimensions of the A4 were further used to convert coordinates from pixels to millimetres.
    • OpLan_C2_remove_impossibilities.py: Python code used to account for DeepLabCut tracking errors, with any ant movement exceeding two millimetres per frame being considered implausible and subsequently removed.
    • OpLan_C3_find_changepoints.py: Python code used to automatically derive the times at which an ant reached and left the reward from the tracking data.
    • OpLan_C4_inward_outward_data.py: Python code used to calculate relevant measures for the foodward (inward) and nestward (outward) journey such as journey duration, mean instantaneous speed and path tortuosity.
    • OpLan_C5_Figure_2.R: R code used to produce the raw elements of Figure 2.
    • OpLan_C6_Figure_4.R: R code used to produce the raw elements of Figure 4.
    • OpLan_C7_Statistical_Analysis.html: Complete statistical analysis and code for the manuscript.
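
    The two preprocessing ideas described for the scripts (re-origin and pixel-to-millimetre scaling against the A4 platform, then dropping movements above two millimetres per frame) might look roughly like the sketch below. This is not the authors' code: the function names, the sample coordinates, and the choice to compare each point with the last accepted point (scaling the limit by the number of frames elapsed) are all assumptions made for illustration. It also assumes the first tracked point is valid.

```python
import math

A4_LONG_EDGE_MM = 297.0  # ISO 216 A4 sheet is 210 mm x 297 mm


def pixels_to_mm(points_px, origin_px, long_edge_px):
    """Shift points so a chosen A4 corner is the origin, then scale to mm."""
    mm_per_px = A4_LONG_EDGE_MM / long_edge_px
    ox, oy = origin_px
    return [((x - ox) * mm_per_px, (y - oy) * mm_per_px) for x, y in points_px]


def remove_implausible_steps(points_mm, max_mm_per_frame=2.0):
    """Keep (frame_index, point) pairs whose speed since the last kept
    point does not exceed max_mm_per_frame."""
    kept = []
    for i, point in enumerate(points_mm):
        if not kept:
            kept.append((i, point))
            continue
        last_i, last_point = kept[-1]
        if math.dist(last_point, point) <= max_mm_per_frame * (i - last_i):
            kept.append((i, point))
    return kept


# Hypothetical track: the third point is a tracking jump.
track_px = [(100, 100), (104, 100), (500, 500), (108, 100)]
track_mm = pixels_to_mm(track_px, origin_px=(100, 100), long_edge_px=1188)
cleaned = remove_implausible_steps(track_mm)
print([frame for frame, _ in cleaned])
```

    In this example one pixel corresponds to 0.25 mm, so the jump at frame 2 is rejected while frame 3 is accepted against frame 1 with a doubled allowance.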

  15. U.S. Food Imports

    • catalog.data.gov
    • data.globalchange.gov
    • +4 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Economic Research Service, Department of Agriculture (2025). U.S. Food Imports [Dataset]. https://catalog.data.gov/dataset/u-s-food-imports
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Economic Research Service (http://www.ers.usda.gov/)
    Area covered
    United States
    Description

    U.S. consumers demand variety, quality, and convenience in the foods they consume. As Americans have become wealthier and more ethnically diverse, the American food basket reflects a growing share of tropical products, spices, and imported gourmet products. Seasonal and climatic factors drive U.S. imports of popular types of fruits and vegetables and tropical products, such as cocoa and coffee. In addition, a growing share of U.S. imports can be attributed to intra-industry trade, whereby agricultural-processing industries based in the United States carry out certain processing steps offshore and import products at different levels of processing from their subsidiaries in foreign markets. This data set provides import values of edible products (food and beverages) entering U.S. ports and their origin of shipment. Data are from the U.S. Department of Commerce, U.S. Census Bureau. Food and beverage import values are compiled by calendar year into food groups corresponding to major commodities or level of processing. At least 10 years of annual data are included, enabling users to track long-term growth patterns.

  16. World Coffee Exports: 1990-2019

    • kaggle.com
    Updated May 1, 2023
    Cite
    Random Draw (2023). World Coffee Exports: 1990-2019 [Dataset]. https://www.kaggle.com/datasets/danbraswell/world-coffee-exports-19902019
    Explore at:
    Dataset updated
    May 1, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Random Draw
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    This data set contains worldwide coffee exports from all countries for the years 1990 through 2019. It was developed using the Excel file "Exports - calendar year" at https://www.ico.org/new_historical.asp . The units are those used in the coffee industry: "thousands of 60 kg bags".
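
    Because the unit is unusual, a conversion to metric tonnes can help when comparing with other sources. The helper below is a minimal sketch; the function name and the sample value are illustrative and are not taken from the dataset.

```python
KG_PER_BAG = 60        # one industry-standard bag, as stated in the description
KG_PER_TONNE = 1000


def thousand_bags_to_tonnes(value: float) -> float:
    """Convert a figure in 'thousands of 60 kg bags' to metric tonnes."""
    bags = value * 1000
    return bags * KG_PER_BAG / KG_PER_TONNE


# Hypothetical export figure of 1,500 thousand bags.
print(thousand_bags_to_tonnes(1500))  # 90000.0
```

    Each unit in the table therefore corresponds to 60 tonnes of coffee.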

  17. Data from: Acids in Coffee - A Review of Sensory Measurements and...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Jul 14, 2021
    Cite
    Sara Yeager (2021). Acids in Coffee - A Review of Sensory Measurements and Meta-Analysis of Chemical Composition [Dataset]. http://doi.org/10.25338/B8C91C
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 14, 2021
    Dataset provided by
    University of California, Davis
    Authors
    Sara Yeager
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    This dataset contains information from the meta-analysis presented in "Acids in Coffee: A Review of Sensory Measurements and Meta-Analysis of Chemical Composition." Acid concentrations were extracted from a total of 121 publications for at least one of 26 different organic acids (OAs) or 23 different chlorogenic acids (CGAs), yielding 7,509 distinct data points. Concentrations were collected for Coffea arabica, Coffea canephora (robusta), and other types of coffee, for both green and different roast levels.

    Methods

    To obtain a more complete picture of the acid composition in coffee, we conducted an extensive review and meta-analysis of the scientific literature. Web of Science, Google Scholar, and the University of California Library catalog were searched between April and December 2020 for any publications that included data about the amounts of acid in coffee samples. This search focused explicitly on measurements of the concentration of individual CGAs and OAs in coffee, not the overall amount of acid in coffee (usually expressed as total titratable acidity). Access was limited to online versions of publications due to COVID-19 restrictions during the time of the database search. Articles not available directly online were obtained through Interlibrary Loan requests. In the case of articles published in languages other than English, a translating website was used to read the article. Abstracts and full texts were examined for specific data about the absolute amounts of any chlorogenic or organic acids. Articles that only examined the presence, relative amounts, or formation pathways of CGAs or OAs were excluded. Papers that reported CGAs or OAs in units of mg/L without including the original mass of coffee used were excluded because amounts in units of mg/L cannot be directly compared with amounts in units of mg/kg (comparing mass in wet basis versus mass in dry basis).

    If the publication did contain specific amounts of CGAs or OAs that satisfied the preceding conditions, then all roast levels, extraction types, and coffee species were included, except for decaffeinated and instant coffee. The additional processing on decaffeinated and instant coffee complicates comparison with other coffees. If the publication listed data for store-bought samples, those were included as well. In some cases, roast level and coffee species were not specified, and these data points were categorized as “unspecified”. For the purposes of this review, Coffea arabica will be referred to as “arabica” coffee and Coffea canephora cv. robusta will be referred to as “robusta” coffee.
    A tremendous complicating factor is the roast level, which strongly affects acid concentrations but is very challenging to quantify precisely; subjective roast descriptions like “dark roast” have no universally accepted definition. For the purpose of the meta-analysis, we therefore performed a semi-qualitative classification of the reported roast levels into three categories – light, medium, or dark – using the following methodology.

    The roast levels for specific data in publications were determined in one of four ways: (1) as the publication’s self-described roast level; (2) from the publication’s reported amount of water lost during roasting (11-13% = light, 14-16% = medium, 17-20% = dark) or organic roast loss percentage (ORL%) (2-4% = light, 4.1-5.5% = medium, 5.6-7% = dark) (Perrone et al. 2008; Weers et al. 1995); (3) from the publication’s reported L*a*b* color values of the roasted beans where L* of 30, 25, and 20 correspond to light, medium, and dark, respectively (Chindapan, Soydok, and Devahastin 2019); or (4) as “unspecified” if the publication did not mention any of the above. If the publication provided finer demarcations of roast level (e.g., a “light roast” and a “very light” roast), then we grouped their samples as appropriate into just our three broad categories. Lastly, samples that were labelled simply as “roasted” without giving any indication of the degree of roast kept the label of “roasted” and were included when comparing roasted coffee as a whole (Correia, Leitao and Clifford 1995; Agnoletti et al. 2019). We emphasize that because roast level is very qualitative and methods of measuring roast level vary greatly, the roast level labels used in this paper are approximate, based on the information available in the cited publications.
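
    The percentage-based rules in (2) translate directly into a lookup. The sketch below is illustrative only and is not the authors' procedure: the names are hypothetical, values that fall between the stated ranges (for example 13.5% water loss) are returned as "unspecified" by assumption, and the L*-based rule in (3) is left out because the description gives reference values rather than ranges.

```python
# (low, high, label) ranges exactly as listed in the description above.
WATER_LOSS_RANGES = [
    (11.0, 13.0, "light"),
    (14.0, 16.0, "medium"),
    (17.0, 20.0, "dark"),
]
ORL_RANGES = [
    (2.0, 4.0, "light"),
    (4.1, 5.5, "medium"),
    (5.6, 7.0, "dark"),
]


def classify_roast(percent: float, ranges) -> str:
    """Return the roast label whose range contains percent.

    Values outside every listed range are labelled 'unspecified',
    which is an assumption of this sketch.
    """
    for low, high, label in ranges:
        if low <= percent <= high:
            return label
    return "unspecified"


print(classify_roast(12.0, WATER_LOSS_RANGES))  # light
print(classify_roast(6.0, ORL_RANGES))          # dark
print(classify_roast(13.5, WATER_LOSS_RANGES))  # unspecified
```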

    Similarly, extraction of the acids for analysis varied widely among the different publications. If a chemical solvent such as methanol was used, the extraction type was labelled as “solvent”; soaking the coffee grounds in hot water was labelled as “immersion”; extraction types such as “French press” or “espresso” were explicitly mentioned in their respective publications and the labels were kept for data collection.

    Lastly, all measurements were converted to mg/kg to simplify comparison. Accordingly, the units reported in the publications often had to be converted, e.g., data reported in units of g/kg was multiplied by 1000 to match units of mg/kg. In cases where publications reported concentration in terms of mmol/kg, the molecular weight of the specific acid was used to convert to mg/kg. Finally, in articles that presented the data in units of mg/L and included the original brew recipe (grams of coffee and liters of water), the data was converted to units of mg/kg using the brew recipe, assuming full extraction from the dry coffee grounds. Data for 23 different CGAs and 26 different OAs was collected and analyzed. While thirty-eight OAs have been quantified in coffee (Maier 1999), many are present in trace amounts and not commonly reported. Those reported in fewer than 2 publications and with amounts less than 0.01/kg were not included, accounting for the difference in total OAs analyzed in this review.
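
    The three conversions described above can be written out as small functions. This is a sketch under the stated assumptions, not the authors' code: the function names and sample values are hypothetical, and the mg/L conversion simply divides the dissolved mass by the dry coffee mass, which is how "assuming full extraction" is read here. The molar mass of citric acid (192.12 g/mol) is used only as an example.

```python
def g_per_kg_to_mg_per_kg(value: float) -> float:
    """1 g = 1000 mg."""
    return value * 1000


def mmol_per_kg_to_mg_per_kg(value: float, molar_mass_g_per_mol: float) -> float:
    """mmol multiplied by g/mol gives mg."""
    return value * molar_mass_g_per_mol


def mg_per_l_to_mg_per_kg(conc_mg_per_l: float, water_l: float, coffee_g: float) -> float:
    """Total acid mass in the brew divided by the dry coffee mass in kg."""
    acid_mg = conc_mg_per_l * water_l
    return acid_mg / (coffee_g / 1000)


CITRIC_ACID_G_PER_MOL = 192.12

print(g_per_kg_to_mg_per_kg(2.5))
print(mmol_per_kg_to_mg_per_kg(10, CITRIC_ACID_G_PER_MOL))
# Hypothetical brew: 100 mg/L measured in 0.25 L of water from 15 g of coffee.
print(round(mg_per_l_to_mg_per_kg(100, 0.25, 15), 1))
```

    The last case shows why the brew recipe is required: without the coffee mass, a concentration in mg/L cannot be expressed per kilogram of dry coffee.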

    Among the chlorogenic acids, the widely reported ones are total CQA, 5-CQA, 4-CQA, 3-CQA, total diCQA, and total FQA. Some publications would report only total concentrations of one class (“Total diCQA”) instead of quantifying each isomer, so three categories were created, “Total CQA”, “Total FQA”, and “Total diCQA”, to compare across publications (Anthony, Clifford, and Noirot 1993). Each of these categories includes the sum of each isomer in that class; for example, “Total CQA” is a sum of 5-CQA, 4-CQA, and 3-CQA. Twenty-seven unique CGAs have been identified in coffee (Clifford et al. 2003; Clifford 2006). The limited recurrences (fewer than 2 publications) of some species led to their exclusion from data collection.

  18. Small Business Financial Dataset (2022–2023)

    • kaggle.com
    zip
    Updated Sep 2, 2025
    Cite
    Gabrielle Charlton (2025). Small Business Financial Dataset (2022–2023) [Dataset]. https://www.kaggle.com/datasets/gabriellecharlton/coffee-shop-financial-dataset-synthetic-2022-2023
    Explore at:
    Available download formats: zip (22299 bytes)
    Dataset updated
    Sep 2, 2025
    Authors
    Gabrielle Charlton
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📊 Coffee Shop Financial Dataset (Synthetic, 2022–2023)

    📝 Overview

    This dataset simulates the financial records of a small-town coffee shop over a two-year period (Jan 2022 – Dec 2023).
    It was designed for data science, bookkeeping, and analytics projects — including financial dashboards, revenue forecasting, and expense tracking.

    The dataset contains 5 CSV files representing different business accounts:
    1. checking_account_main.csv - Daily sales deposits (hot drinks, cold drinks, pastries, sandwiches) + operating expenses
    2. checking_account_secondary.csv - Monthly transfers between accounts + payroll funding
    3. credit_card_account.csv - Weekly credit card expenses (supplies, utilities, vendor charges) and payments
    4. gusto_payroll.csv - Payroll data for 3 employees + 1 contractor
    5. gusto_payroll_bc.csv - Payroll data for 3 full-time employees + 1 contractor + 1 seasonal employee, with actual tax breakdown for the province of British Columbia, Canada

    📂 File Details

    checking_account_main.csv

    • date
    • description
    • category (Sales, Utilities, Rent, Supplies, etc.)
    • amount (positive = inflow, negative = outflow)
    • balance

    checking_account_secondary.csv

    • date
    • description
    • amount
    • balance

    credit_card_account.csv

    • date
    • vendor
    • category (Supplies, Marketing, Utilities, etc.)
    • amount (negative = charge, positive = payment)
    • balance

    gusto_payroll.csv

    • date
    • employee_id
    • employee_name (Owner, Barista 1, Barista 2, Contractor)
    • role (Owner, Barista, Manager, Contractor)
    • gross_pay

    gusto_payroll_bc.csv

    This file simulates bi-weekly payroll data for a small coffee shop in British Columbia, Canada, covering January 2022 – December 2023.
    It reflects realistic Canadian payroll structure with federal and provincial tax breakdowns, CPP, EI, and additional factors.

    Columns:
    - date → Pay date (bi-weekly schedule)
    - employee_id → Unique identifier for each employee
    - employee_name → Owner, Barista 1, Barista 2, Manager, Contractor, plus a seasonal Barista (June–Aug 2022)
    - role → Role within the coffee shop (Owner, Barista, Manager, Contractor)
    - gross_pay → Total earnings before deductions (wages + tips + reimbursements)
    - federal_tax → Federal income tax withheld
    - provincial_tax → British Columbia income tax withheld
    - cpp_employee → Employee CPP contribution
    - ei_employee → Employee EI contribution
    - other_deductions → Placeholder for possible deductions (e.g., garnishments, union dues)
    - net_pay → Take-home pay after deductions
    - tips → Declared tips (taxable, included in gross pay)
    - travel_reimbursement → Non-taxable reimbursement for travel expenses (if applicable)
    - cpp_employer → Employer portion of CPP contributions
    - ei_employer → Employer portion of EI contributions

    Notes:
    - Payroll data is synthetic but modeled on Canadian payroll rules (2022–2023 rates).
    - A seasonal barista employee is included (employed June 1 – Aug 31, 2022).
    - Travel reimbursements are non-taxable and recorded separately.
    - This file allows users to practice payroll accounting, deductions analysis, and tax reconciliation.
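
    As one way to practice the reconciliation mentioned above, the sketch below checks whether net_pay equals gross_pay minus the employee-side deductions. The relationship itself is an assumption (the description does not give the formula), the two sample rows and the employee_id format are made up for illustration, and the column names are taken from the list above. The second row contains a deliberate 6.00 discrepancy.

```python
import csv
import io
from decimal import Decimal

# Hypothetical rows; amounts do not reflect real tax rates.
SAMPLE = """date,employee_id,employee_name,role,gross_pay,federal_tax,provincial_tax,cpp_employee,ei_employee,other_deductions,net_pay,tips,travel_reimbursement,cpp_employer,ei_employer
2022-01-14,E002,Barista 1,Barista,1200.00,120.00,50.00,60.00,19.00,0.00,951.00,100.00,0.00,60.00,26.60
2022-01-14,E003,Barista 2,Barista,1000.00,100.00,40.00,50.00,16.00,0.00,800.00,80.00,0.00,50.00,22.40
"""

EMPLOYEE_DEDUCTIONS = [
    "federal_tax",
    "provincial_tax",
    "cpp_employee",
    "ei_employee",
    "other_deductions",
]


def find_net_pay_mismatches(rows, tolerance=Decimal("0.01")):
    """Return (date, employee_id, difference) for rows where the recorded
    net_pay differs from gross_pay minus employee deductions."""
    mismatches = []
    for row in rows:
        deductions = sum(Decimal(row[name]) for name in EMPLOYEE_DEDUCTIONS)
        expected = Decimal(row["gross_pay"]) - deductions
        difference = Decimal(row["net_pay"]) - expected
        if abs(difference) > tolerance:
            mismatches.append((row["date"], row["employee_id"], difference))
    return mismatches


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
mismatches = find_net_pay_mismatches(rows)
print(mismatches)
```

    Decimal is used instead of float so that cent-level comparisons are exact. To run the check on the real file, the StringIO sample would be replaced with an open handle to gusto_payroll_bc.csv.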

    📈 Business Context

    • The coffee shop experiences higher sales September–February (holiday season & winter drinks).
    • Sales dip March–June due to seasonality in a small town.
    • Pastries are sourced from a local bakery, while sandwiches are made in-house.
    • Payroll includes 3 employees (baristas, manager) and 1 independent contractor.

    🎯 Possible Use Cases

    • Build a financial health dashboard
    • Forecast revenue and expenses
    • Create a profit & loss statement
    • Test SQL queries for accounting workflows
    • Explore data visualization with Python, R, or BI tools
    • Educational projects for small business analytics

    📜 License

    This dataset is released under the MIT License, free to use for research, learning, or commercial purposes.

    ⭐ If you use this dataset in your project or notebook, please credit and share your work, it helps the community!

    📷 Photo Credits: freepik

  19. Table_1_Caffeine is negatively associated with depression in patients aged...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Jun 21, 2023
    Cite
    Jing Bao; Peile Li; Yang Guo; Yanxu Zheng; Michael Smolinski; Jinshen He (2023). Table_1_Caffeine is negatively associated with depression in patients aged 20 and older.XLSX [Dataset]. http://doi.org/10.3389/fpsyt.2022.1037579.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Jing Bao; Peile Li; Yang Guo; Yanxu Zheng; Michael Smolinski; Jinshen He
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    Previous studies have observed the association between caffeine intake and depression, but few have considered the potential threshold effect of this issue. Therefore, the study aimed to examine the association between caffeine consumption and depression in patients aged 20 years or older using curve fitting analysis.

    Methods

    The population was 3,263 patients from the 2017 to 2018 National Health and Nutrition Examination Survey (NHANES) with reliable answers to questions of caffeine intake and depression. Participants’ depression levels were assessed using the 9-item Patient Health Questionnaire (PHQ-9) depression scale, and caffeine consumption was investigated in a private room of NHANES. The confounding variables of this study included level of education, monthly sleepiness, age, marital status, race, cigarette smoking, sex and recreational activities.

    Results

    In linear regression analysis, patients with a higher PHQ-9 score tended to have less caffeine intake. A similar conclusion was drawn in a logistic regression model using PHQ-9 ≥ 10 as a cut-off score for depression. But when caffeine intake exceeded 90 mg, there was no significant association between caffeine intake and depression based on the curve fitting analysis.

    Discussion

    These results suggest that people can consume some caffeine to reduce depression. But further study is needed to examine the precise causal relationship between these factors.

  20. Global Grain and Coffee Price History (1973-2023)

    • kaggle.com
    zip
    Updated Sep 6, 2023
    Cite
    Abdullah Sajid (2023). Global Grain and Coffee Price History (1973-2023) [Dataset]. https://www.kaggle.com/datasets/mabdullahsajid/global-grain-and-coffee-price-history-1973-2023
    Explore at:
    Available download formats: zip (111343 bytes)
    Dataset updated
    Sep 6, 2023
    Authors
    Abdullah Sajid
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description: This dataset provides daily price records for three key agricultural commodities: coffee, wheat, and corn, spanning five decades from 1973 to 2023. The dataset is a valuable resource for researchers, analysts, and enthusiasts interested in understanding the historical price trends of these essential commodities in the global market.

    Columns:
    - Date: The date of the price record in yyyy-mm-dd format.
    - Coffee (USD): Daily prices of coffee in US dollars.
    - Wheat (USD): Daily prices of wheat in US dollars.
    - Corn (USD): Daily prices of corn in US dollars.

    Data Source: The dataset is compiled from reliable sources and represents a comprehensive record of daily commodity prices, making it an ideal tool for studying the dynamics of these agricultural markets over the past fifty years.

    Use Cases:
    - Analyze long-term price trends and patterns for coffee, wheat, and corn.
    - Create predictive models for commodity price forecasting.
    - Investigate the impact of various economic and environmental factors on commodity prices.
    - Explore correlations between commodity prices and global events.
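
As a rough illustration of the first use case, the sketch below averages one price column per calendar year using only the Python standard library. It assumes the CSV header names match the column names listed above exactly (`Date`, `Coffee (USD)`, `Wheat (USD)`, `Corn (USD)`), which the listing does not confirm. The sample rows are made-up placeholders, not values from the dataset, and `yearly_mean` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the column layout described above; values are placeholders.
SAMPLE_CSV = """Date,Coffee (USD),Wheat (USD),Corn (USD)
2022-12-29,1.0,2.0,3.0
2022-12-30,3.0,4.0,5.0
2023-01-03,5.0,6.0,7.0
"""


def yearly_mean(csv_text, column):
    """Average one price column per calendar year, skipping blank cells."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[column].strip()
        if not value:
            continue  # assumed: missing prices appear as empty cells
        year = int(row["Date"][:4])  # Date is in yyyy-mm-dd format
        totals[year] += float(value)
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


coffee_by_year = yearly_mean(SAMPLE_CSV, "Coffee (USD)")
print(coffee_by_year)
```

To run this against the real file, the text passed to `yearly_mean` would come from the downloaded CSV instead of `SAMPLE_CSV`.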

    Acknowledgments: We would like to express our gratitude to the data sources that have contributed to the compilation of this dataset, making it freely available for research and analysis.

    Note: Please cite this dataset appropriately if you use it in your research or analysis.

    Start exploring the world of agricultural commodity prices by downloading this dataset today!

Cite
Umer Haddii (2024). The Great American Coffee Taste Test Dataset [Dataset]. https://www.kaggle.com/datasets/umerhaddii/the-great-american-coffee-taste-test-dataset

The Great American Coffee Taste Test Dataset

James Hoffmann and Cometeer survey America's coffee taste preferences

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 20, 2024
Dataset provided by
Kaggle
Authors
Umer Haddii
License

http://opendatacommons.org/licenses/dbcl/1.0/

Description

Context

World champion barista James Hoffmann and Cometeer partnered to conduct a first-of-its-kind coffee taste test. Cometeer shipped 5000 coffee kits across America. Kits contained four different coffees - pre-extracted and flash frozen. Tasters melted and diluted the coffee capsules for a largely identical tasting experience. Tasting and ratings were conducted blind [1]. After survey responses were collected (provided data), some attributes of the coffee were revealed.

In October 2023, world champion barista James Hoffmann and coffee company Cometeer held the "Great American Coffee Taste Test" on YouTube, during which viewers were asked to fill out a survey about the 4 coffees they had ordered from Cometeer for the tasting. Data blogger Robert McKeon Aloe analyzed the data the following month.

Content

Geography: US

Time-period: 2023

Unit of Analysis: The Great American Coffee Taste Test

Variables

  • submission_id = Submission ID
  • age = What is your age?
  • cups = How many cups of coffee do you typically drink per day?
  • where_drink = Where do you typically drink coffee?
  • brew = How do you brew coffee at home?
  • brew_other = How else do you brew coffee at home?
  • purchase = On the go, where do you typically purchase coffee?
  • purchase_other = Where else do you purchase coffee?
  • favorite = What is your favorite coffee drink?
  • favorite_specify = Please specify what your favorite coffee drink is
  • additions = Do you usually add anything to your coffee?
  • additions_other = What else do you add to your coffee?
  • dairy = What kind of dairy do you add?
  • sweetener = What kind of sugar or sweetener do you add?
  • style = Before today's tasting, which of the following best described what kind of coffee you like?
  • strength = How strong do you like your coffee?
  • roast_level = What roast level of coffee do you prefer?
  • caffeine = How much caffeine do you like in your coffee?
  • expertise = Lastly, how would you rate your own coffee expertise?
  • coffee_a_bitterness = Coffee A - Bitterness
  • coffee_a_acidity = Coffee A - Acidity
  • coffee_a_personal_preference = Coffee A - Personal Preference
  • coffee_a_notes = Coffee A - Notes
  • coffee_b_bitterness = Coffee B - Bitterness
  • coffee_b_acidity = Coffee B - Acidity
  • coffee_b_personal_preference = Coffee B - Personal Preference
  • coffee_b_notes = Coffee B - Notes
  • coffee_c_bitterness = Coffee C - Bitterness
  • coffee_c_acidity = Coffee C - Acidity
  • coffee_c_personal_preference = Coffee C - Personal Preference
  • coffee_c_notes = Coffee C - Notes
  • coffee_d_bitterness = Coffee D - Bitterness
  • coffee_d_acidity = Coffee D - Acidity
  • coffee_d_personal_preference = Coffee D - Personal Preference
  • coffee_d_notes = Coffee D - Notes
  • prefer_abc = Between Coffee A, Coffee B, and Coffee C which did you prefer?
  • prefer_ad = Between Coffee A and Coffee D, which did you prefer?
  • prefer_overall = Lastly, what was your favorite overall coffee?
  • wfh = Do you work from home or in person?
  • total_spend = In total, how much money do you typically spend on coffee in a month?
  • why_drink = Why do you drink coffee?
  • why_drink_other = Other reason for drinking coffee
  • taste = Do you like the taste of coffee?
  • know_source = Do you know where your coffee comes from?
  • most_paid = What is the most you've ever paid for a cup of coffee?
  • most_willing = What is the most you'd ever be willing to pay for a cup of coffee?
  • value_cafe = Do you feel like you’re getting good value for your money when you buy coffee at a cafe?
  • spent_equipment = Approximately how much have you spent on coffee equipment in the past 5 years?
  • value_equipment = Do you feel like you’re getting good value for your money with regards to your coffee equipment?
  • gender = Gender
  • gender_specify = Gender (please specify)
  • education_level = Education Level
  • ethnicity_race = Ethnicity/Race
  • ethnicity_race_specify = Ethnicity/Race (please specify)
  • employment_status = Employment Status
  • number_children = Number of Children
  • political_affiliation = Political Affiliation
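
The per-coffee columns above follow a `coffee_<letter>_<attribute>` naming pattern, so a wide survey row can be reshaped into one record per tasted coffee. The sketch below shows one way to do that with the standard library. The function name `to_long` and the sample row are hypothetical, and the sample values are made up for illustration, not taken from the dataset.

```python
import re

# Matches the per-coffee columns listed above, e.g. "coffee_a_bitterness".
COFFEE_COLUMN = re.compile(
    r"^coffee_([a-d])_(bitterness|acidity|personal_preference|notes)$"
)


def to_long(row):
    """Turn one wide survey row into one record per tasted coffee."""
    per_coffee = {}
    for column, value in row.items():
        match = COFFEE_COLUMN.match(column)
        if match is None:
            continue  # not a per-coffee column (age, cups, prefer_overall, ...)
        coffee, attribute = match.groups()
        record = per_coffee.setdefault(
            coffee,
            {"submission_id": row.get("submission_id"), "coffee": coffee.upper()},
        )
        record[attribute] = value
    return [per_coffee[key] for key in sorted(per_coffee)]


# Made-up row for illustration only; not taken from the dataset.
sample_row = {
    "submission_id": "example-1",
    "age": "(example value)",
    "coffee_a_bitterness": "2",
    "coffee_a_acidity": "4",
    "coffee_a_personal_preference": "5",
    "coffee_d_bitterness": "3",
    "prefer_overall": "Coffee A",
}

long_rows = to_long(sample_row)
for record in long_rows:
    print(record)
```

A long layout like this makes it easier to compare bitterness, acidity, and preference ratings across Coffees A to D, since each coffee becomes a row rather than a group of columns.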

Acknowledgement

Datasource: The data was collected through a survey called The Great American Coffee Taste Test, held by James Hoffmann.

Inspiration: [Great American Coffee...
