Nutritional compositions of food items in India published by the IFCT.
Online portal: http://ifct2017.github.io
GitHub repository: https://github.com/ifct2017/compositions
Node.js package: https://www.npmjs.com/package/@ifct2017/compositions
Data source: http://ifct2017.com
Research organization: http://www.ninindia.org
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
When I searched for a food recipes dataset (especially Indian food), I could not find one online that I could use, so I decided to create one.
The dataset has the following self-explanatory fields: ['RecipeName', 'TranslatedRecipeName', 'Ingredients', 'TranslatedIngredients', 'Prep', 'Cook', 'Total', 'Servings', 'Cuisine', 'Course', 'Diet', 'Instructions', 'TranslatedInstructions']. The dataset contains a CSV file and an XLS file. The Hindi content is sometimes not visible in the CSV format.
You might be wondering what the columns with the prefix 'Translated' are. Many entries in the dataset were in Hindi. To translate these entries into English for consistency, I used 'googletrans', a Python library that implements the Google Translate API underneath.
The code for the crawler, cleaning, and transformation is in my GitHub repository (@kanishk307).
The dataset was created from the Archana's Kitchen website. It is a great website that hosts a ton of useful content, and it is worth a visit if you are interested.
The dataset can be used to answer many questions about food recipes. You can explore serving sizes, the time required to prepare a dish, the most common ingredients, and the different cuisines, diets, and courses. I hope this dataset helps the analytics community.
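As a rough illustration of the kind of exploration described above, the sketch below computes the mean total time per cuisine using only the Python standard library. The sample rows are invented, and the assumption that 'Total' holds a plain number of minutes is not confirmed by the dataset description; the column names come from the field list.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows standing in for the real CSV file.
# Column names are taken from the dataset's field list; values are hypothetical.
SAMPLE_CSV = """TranslatedRecipeName,Total,Servings,Cuisine,Course
Recipe A,30,4,Indian,Lunch
Recipe B,50,2,Indian,Dinner
Recipe C,20,4,South Indian Recipes,Breakfast
Recipe D,unknown,4,Indian,Lunch
"""


def mean_total_time_by_cuisine(rows):
    """Return {cuisine: mean 'Total' value}, skipping rows without a numeric time."""
    times = defaultdict(list)
    for row in rows:
        try:
            times[row["Cuisine"]].append(float(row["Total"]))
        except (KeyError, TypeError, ValueError):
            continue  # missing column, empty cell, or non-numeric value
    return {cuisine: sum(values) / len(values) for cuisine, values in times.items()}


# For the real file, replace io.StringIO(...) with open(path, newline="", encoding="utf-8").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = mean_total_time_by_cuisine(rows)
for cuisine, minutes in sorted(summary.items()):
    print(f"{cuisine}: {minutes:.1f}")
```

Opening the real file with an explicit UTF-8 encoding may also help with the Hindi text that does not display in some CSV viewers.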
Understanding the nutritional composition of everyday foods is essential for diet planning, health analysis, and building intelligent food-related applications. This dataset provides clean, structured, and easy-to-use nutritional information for more than 200 commonly consumed foods, including fruits, vegetables, grains, dairy, beverages, snacks, and cooked dishes.
The data has been sourced from the USDA FoodData Central API, which is one of the most trusted open food-nutrition sources globally. Only normal, everyday foods were selected—no supplements, no powdered mixes, no infant formulas, and no obscure scientific items.
The dataset is curated to be clean, practical, and ready for ML.
| Column Name | Data Type | Description |
|---|---|---|
| food_name | string | Name/description of the food item (cleaned). |
| category | string | Food category such as Fruits, Dairy, Grains, Poultry, Snacks, etc. |
| calories | float | Total energy per 100 g (kcal). |
| protein | float | Protein content in grams. |
| carbs | float | Carbohydrates in grams. |
| fat | float | Total fat in grams. |
| iron | float | Iron content (mg). |
| vitamin_c | float | Vitamin C content (mg). |
| vitamin_a | float | Vitamin A content (IU). |
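
One way to sanity-check rows with the columns above is to compare the listed calories against an estimate from the macronutrients. The sketch below uses the general Atwater factors (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat). It assumes that protein, carbs, and fat are reported per 100 g like calories, which the table does not state explicitly; the `Food` class, the tolerance, and the sample values are hypothetical.

```python
from dataclasses import dataclass

# General Atwater factors in kcal per gram.
KCAL_PER_G = {"protein": 4.0, "carbs": 4.0, "fat": 9.0}


@dataclass
class Food:
    """One row of the dataset, restricted to the columns used here."""
    food_name: str
    calories: float  # kcal per 100 g
    protein: float   # g (assumed per 100 g)
    carbs: float     # g (assumed per 100 g)
    fat: float       # g (assumed per 100 g)


def estimated_calories(food):
    """Estimate energy in kcal from the three macronutrient columns."""
    return (
        food.protein * KCAL_PER_G["protein"]
        + food.carbs * KCAL_PER_G["carbs"]
        + food.fat * KCAL_PER_G["fat"]
    )


def is_consistent(food, tolerance=0.2):
    """True if listed calories are within `tolerance` (relative) of the estimate."""
    estimate = estimated_calories(food)
    if food.calories == 0:
        return estimate == 0
    return abs(estimate - food.calories) / food.calories <= tolerance


# Hypothetical rows, not taken from the dataset.
plausible = Food("example fruit", calories=52, protein=0.3, carbs=13.8, fat=0.2)
suspicious = Food("example bad row", calories=500, protein=1, carbs=1, fat=1)
print(is_consistent(plausible), is_consistent(suspicious))
```

Fiber, alcohol, and rounding all cause real foods to deviate from this estimate, so a failed check is a prompt for a closer look rather than proof of an error.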
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: India’s food production and utilization status has affected the health and wellbeing of the population and healthcare systems. As a result, women, adolescent girls, and children suffer from anaemia, which causes delayed mental and psychomotor development, morbidity, and maternal mortality. Several programs, such as the Public Distribution System (PDS), the Integrated Child Development Scheme (ICDS), and Mid-Day Meal (MDM), target the vulnerable communities of India to meet their basic food and nutrition requirements.
Methods: The study was conducted in the Bundelkhand region, a nutritionally vulnerable area with a high infant mortality rate and an average Human Development Index score below the national average. A total of 320 respondents from four districts were selected for the study and were asked about their food group preferences across four meals.
Results: Respondents showed a discernible preference for certain foods across the four meals. The most well-liked food groups were “Oil/fat”, “Cereals”, “Roots/tubers” and “Vegetables”. Respondents preferred more food groups to be included in dinner, followed by lunch. The study found a strong correlation between the food groups “Cereals”, “Roots/tubers” and “Oil/fat” and the three primary meals of the day, namely breakfast, lunch, and dinner. Fish and meat are preferred during evening meals, serving as a valuable protein source.
Discussion: This trend in food habits is influenced by the cereal-based production systems, cultural norms, and social dynamics of India, and it needs major reform.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Background: In 1986, the Congress enacted Public Laws 99-500 and 99-591, requiring a biennial report on the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). In response to these requirements, FNS developed a prototype system that allowed for the routine acquisition of information on WIC participants from WIC State Agencies. Since 1992, State Agencies have provided electronic copies of these data to FNS on a biennial basis.
FNS and the National WIC Association (formerly National Association of WIC Directors) agreed on a set of data elements for the transfer of information. In addition, FNS established a minimum standard dataset for reporting participation data. For each biennial reporting cycle, each State Agency is required to submit a participant-level dataset containing standardized information on persons enrolled at local agencies for the reference month of April. The 2020 Participant and Program Characteristics (PC2020) is the 17th to be completed using the prototype PC reporting system. In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).
Processing methods and equipment used: Specifications on formats (“Guidance for States Providing Participant Data”) were provided to all State agencies in January 2020. This guide specified 20 minimum dataset (MDS) elements and 11 supplemental dataset (SDS) elements to be reported on each WIC participant. Each State Agency was required to submit all 20 MDS items and any SDS items collected by the State agency.
Study date(s) and duration: The information for each participant was from the participants’ most current WIC certification as of April 2020.
Study spatial scale (size of replicates and spatial scale of study area): In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).
Level of true replication: Unknown
Sampling precision (within-replicate sampling or pseudoreplication):
State Agency Data Submissions. PC2020 is a participant dataset consisting of 7,036,867 active records. The records, submitted to USDA by the State Agencies, comprise a census of all WIC enrollees, so there is no sampling involved in the collection of this data.
PII Analytic Datasets. State agency files were combined to create a national census participant file of approximately 7 million records. The census dataset contains potentially personally identifiable information (PII) and is therefore not made available to the public.
National Sample Dataset. The public use SAS analytic dataset made available to the public has been constructed from a nationally representative sample drawn from the census of WIC participants, selected by participant category. The national sample consists of 1 percent of the total number of participants, or 70,368 records. The distribution by category is 5,469 pregnant women, 6,131 breastfeeding women, 4,373 postpartum women, 16,817 infants, and 37,578 children.
Level of subsampling (number and repeat or within-replicate sampling): The proportionate (or self-weighting) sample was drawn by WIC participant category: pregnant women, breastfeeding women, postpartum women, infants, and children. In this type of sample design, each WIC participant has the same probability of selection across all strata. Sampling weights are not needed when the data are analyzed.
In a proportionate stratified sample, the largest stratum accounts for the highest percentage of the analytic sample.
Study design (before–after, control–impacts, time series, before–after-control–impacts): None – Non-experimental
Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains all MDS and SDS information submitted by the State agency on the sampled WIC participant. In addition, the file contains constructed variables used for analytic purposes. To protect individual privacy, the public use file does not include State agency, local agency, or case identification numbers.
Description of any gaps in the data or other limiting factors: All State agencies provided data on a census of their WIC participants.
Resources in this dataset:
Resource Title: WIC PC 2020 National Sample File Public Use Codebook.; File Name: PC2020 National Sample File Public Use Codebook.docx; Resource Description: WIC PC 2020 National Sample File Public Use Codebook
Resource Title: WIC PC 2020 Public Use CSV Data.; File Name: wicpc2020_public_use.csv; Resource Description: WIC PC 2020 Public Use CSV Data
Resource Title: WIC PC 2020 Data Set SAS, R, SPSS, Stata.; File Name: PC2020 Ag Data Commons.zip; Resource Description: WIC PC 2020 Data Set SAS, R, SPSS, Stata One dataset in multiple formats
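The proportionate (self-weighting) design described above for the national sample can be illustrated with a small sketch: the same fraction is drawn from every stratum, so each stratum's share of the sample matches its share of the population. This is not the procedure FNS actually used; the function name, the stratum sizes, and the seed are hypothetical.

```python
import random


def proportionate_sample(strata, fraction, seed=0):
    """Draw the same fraction of records from every stratum (self-weighting design)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = {}
    for name, records in strata.items():
        k = round(len(records) * fraction)
        sample[name] = rng.sample(records, k)  # sampling without replacement
    return sample


# Invented stratum sizes; integers stand in for participant records.
strata = {
    "pregnant": list(range(1000)),
    "infants": list(range(3000)),
    "children": list(range(6000)),
}
sample = proportionate_sample(strata, fraction=0.01)

total_population = sum(len(records) for records in strata.values())
total_sample = sum(len(records) for records in sample.values())
for name in strata:
    population_share = len(strata[name]) / total_population
    sample_share = len(sample[name]) / total_sample
    print(f"{name}: population {population_share:.2f}, sample {sample_share:.2f}")
```

Because every record has the same selection probability, unweighted estimates from such a sample are not distorted by the stratification, which is why no sampling weights are needed.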
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset has been programmatically generated using AI models and cleaned thoroughly to ensure quality and usability. It contains a curated set of global dishes—spanning Japanese, Korean, Indian, Italian, French, Continental, and Mediterranean cuisines—with comprehensive information across key dietary dimensions.
Each dish includes:
I used an AI Agent pipeline, integrating Ollama with a local LLM (tinyllama) to:
The dataset was saved in .csv format and cleaned to remove:
"1. " or quotes in dish names

| Column | Description |
|---|---|
| Dish Name | Name of the dish |
| Description | Short description |
| Cuisine | Cultural origin of the dish |
| Meal Type | Breakfast, Lunch, or Dinner |
| Diet | Veg or Non-Veg |
| Tags | Keywords for classification |
| Calories (kcal) | Approximate calories |
| Protein (g) | Protein content in grams |
| Fat (g) | Fat content in grams |
| Carbohydrates (g) | Carbohydrate content in grams |
| Allergens | Known common allergens (if any) |
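
The cleanup mentioned above (stripping list numbering such as "1. " and quotes from dish names) could be done along the following lines. This is only a sketch: the author's actual cleaning code is not shown, and the regular expression and the `clean_dish_name` helper are assumptions about what the raw model output looked like.

```python
import re

# Leading list numbering such as "1. " or "12) " (assumed shape of the raw output).
_NUMBERING = re.compile(r"^\s*\d+[.)]\s*")
_QUOTES = "\"'"


def clean_dish_name(raw):
    """Remove surrounding quotes and leading list numbering from a dish name."""
    name = raw.strip().strip(_QUOTES)   # quotes wrapped around the whole value
    name = _NUMBERING.sub("", name)     # numbering left at the start
    return name.strip().strip(_QUOTES).strip()  # quotes that sat inside the numbering


for raw in ['1. "Miso Soup"', '"2. Bibimbap"', "Ratatouille"]:
    print(repr(raw), "->", repr(clean_dish_name(raw)))
```

A name that legitimately starts with a number followed by a period would also be trimmed, so a real pipeline might want a stricter pattern.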
Himanshi Kushwaha
Master’s in CS @ Indiana University
AI Engineer & Researcher
Feel free to connect or collaborate on cool AI x Food ideas!
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains information from McDonald's official website.
It has calorie content and nutrition information from their entire menu.
The dataset is specific to the Indian McDonald's menu.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A major portion of India’s economy is based on agriculture and animal husbandry, in which milk production plays a vital role. India ranks first in the world in milk production, followed by the United States, China, and Germany. According to the NDDB, India's milk production was around 140 million tonnes in 2013-14. Datasets of milk production up to the year 2013-2014 are published under the Department of Animal Husbandry of the Ministry of Agriculture on the Data Portal of India (data.gov.in).
Source: https://community.data.gov.in/milk-production-in-india/