https://choosealicense.com/licenses/unknown/
The dataset contains 2,231,142 cooking recipes (more than 2 million). It is processed more carefully and provides more samples than any other dataset in the area.
This dataset was created by Kalikrishna prasanna Yalamati
Released under Other (specified in description)
Mahimas/recipenlg dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The RecipeNLG dataset (Poznań University of Technology) is an expansion of the Recipe1M+ dataset, offering a significantly larger selection of recipes. Unlike its predecessor, this dataset does not prioritize linking cooking instructions with their corresponding images and instead emphasizes the recipe text, structure, and logic. This updated collection contains over one million newly preprocessed and deduplicated recipes, making it the largest publicly accessible dataset in its field.
The data can be downloaded from the Poznań University of Technology website.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A dataset of all the wonders of the world when it comes to food composition. Dishes range from bread to a Swedish midsommar smorgasbord.
Some interesting things to know before starting:
- There are duplicate dishes, but the majority have a different recipe.
- I have removed all duplicates in "NER" and made them all lowercase.
- Some of the columns are arrays; use df.apply and import json to work with them. Check out the code in my notebook.
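A minimal sketch of the array-column tip above, assuming the array columns are stored as JSON-encoded strings. The helper name `parse_array_cell` and the sample row are hypothetical and not taken from the author's notebook.

```python
import json


def parse_array_cell(cell):
    """Decode a JSON-encoded list stored as text; return [] for blank or non-string cells."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    return json.loads(cell)


# Hypothetical row; real rows come from the downloaded CSV.
row = {"title": "Toast", "NER": '["bread", "butter"]'}
ingredients = parse_array_cell(row["NER"])
print(ingredients)

# With pandas, the same helper can be applied to a whole column:
#   df["NER"] = df["NER"].apply(parse_array_cell)
```

If a file stores the arrays in another form (for example Python-style single quotes), `json.loads` will raise an error and a different parser would be needed.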
Link to original: https://clickhouse.com/docs/en/getting-started/example-datasets/recipes
I (the "Researcher") have requested permission to use the RecipeNLG dataset (the "Dataset") at Poznań University of Technology (PUT). In exchange for such permission, Researcher hereby agrees to the following terms and conditions:
EmTpro01/recipe-nlg-50k dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A heavily curated dataset from recipe-nlg (source="Gathered" only). Many scraping artifacts, typographical errors, stray unicode, and empty or very short recipes were removed. It was then formatted into the Alpaca instruction set with Instructions, Input and Output. The total number of recipes went from ~2.2M (original dataset) to ~500K. Obviously, it's still not perfect (I won't lie, the original dataset was very flawed). To fully fix this would require very time-consuming manual editing, so you… See the full description on the dataset page: https://huggingface.co/datasets/AdamCodd/recipe-nlg-alpaca.
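As a rough illustration of the Alpaca layout mentioned above, the sketch below builds one record from a recipe. The key names `instruction`, `input` and `output` follow the common Alpaca convention; the instruction wording, the output layout and the function name `to_alpaca` are assumptions and may differ from what this dataset actually uses.

```python
import json


def to_alpaca(title, ingredients, directions):
    """Build one Alpaca-style record (hypothetical layout) from a recipe."""
    ingredient_lines = "\n".join(f"- {item}" for item in ingredients)
    step_lines = "\n".join(f"{n}. {step}" for n, step in enumerate(directions, 1))
    return {
        # Hypothetical prompt wording, not taken from the dataset.
        "instruction": "Write a recipe for the dish named in the input.",
        "input": title,
        "output": f"Ingredients:\n{ingredient_lines}\n\nDirections:\n{step_lines}",
    }


record = to_alpaca(
    "Toast",
    ["2 slices bread", "1 tbsp butter"],
    ["Toast the bread.", "Spread the butter."],
)
# One JSON object per line gives the usual jsonl form.
print(json.dumps(record))
```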
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please cite the dataset using the BibTex provided in one of the following sections if you are using it in your research, thank you!
A recipe is a piece of writing that outlines the materials required, the cooking procedure, and how to use them while preparing or baking food. Using the knowledge of food specialists and an active learning methodology, the "Assorted, Archetypal, and Annotated Two Million (3A2M) Cooking Recipe Dataset" has two million culinary meals annotated in relevant categories. The RecipeNLG collection serves as the foundation for the 3A2M cooking recipe dataset.
The 3A2M dataset comprises five features (five columns) in total:
1. title - the name of the food
2. directions - step-by-step description of the food recipes
3. NER - ingredients of cooking recipes
4. genre - assigned category (representation in string format)
5. label - numeric representation of the genres
Data for the title, directions, and NER attributes are directly appended from the RecipeNLG dataset. Three human specialists classify 300,000 random recipes into one of nine categories. The remaining 1900K recipes are automatically classified using active learning and a query-by-committee approach.
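The listing does not say how the committee is built or how disagreement is measured, so the following is only a sketch of the selection step in query-by-committee, using vote entropy as one common disagreement measure. The function names and the toy votes are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement on one sample: entropy of the committee's label votes."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_queries(committee_votes, k):
    """Return indices of the k samples the committee disagrees on most."""
    ranked = sorted(
        range(len(committee_votes)),
        key=lambda i: vote_entropy(committee_votes[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy example: three committee members vote on three recipes.
votes = [
    ["bakery", "bakery", "bakery"],  # unanimous, nothing to learn
    ["drinks", "sides", "meals"],    # full disagreement
    ["meals", "meals", "sides"],     # partial disagreement
]
to_label = select_queries(votes, 1)
print(to_label)
```

In an active learning loop, the selected samples would go to the human annotators, the committee would be retrained on the enlarged labeled set, and the remaining recipes would receive the committee's predicted genre.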
Human experts selected nine genres to categorize this dataset. The nine categories are:
- bakery
- drinks
- non-veg
- vegetables
- fast food
- cereals
- meals
- sides
- fusion
If you're using this dataset for your work, please cite the following articles:
Citation in text format:
N. Sakib, G. Shahariar, M. M. Kabir, M. K. Hasan, and H. Mahmud, “Assorted, archetypal and annotated two million (3a2m) cooking recipes dataset based on active learning.”
Citation in BibTex format:
@article{sakibassorted,
title={Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning},
author={Sakib, Nazmus and Shahariar, GM and Kabir, Md Mohsinul and Hasan, Md Kamrul and Mahmud, Hasan}
}
Considering the breadth and arrangement of the information by genre,
Dataset Card for "recipe-nlg-llama2"
More Information needed
Dataset Card for "tokenized-recipe-nlg-gpt2-ners-ingredients-only"
More Information needed
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Preprocessed cooking recipes data taken from RecipeNLG
Please cite the dataset using the BibTex provided in one of the following sections if you are using it in your research, thank you!
The "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" has 2,231,142 culinary meals annotated in relevant categories. The 3A2M dataset serves as the foundation for the 3A2M+ cooking recipe dataset.
The 3A2M+ dataset comprises six features (six columns) in total:
1. title - the name of the food
2. directions - step-by-step description of the food recipes
3. NER - ingredients of cooking recipes
4. Extended NER - missing NER from the directions
5. genre - assigned category (representation in string format)
6. label - numeric representation of the genre (in total 9 categories)
Extended NERs are extracted from the text in the "directions" column, considering the following important factors:
- Temperature of the food
- Cooking method
- Missing ingredients
- Cooking pot, etc.
Data for the title, directions, and NER attributes are directly appended from the RecipeNLG dataset. Three human specialists classify 300,000 random recipes into one of nine categories. The remaining 1900K recipes are automatically classified using active learning and a query-by-committee approach. Named Entities are extracted using popular NER tools using a unique pipeline.
Human experts selected nine genres to categorize this dataset. The nine categories are:
- bakery
- drinks
- non-veg
- vegetables
- fast food
- cereals
- meals
- sides
- fusion
If you're using this dataset for your work, please cite the following articles:
Citation in text format:
N. Sakib, G. Shahariar, M. M. Kabir, M. K. Hasan, and H. Mahmud, “Assorted, archetypal and annotated two million (3a2m) cooking recipes dataset based on active learning.”
Citation in BibTex format:
@article{sakibassorted,
title={Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning},
author={Sakib, Nazmus and Shahariar, GM and Kabir, Md Mohsinul and Hasan, Md Kamrul and Mahmud, Hasan}
}
Considering the breadth and arrangement of the information by genre,
https://www.kaggle.com/datasets/nazmussakibrupol/3a2m-cooking-recipe-dataset
Dataset pairing GPT-4 synthesized instructions with outputs from RecipeNLG in Axolotl's "alpaca" jsonl format