13 datasets found

recipe_nlg
huggingface.co
opendatalab.com
Updated May 24, 2024
Cite
Michał Bień (2024). recipe_nlg [Dataset]. https://huggingface.co/datasets/mbien/recipe_nlg
Dataset updated
May 24, 2024
Authors
Michał Bień
License
Unknown: https://choosealicense.com/licenses/unknown/
Description
The dataset contains 2,231,142 cooking recipes (over 2 million). It has been processed more carefully and provides more samples than any other dataset in the area.
RecipeNLG-dataset
kaggle.com
zip
Updated May 4, 2025
Cite
Kalikrishna prasanna Yalamati (2025). RecipeNLG-dataset [Dataset]. https://www.kaggle.com/datasets/kalikrishnaprasanna/recipenlg-dataset
Available download formats: zip (669,109,979 bytes)
Dataset updated
May 4, 2025
Authors
Kalikrishna prasanna Yalamati
Description
Dataset

This dataset was created by Kalikrishna prasanna Yalamati

Released under Other (specified in description)

Contents
recipenlg
huggingface.co
Updated Sep 15, 2025
Cite
Mahima Singavarapu (2025). recipenlg [Dataset]. https://huggingface.co/datasets/Mahimas/recipenlg
Dataset updated
Sep 15, 2025
Authors
Mahima Singavarapu
Description
Mahimas/recipenlg dataset hosted on Hugging Face and contributed by the HF Datasets community
RecipeNLG dataset
kaggle.com
Updated Mar 27, 2023
Cite
SalDenisov (2023). RecipeNLG dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/5245331
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5245331
Dataset updated
Mar 27, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SalDenisov
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The RecipeNLG dataset (Poznań University of Technology) is an expansion of the Recipe1M+ dataset, offering a significantly larger selection of recipes. Unlike its predecessor, this dataset does not prioritize linking cooking instructions with their corresponding images and instead emphasizes the recipe text, structure, and logic. This updated collection contains over one million newly preprocessed and deduplicated recipes, making it the largest publicly accessible dataset in its field.

The data can be downloaded from the Poznań University of Technology website.
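The description above mentions that the recipes were deduplicated. A minimal sketch of exact deduplication, assuming each recipe is a dict with title, ingredients, and directions fields (the latter two as lists) and that duplicates differ only in letter case and whitespace. This is not necessarily how the RecipeNLG authors deduplicated, and the helper names are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (hypothetical normalization)."""
    return re.sub(r"\s+", " ", text.strip().lower())

def recipe_key(recipe: dict) -> tuple:
    """Build a hashable key from the normalized title, ingredients, and directions."""
    return (
        normalize(recipe["title"]),
        tuple(normalize(i) for i in recipe["ingredients"]),
        tuple(normalize(d) for d in recipe["directions"]),
    )

def deduplicate(recipes: list[dict]) -> list[dict]:
    """Keep the first occurrence of each normalized recipe, preserving order."""
    seen = set()
    unique = []
    for recipe in recipes:
        key = recipe_key(recipe)
        if key not in seen:
            seen.add(key)
            unique.append(recipe)
    return unique

# Made-up sample: the first two differ only in case and whitespace.
recipes = [
    {"title": "Pancakes", "ingredients": ["1 c. flour", "1 egg"], "directions": ["Mix.", "Fry."]},
    {"title": "pancakes ", "ingredients": ["1 c.  flour", "1 egg"], "directions": ["Mix.", "Fry."]},
    {"title": "Crepes", "ingredients": ["1 c. flour", "2 eggs"], "directions": ["Mix.", "Fry thin."]},
]
print(len(deduplicate(recipes)))  # 2
```

Exact keys miss near-duplicates such as reworded directions; a similarity-based comparison would catch more of them at a higher computational cost.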
Recipe Dataset (over 2M) Food
kaggle.com
zip
Updated Jun 27, 2023
Cite
Wilmer Arlt Strömberg (2023). Recipe Dataset (over 2M) Food [Dataset]. https://www.kaggle.com/datasets/wilmerarltstrmberg/recipe-dataset-over-2m/
Available download formats: zip (666,260,137 bytes)
Dataset updated
Jun 27, 2023
Authors
Wilmer Arlt Strömberg
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
A dataset of all the wonders of the world when it comes to food composition, with dishes ranging from bread to a Swedish midsummer smorgasbord.

Some interesting things to know before starting:
- There are duplicate dishes, but the majority have a different recipe.
- I have removed all duplicates in "NER" and made them all lowercase.
- Some of the columns are arrays; use df.apply and import json to work with them. Check out the code in my notebook.
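If you start from a raw RecipeNLG export instead, the same cleanup can be reproduced. A minimal standard-library sketch, assuming a CSV with title and NER columns whose NER cells hold JSON arrays; the sample rows are made up. With pandas, the equivalent parsing step would be df["NER"].apply(json.loads).

```python
import csv
import io
import json

# Hypothetical sample in the assumed layout: NER cells hold JSON arrays.
SAMPLE_CSV = '''title,NER
No-Bake Nut Cookies,"[""brown sugar"", ""milk"", ""Vanilla"", ""vanilla""]"
Easy Dip,"[""cream cheese"", ""salsa""]"
'''

def clean_ner(raw: str) -> list[str]:
    """Parse a JSON array string, lowercase each entity, and drop repeats.

    dict.fromkeys keeps the first occurrence of each entity, in order.
    """
    return list(dict.fromkeys(entity.lower() for entity in json.loads(raw)))

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    row["NER"] = clean_ner(row["NER"])

print(rows[0]["NER"])  # ['brown sugar', 'milk', 'vanilla']
```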

[Figure: word cloud]

Link to original: https://clickhouse.com/docs/en/getting-started/example-datasets/recipes

Try it with SQL: https://play.clickhouse.com/play?user=play#U0VMRUNUCiAgICBhcnJheUpvaW4oTkVSKSBBUyBrLAogICAgY291bnQoKSBBUyBjCkZST00gcmVjaXBlcwpHUk9VUCBCWSBrCk9SREVSIEJZIGMgREVTQwpMSU1JVCA1MA==
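The part of the playground URL after "#" is base64-encoded SQL. It decodes to SELECT arrayJoin(NER) AS k, count() AS c FROM recipes GROUP BY k ORDER BY c DESC LIMIT 50, which flattens each recipe's NER array and counts the most frequent ingredient entities. A rough in-memory Python equivalent, using collections.Counter; the sample NER lists are made up.

```python
import base64
from collections import Counter
from itertools import chain

# The URL fragment from the link above, split across two lines for width.
FRAGMENT = ("U0VMRUNUCiAgICBhcnJheUpvaW4oTkVSKSBBUyBrLAogICAgY291bnQoKSBBUyBjCkZST00g"
            "cmVjaXBlcwpHUk9VUCBCWSBrCk9SREVSIEJZIGMgREVTQwpMSU1JVCA1MA==")
print(base64.b64decode(FRAGMENT).decode("utf-8"))

def top_ingredients(ner_lists, limit=50):
    """Flatten per-recipe NER lists and count entities (arrayJoin + GROUP BY k)."""
    counts = Counter(chain.from_iterable(ner_lists))
    return counts.most_common(limit)  # sorted by count, descending

ner_lists = [
    ["flour", "sugar", "butter"],
    ["sugar", "milk"],
    ["flour", "sugar", "eggs"],
]
print(top_ingredients(ner_lists, limit=2))  # [('sugar', 3), ('flour', 2)]
```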

Terms and Conditions

I (the "Researcher") have requested permission to use the RecipeNLG dataset (the "Dataset") at Poznań University of Technology (PUT). In exchange for such permission, Researcher hereby agrees to the following terms and conditions:

Researcher shall use the Dataset only for non-commercial research and educational purposes.

PUT makes no representations or warranties regarding the Dataset, including but not limited to warranties of non-infringement or fitness for a particular purpose.

Researcher accepts full responsibility for his or her use of the Dataset and shall defend and indemnify PUT, including its employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Dataset including but not limited to Researcher's use of any copies of copyrighted images or text that he or she may create from the Dataset.

Researcher may provide research associates and colleagues with access to the Dataset provided that they first agree to be bound by these terms and conditions.

If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
recipe-nlg-50k
huggingface.co
Updated Jun 3, 2025
Cite
Huzaifa Ali (2025). recipe-nlg-50k [Dataset]. https://huggingface.co/datasets/EmTpro01/recipe-nlg-50k
Dataset updated
Jun 3, 2025
Authors
Huzaifa Ali
Description
EmTpro01/recipe-nlg-50k dataset hosted on Hugging Face and contributed by the HF Datasets community
recipe-nlg-alpaca
huggingface.co
Updated Jun 30, 2024
Cite
AdamCodd (2024). recipe-nlg-alpaca [Dataset]. https://huggingface.co/datasets/AdamCodd/recipe-nlg-alpaca
Dataset updated
Jun 30, 2024
Authors
AdamCodd
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A heavily curated dataset from recipe-nlg (source="Gathered" only). Many scraping artifacts, typographical errors, Unicode artifacts, and empty or very short recipes were removed. It was then formatted into the Alpaca instruction format with Instruction, Input, and Output fields. The total number of recipes went from ~2.2M (original dataset) to ~500K. Obviously, it's still not perfect (I won't lie, the original dataset was very flawed). Fully fixing this would require very time-consuming manual editing, so you… See the full description on the dataset page: https://huggingface.co/datasets/AdamCodd/recipe-nlg-alpaca.
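The curation steps described above (keep only source "Gathered", drop empty or very short recipes, then emit instruction/input/output records) can be sketched as follows. This assumes the RecipeNLG column names title, ingredients, directions, and source; the length threshold, the instruction wording, the mapping of ingredients to input and directions to output, and the helper to_alpaca are all hypothetical, not the dataset author's code.

```python
from typing import Optional

MIN_DIRECTION_CHARS = 40  # hypothetical threshold for "very short" recipes

def to_alpaca(recipe: dict) -> Optional[dict]:
    """Return an Alpaca-style record, or None if the recipe is filtered out."""
    if recipe.get("source") != "Gathered":
        return None
    # Strip whitespace and drop blank entries.
    ingredients = [i.strip() for i in recipe.get("ingredients", []) if i.strip()]
    directions = [d.strip() for d in recipe.get("directions", []) if d.strip()]
    if not ingredients or not directions:
        return None  # empty recipe
    if sum(len(d) for d in directions) < MIN_DIRECTION_CHARS:
        return None  # too short to be useful
    return {
        "instruction": f"Write a recipe for {recipe['title'].strip()}.",
        "input": "\n".join(ingredients),
        "output": "\n".join(directions),
    }

sample = {
    "title": "Quick Oatmeal",
    "source": "Gathered",
    "ingredients": ["1 c. oats", "2 c. water", " "],
    "directions": ["Bring water to a boil.", "Stir in oats and cook for 5 minutes."],
}
print(to_alpaca(sample))
```

Returning None for rejected rows keeps filtering and formatting in one pass, so a loader can simply skip the None results.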
3A2M Cooking Recipe Dataset
kaggle.com
Updated Mar 27, 2023
Cite
Nazmus Sakib Rupol (2023). 3A2M Cooking Recipe Dataset [Dataset]. https://www.kaggle.com/datasets/nazmussakibrupol/3a2m-cooking-recipe-dataset/data
Dataset updated
Mar 27, 2023
Dataset provided by
Kaggle
Authors
Nazmus Sakib Rupol
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

Please cite the dataset using the BibTeX provided in one of the following sections if you are using it in your research. Thank you!

A recipe is a piece of writing that outlines the required ingredients, the cooking procedure, and how to use the ingredients while preparing or baking food. Using the knowledge of food specialists and an active learning methodology, the "Assorted, Archetypal, and Annotated Two Million (3A2M) Cooking Recipe Dataset" provides two million recipes annotated with relevant categories. The RecipeNLG collection serves as the foundation for the 3A2M cooking recipe dataset.

Content

The 3A2M dataset comprises five features (five columns) in total:
1. title - the name of the food
2. directions - step-by-step description of the food recipes
3. NER - ingredients of cooking recipes
4. genre - assigned category (representation in string format)
5. label - numeric representation of the genre

Data for the title, directions, and NER attributes are taken directly from the RecipeNLG dataset. Three human specialists classified 300,000 randomly selected recipes into one of nine categories. The remaining ~1.9 million recipes were classified automatically using active learning with a query-by-committee approach.
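In a typical query-by-committee loop, several models (the committee) are trained on the labeled pool, and the unlabeled items they disagree on most are treated as the most informative to label next. The description does not say which committee or disagreement measure the 3A2M authors used. The sketch below assumes vote entropy, a common choice, and takes each member's predicted genre as given, leaving model training out; the votes and item IDs are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes: list[str]) -> float:
    """Entropy (in nats) of the committee's label votes for one item."""
    total = len(votes)
    return -sum((n / total) * math.log(n / total) for n in Counter(votes).values())

def select_queries(committee_votes: dict[str, list[str]], k: int) -> list[str]:
    """Pick the k item IDs with the highest vote entropy (most disagreement)."""
    ranked = sorted(
        committee_votes,
        key=lambda item: vote_entropy(committee_votes[item]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical votes from a three-member committee over the nine genres.
votes = {
    "recipe_1": ["bakery", "bakery", "bakery"],    # full agreement
    "recipe_2": ["drinks", "sides", "fast food"],  # full disagreement
    "recipe_3": ["meals", "meals", "fusion"],      # partial agreement
}
print(select_queries(votes, k=2))  # ['recipe_2', 'recipe_3']
```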

Genres

Human experts selected nine genres to categorize this dataset. The nine categories are:
- bakery
- drinks
- non-veg
- vegetables
- fast food
- cereals
- meals
- sides
- fusion

Citation

If you're using this dataset for your work, please cite the following articles:

Citation in text format: N. Sakib, G. Shahariar, M. M. Kabir, M. K. Hasan, and H. Mahmud, “Assorted, archetypal and annotated two million (3a2m) cooking recipes dataset based on active learning.”

Citation in BibTeX format:
@article{sakibassorted,
  title={Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning},
  author={Sakib, Nazmus and Shahariar, GM and Kabir, Md Mohsinul and Hasan, Md Kamrul and Mahmud, Hasan}
}

Inspiration

Considering the breadth and arrangement of the information by genre, medical nutritionists may recommend a variety of meals to patients.

If recipe portion sizes can be estimated, the dataset could support food-intake analysis for various types of food or nutrients.

This well-annotated dataset might be utilized for NLP tasks such as recipe generation.
recipe-nlg-llama2
huggingface.co
Updated Oct 20, 2023
Cite
Soham Karandikar (2023). recipe-nlg-llama2 [Dataset]. https://huggingface.co/datasets/skadewdl3/recipe-nlg-llama2
Dataset updated
Oct 20, 2023
Authors
Soham Karandikar
Description
Dataset Card for "recipe-nlg-llama2"

More Information needed
tokenized-recipe-nlg-gpt2-ners-ingredients-only
huggingface.co
Updated Aug 20, 2023
Cite
Pratul Tandon (2023). tokenized-recipe-nlg-gpt2-ners-ingredients-only [Dataset]. https://huggingface.co/datasets/pratultandon/tokenized-recipe-nlg-gpt2-ners-ingredients-only
Dataset updated
Aug 20, 2023
Authors
Pratul Tandon
Description
Dataset Card for "tokenized-recipe-nlg-gpt2-ners-ingredients-only"

More Information needed
recipes tokenized
kaggle.com
zip
Updated May 28, 2025
Cite
Giorgio (2025). recipes tokenized [Dataset]. https://www.kaggle.com/datasets/giochelavaipiatti/recipes-tokenized
Available download formats: zip (888,445,234 bytes)
Dataset updated
May 28, 2025
Authors
Giorgio
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Preprocessed cooking recipe data taken from RecipeNLG.
3A2M+ Cooking Recipe Dataset
kaggle.com
zip
Updated Mar 2, 2023
Cite
Nazmus Sakib Rupol (2023). 3A2M+ Cooking Recipe Dataset [Dataset]. https://www.kaggle.com/datasets/nazmussakibrupol/3a2mext
Available download formats: zip (515,255,730 bytes)
Dataset updated
Mar 2, 2023
Authors
Nazmus Sakib Rupol
Description
Context

Please cite the dataset using the BibTeX provided in one of the following sections if you are using it in your research. Thank you!

The "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" contains 2,231,142 recipes annotated with relevant categories. The 3A2M dataset serves as the foundation for the 3A2M+ cooking recipe dataset.

Content

The 3A2M+ dataset comprises six features (six columns) in total:
1. title - the name of the food
2. directions - step-by-step description of the food recipes
3. NER - ingredients of cooking recipes
4. Extended NER - entities from the directions that are missing from NER
5. genre - assigned category (representation in string format)
6. label - numeric representation of the genre (9 categories in total)

Extended NERs are extracted from the text of the "directions" column, considering the following important factors:
- Temperature of the food
- Cooking method
- Missing ingredients
- Cooking pot, etc.
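The dataset's pipeline relies on NER tools, which this sketch does not reproduce. As a rough illustration of two of the listed factors only, it pulls temperatures out of the directions with a regular expression and flags vocabulary terms that appear in the directions but are missing from the NER list. The pattern, the vocabulary, and the function names are hypothetical.

```python
import re

# Hypothetical pattern: a 2-3 digit number followed by a degree sign or "degrees".
TEMPERATURE = re.compile(r"\b(\d{2,3})\s*(?:°\s*[FC]?|degrees(?:\s*[FC]\b)?)", re.IGNORECASE)

def extract_temperatures(directions: str) -> list[int]:
    """Return every temperature value mentioned in the directions."""
    return [int(m.group(1)) for m in TEMPERATURE.finditer(directions)]

def missing_ingredients(directions: str, ner: list[str], vocabulary: set[str]) -> list[str]:
    """Vocabulary terms mentioned in the directions but absent from NER."""
    text = directions.lower()
    known = {n.lower() for n in ner}
    return sorted(
        term for term in vocabulary
        if term not in known and re.search(rf"\b{re.escape(term)}\b", text)
    )

directions = "Preheat oven to 350°F. Grease the pan with butter, then bake 25 minutes at 350 degrees."
ner = ["flour", "sugar"]
vocab = {"butter", "salt", "flour"}
print(extract_temperatures(directions))             # [350, 350]
print(missing_ingredients(directions, ner, vocab))  # ['butter']
```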

Data for the title, directions, and NER attributes are taken directly from the RecipeNLG dataset. Three human specialists classified 300,000 randomly selected recipes into one of nine categories. The remaining ~1.9 million recipes were classified automatically using active learning with a query-by-committee approach. Named entities were extracted with popular NER tools through a custom pipeline.

Genres

Human experts selected nine genres to categorize this dataset. The nine categories are:
- bakery
- drinks
- non-veg
- vegetables
- fast food
- cereals
- meals
- sides
- fusion

Citation

If you're using this dataset for your work, please cite the following articles:

Citation in text format: N. Sakib, G. Shahariar, M. M. Kabir, M. K. Hasan, and H. Mahmud, “Assorted, archetypal and annotated two million (3a2m) cooking recipes dataset based on active learning.”

Citation in BibTeX format:
@article{sakibassorted,
  title={Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning},
  author={Sakib, Nazmus and Shahariar, GM and Kabir, Md Mohsinul and Hasan, Md Kamrul and Mahmud, Hasan}
}

Inspiration

Considering the breadth and arrangement of the information by genre, medical nutritionists may recommend a variety of meals to patients.

If recipe portion sizes can be estimated, the dataset could support food-intake analysis for various types of food or nutrients.

This well-annotated dataset might be utilized for NLP tasks such as recipe generation.

3A2M Cooking Recipe Dataset

https://www.kaggle.com/datasets/nazmussakibrupol/3a2m-cooking-recipe-dataset

3A2M Cooking Recipe Dataset (Human Annotated Subset)

https://t.ly/qI8rb
2000-sample-synthetic-recipe-dataset
huggingface.co
Cite
Cad, 2000-sample-synthetic-recipe-dataset [Dataset]. https://huggingface.co/datasets/cadaeic/2000-sample-synthetic-recipe-dataset
Authors
Cad
Description
A dataset pairing GPT-4-synthesized instructions with outputs from RecipeNLG, in Axolotl's "alpaca" JSONL format.
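JSONL stores one JSON object per line. A minimal sketch of writing and reading records with the instruction, input, and output keys used by the Alpaca format; the example record is invented, and whether a particular Axolotl version accepts extra keys is not covered here.

```python
import io
import json

def write_jsonl(records, stream) -> None:
    """Serialize each record as one JSON object per line."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(stream) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]

records = [{
    "instruction": "Suggest a simple breakfast using oats.",  # invented example
    "input": "",
    "output": "Quick Oatmeal: bring water to a boil, stir in oats, cook 5 minutes.",
}]

buffer = io.StringIO()  # stands in for a real file opened with encoding="utf-8"
write_jsonl(records, buffer)
buffer.seek(0)
print(read_jsonl(buffer) == records)  # True
```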
recipe_nlg (mbien/recipe_nlg): 4 scholarly articles cite this dataset (per Google Scholar).
