13 datasets found
  1. recipe_nlg

    • huggingface.co
    • opendatalab.com
    Updated May 24, 2024
    Cite
    Michał Bień (2024). recipe_nlg [Dataset]. https://huggingface.co/datasets/mbien/recipe_nlg
    Dataset updated
    May 24, 2024
    Authors
    Michał Bień
    License

    https://choosealicense.com/licenses/unknown/

    Description

    The dataset contains 2,231,142 cooking recipes (more than 2 million). It is processed more carefully and provides more samples than any other dataset in the area.

  2. RecipeNLG-dataset

    • kaggle.com
    zip
    Updated May 4, 2025
    Cite
    Kalikrishna prasanna Yalamati (2025). RecipeNLG-dataset [Dataset]. https://www.kaggle.com/datasets/kalikrishnaprasanna/recipenlg-dataset
    Available download formats: zip (669109979 bytes)
    Dataset updated
    May 4, 2025
    Authors
    Kalikrishna prasanna Yalamati
    Description

    Dataset

    This dataset was created by Kalikrishna prasanna Yalamati

    Released under Other (specified in description)

    Contents

  3. recipenlg

    • huggingface.co
    Updated Sep 15, 2025
    Cite
    Mahima Singavarapu (2025). recipenlg [Dataset]. https://huggingface.co/datasets/Mahimas/recipenlg
    Dataset updated
    Sep 15, 2025
    Authors
    Mahima Singavarapu
    Description

    Mahimas/recipenlg dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. RecipeNLG dataset

    • kaggle.com
    Updated Mar 27, 2023
    Cite
    SalDenisov (2023). RecipeNLG dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/5245331
    Available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 27, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SalDenisov
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The RecipeNLG dataset (Poznań University of Technology) is an expansion of the Recipe1M+ dataset, offering a significantly larger selection of recipes. Unlike its predecessor, this dataset does not prioritize linking cooking instructions with their corresponding images and instead emphasizes the recipe text, structure, and logic. This updated collection contains over one million newly preprocessed and deduplicated recipes, making it the largest publicly accessible dataset in its field.

    The data can be downloaded from the Poznań University of Technology website.

  5. Recipe Dataset (over 2M) Food

    • kaggle.com
    zip
    Updated Jun 27, 2023
    Cite
    Wilmer Arlt Strömberg (2023). Recipe Dataset (over 2M) Food [Dataset]. https://www.kaggle.com/datasets/wilmerarltstrmberg/recipe-dataset-over-2m/
    Available download formats: zip (666260137 bytes)
    Dataset updated
    Jun 27, 2023
    Authors
    Wilmer Arlt Strömberg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset of all the wonders of the world when it comes to food composition. Dishes range from bread to a Swedish midsommar smorgasbord.

    Some interesting things to know before starting:
    - There are duplicate dishes, but most of them have a different recipe.
    - I have removed all duplicates in "NER" and made the entries lowercase.
    - Some of the columns are arrays; use df.apply and import json to work with them. Check out the code in my notebook.
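The note about array-valued columns can be illustrated with a minimal sketch. It assumes, without confirmation from the dataset page, that array columns such as "NER" are stored as JSON-encoded strings in the CSV; the two sample rows and the helper name `parse_array_column` are hypothetical.

```python
import csv
import io
import json

# Hypothetical two-row sample; the real file's columns and encoding may differ.
SAMPLE_CSV = '''title,NER
Bread,"[""flour"", ""yeast"", ""salt""]"
Pancakes,"[""flour"", ""milk"", ""egg""]"
'''


def parse_array_column(value: str) -> list:
    """Decode one JSON-encoded array cell into a Python list."""
    return json.loads(value)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    # Replace the raw string with the decoded list of entities.
    row["NER"] = parse_array_column(row["NER"])

print(rows[0]["NER"])
```

With pandas, the same decoding step would typically be written as `df["NER"].apply(json.loads)`, which matches the df.apply hint in the description.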

    Word cloud image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10179217%2Fff93e5e3e0876fbb0298c780e1d4e342%2FWordcloud.png?generation=1687898448282120&alt=media

    Link to original: https://clickhouse.com/docs/en/getting-started/example-datasets/recipes

    Try it with SQL: https://play.clickhouse.com/play?user=play#U0VMRUNUCiAgICBhcnJheUpvaW4oTkVSKSBBUyBrLAogICAgY291bnQoKSBBUyBjCkZST00gcmVjaXBlcwpHUk9VUCBCWSBrCk9SREVSIEJZIGMgREVTQwpMSU1JVCA1MA==
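The fragment after `#` in the playground link is Base64 text, so the query it carries can be read without opening the page. The sketch below decodes it with the standard library; the helper name `decode_play_query` is an illustrative choice, and treating the fragment as plain Base64-encoded UTF-8 is an assumption that happens to hold for this particular link.

```python
import base64
from urllib.parse import urlsplit

# URL copied verbatim from the dataset description.
PLAY_URL = (
    "https://play.clickhouse.com/play?user=play#"
    "U0VMRUNUCiAgICBhcnJheUpvaW4oTkVSKSBBUyBrLAogICAgY291bnQoKSBBUyBjCkZST00g"
    "cmVjaXBlcwpHUk9VUCBCWSBrCk9SREVSIEJZIGMgREVTQwpMSU1JVCA1MA=="
)


def decode_play_query(url: str) -> str:
    """Return the SQL text stored in the URL fragment (assumed Base64 UTF-8)."""
    fragment = urlsplit(url).fragment
    return base64.b64decode(fragment).decode("utf-8")


sql = decode_play_query(PLAY_URL)
print(sql)
```

Decoded, the fragment is a query that expands the NER array with arrayJoin, counts each entity, and returns the 50 most frequent ones from the recipes table.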

    Terms and Conditions

    I (the "Researcher") have requested permission to use the RecipeNLG dataset (the "Dataset") at Poznań University of Technology (PUT). In exchange for such permission, Researcher hereby agrees to the following terms and conditions:

    1. Researcher shall use the Dataset only for non-commercial research and educational purposes.
    2. PUT makes no representations or warranties regarding the Dataset, including but not limited to warranties of non-infringement or fitness for a particular purpose.
    3. Researcher accepts full responsibility for his or her use of the Dataset and shall defend and indemnify PUT, including its employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Dataset including but not limited to Researcher's use of any copies of copyrighted images or text that he or she may create from the Dataset.
    4. Researcher may provide research associates and colleagues with access to the Dataset provided that they first agree to be bound by these terms and conditions.
    5. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
  6. recipe-nlg-50k

    • huggingface.co
    Updated Jun 3, 2025
    Cite
    Huzaifa Ali (2025). recipe-nlg-50k [Dataset]. https://huggingface.co/datasets/EmTpro01/recipe-nlg-50k
    Dataset updated
    Jun 3, 2025
    Authors
    Huzaifa Ali
    Description

    EmTpro01/recipe-nlg-50k dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. recipe-nlg-alpaca

    • huggingface.co
    Updated Jun 30, 2024
    Cite
    AdamCodd (2024). recipe-nlg-alpaca [Dataset]. https://huggingface.co/datasets/AdamCodd/recipe-nlg-alpaca
    Dataset updated
    Jun 30, 2024
    Authors
    AdamCodd
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A heavily curated dataset from recipe-nlg (source="Gathered" only). Many scraping artifacts, typographical errors, Unicode characters, and empty or very short recipes were removed. It was then formatted into the Alpaca instruction set with Instructions, Input and Output. The total number of recipes went from ~2.2M (original dataset) to ~500K. Obviously, it's still not perfect (I won't lie, the original dataset was very flawed). To fully fix this would require very time-consuming manual editing, so you… See the full description on the dataset page: https://huggingface.co/datasets/AdamCodd/recipe-nlg-alpaca.
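As a rough illustration of the Alpaca-style layout mentioned in the description, the sketch below converts one recipe into such a record. The function name `to_alpaca`, the prompt wording, and the sample recipe are hypothetical; the keys `instruction`, `input`, and `output` follow the common Alpaca convention, and the dataset's actual prompts may differ.

```python
import json


def to_alpaca(title: str, ingredients: list, directions: list) -> dict:
    """Build one Alpaca-style record from a recipe (illustrative prompt text)."""
    return {
        "instruction": f"Write a recipe for {title}.",
        "input": "Ingredients: " + ", ".join(ingredients),
        # Number the steps so the output reads as an ordered procedure.
        "output": "\n".join(
            f"{i}. {step}" for i, step in enumerate(directions, start=1)
        ),
    }


record = to_alpaca(
    "Simple Pancakes",
    ["flour", "milk", "egg"],
    ["Whisk the ingredients together.", "Fry in a hot pan until golden."],
)
# One JSON object per line is the usual JSONL layout for such records.
print(json.dumps(record, ensure_ascii=False))
```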

  8. 3A2M Cooking Recipe Dataset

    • kaggle.com
    Updated Mar 27, 2023
    Cite
    Nazmus Sakib Rupol (2023). 3A2M Cooking Recipe Dataset [Dataset]. https://www.kaggle.com/datasets/nazmussakibrupol/3a2m-cooking-recipe-dataset/data
    Available in Croissant format (mlcommons.org/croissant)
    Dataset updated
    Mar 27, 2023
    Dataset provided by
    Kaggle
    Authors
    Nazmus Sakib Rupol
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Please cite the dataset using the BibTex provided in one of the following sections if you are using it in your research, thank you!

    A recipe is a piece of writing that outlines the materials required, the cooking procedure, and how to use them while preparing or baking food. Using the knowledge of food specialists and an active learning methodology, the "Assorted, Archetypal, and Annotated Two Million (3A2M) Cooking Recipe Dataset" has two million culinary meals annotated in relevant categories. The RecipeNLG collection serves as the foundation for the 3A2M cooking recipe dataset.

    Content

    The 3A2M dataset comprises five features (five columns) in total:
    1. title - the name of the food
    2. directions - step-by-step description of the food recipes
    3. NER - ingredients of cooking recipes
    4. genre - assigned category (representation in string format)
    5. label - numeric representation of the genres

    Data for the title, directions, and NER attributes are directly appended from the RecipeNLG dataset. Three human specialists classify 300,000 random recipes into one of nine categories. The remaining 1900K recipes are automatically classified using active learning and a query-by-committee approach.
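The query-by-committee idea can be sketched in a few lines: several classifiers vote on each unlabeled recipe, and the recipes on which they disagree most are sent to human annotators. The code below is a generic illustration using vote entropy as the disagreement measure; it is not the authors' pipeline, and the recipe IDs, votes, and function names are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes: list) -> float:
    """Disagreement among committee members: 0 when all votes agree."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_for_annotation(committee_votes: dict, k: int) -> list:
    """Return the k item IDs with the highest committee disagreement."""
    ranked = sorted(
        committee_votes,
        key=lambda item_id: vote_entropy(committee_votes[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical votes from a three-model committee over three recipes.
votes = {
    "r1": ["bakery", "bakery", "bakery"],  # full agreement
    "r2": ["meals", "sides", "fusion"],    # full disagreement
    "r3": ["drinks", "drinks", "meals"],   # partial disagreement
}
to_label = select_for_annotation(votes, k=2)
print(to_label)
```

Recipes with unanimous votes keep their predicted genre, while the selected ones would go back to the annotators before the committee is retrained.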

    Genres

    Human experts selected nine genres to categorize this dataset. The nine categories are: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusion.

    Citation

    If you're using this dataset for your work, please cite the following articles:

    Citation in text format: N. Sakib, G. Shahariar, M. M. Kabir, M. K. Hasan, and H. Mahmud, “Assorted, archetypal and annotated two million (3a2m) cooking recipes dataset based on active learning.”

    Citation in BibTex format: @article{sakibassorted, title={Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning}, author={Sakib, Nazmus and Shahariar, GM and Kabir, Md Mohsinul and Hasan, Md Kamrul and Mahmud, Hasan} }

    Inspiration

    Considering the breadth and arrangement of the information by genre,

    1. Medical nutritionists may recommend a variety of meals to patients.
    2. If the portions in a recipe can be estimated, they can be used to analyze food intake for various types of food or nutrient analysis.
    3. This well-annotated dataset might be utilized for NLP tasks such as recipe generation.
  9. recipe-nlg-llama2

    • huggingface.co
    Updated Oct 20, 2023
    Cite
    Soham Karandikar (2023). recipe-nlg-llama2 [Dataset]. https://huggingface.co/datasets/skadewdl3/recipe-nlg-llama2
    Available in Croissant format (mlcommons.org/croissant)
    Dataset updated
    Oct 20, 2023
    Authors
    Soham Karandikar
    Description

    Dataset Card for "recipe-nlg-llama2"

    More Information needed

  10. tokenized-recipe-nlg-gpt2-ners-ingredients-only

    • huggingface.co
    Updated Aug 20, 2023
    Cite
    Pratul Tandon (2023). tokenized-recipe-nlg-gpt2-ners-ingredients-only [Dataset]. https://huggingface.co/datasets/pratultandon/tokenized-recipe-nlg-gpt2-ners-ingredients-only
    Available in Croissant format (mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2023
    Authors
    Pratul Tandon
    Description

    Dataset Card for "tokenized-recipe-nlg-gpt2-ners-ingredients-only"

    More Information needed

  11. recipes tokenized

    • kaggle.com
    zip
    Updated May 28, 2025
    Cite
    Giorgio (2025). recipes tokenized [Dataset]. https://www.kaggle.com/datasets/giochelavaipiatti/recipes-tokenized
    Available download formats: zip (888445234 bytes)
    Dataset updated
    May 28, 2025
    Authors
    Giorgio
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Preprocessed cooking recipes data taken from RecipeNLG

  12. 3A2M+ Cooking Recipe Dataset

    • kaggle.com
    zip
    Updated Mar 2, 2023
    Cite
    Nazmus Sakib Rupol (2023). 3A2M+ Cooking Recipe Dataset [Dataset]. https://www.kaggle.com/datasets/nazmussakibrupol/3a2mext
    Available download formats: zip (515255730 bytes)
    Dataset updated
    Mar 2, 2023
    Authors
    Nazmus Sakib Rupol
    Description

    Context

    Please cite the dataset using the BibTex provided in one of the following sections if you are using it in your research, thank you!

    The "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" has 2,231,142 culinary meals annotated in relevant categories. The 3A2M dataset serves as the foundation for the 3A2M+ cooking recipe dataset.

    Content

    The 3A2M+ dataset comprises six features (six columns) in total:
    1. title - the name of the food
    2. directions - step-by-step description of the food recipes
    3. NER - ingredients of cooking recipes
    4. Extended NER - missing NER from the directions
    5. genre - assigned category (representation in string format)
    6. label - numeric representation of the genre (in total 9 categories)

    Extended NERs are extracted from the text in the "directions" column, considering the following important factors: temperature of the food, cooking method, missing ingredients, cooking pot, etc.

    Data for the title, directions, and NER attributes are directly appended from the RecipeNLG dataset. Three human specialists classify 300,000 random recipes into one of nine categories. The remaining 1900K recipes are automatically classified using active learning and a query-by-committee approach. Named Entities are extracted using popular NER tools using a unique pipeline.
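To make the notion of extended entities concrete, the sketch below pulls temperature mentions and cooking-method words out of a directions string. It is a simple regex and keyword stand-in, not the NER pipeline the authors describe; the pattern, the `COOKING_METHODS` vocabulary, and the function name are all hypothetical.

```python
import re

# Matches forms such as "180°C", "350 degrees F", or "350 degrees".
TEMPERATURE = re.compile(r"(\d{2,3})\s*(?:°|degrees?)\s*([FC])?\b", re.IGNORECASE)

# Hypothetical, deliberately tiny vocabulary of cooking methods.
COOKING_METHODS = {"bake", "boil", "fry", "grill", "roast", "simmer", "steam"}


def extended_entities(directions: str) -> dict:
    """Collect temperature mentions and known cooking-method words."""
    temperatures = [m.group(0).strip() for m in TEMPERATURE.finditer(directions)]
    words = set(re.findall(r"[a-z]+", directions.lower()))
    return {
        "temperatures": temperatures,
        "methods": sorted(words & COOKING_METHODS),
    }


found = extended_entities(
    "Preheat oven to 180°C. Bake for 20 minutes, then simmer the sauce."
)
print(found)
```

A keyword list like this misses inflected forms such as "baked" or "frying", which is one reason a trained NER model is the more robust choice for the full dataset.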

    Genres

    Human experts selected nine genres to categorize this dataset. The nine categories are: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusion.

    Citation

    If you're using this dataset for your work, please cite the following articles:

    Citation in text format: N. Sakib, G. Shahariar, M. M. Kabir, M. K. Hasan, and H. Mahmud, “Assorted, archetypal and annotated two million (3a2m) cooking recipes dataset based on active learning.”

    Citation in BibTex format: @article{sakibassorted, title={Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning}, author={Sakib, Nazmus and Shahariar, GM and Kabir, Md Mohsinul and Hasan, Md Kamrul and Mahmud, Hasan} }

    Inspiration

    Considering the breadth and arrangement of the information by genre,

    1. If the portions in a recipe can be estimated, they can be used to analyze food intake for various types of food or nutrient analysis.
    2. This well-annotated dataset might be utilized for NLP tasks such as recipe generation.

    3A2M Cooking Recipe Dataset

    https://www.kaggle.com/datasets/nazmussakibrupol/3a2m-cooking-recipe-dataset

    3A2M Cooking Recipe Dataset (Human Annotated Subset)

    https://t.ly/qI8rb

  13. 2000-sample-synthetic-recipe-dataset

    • huggingface.co
    Cite
    Cad, 2000-sample-synthetic-recipe-dataset [Dataset]. https://huggingface.co/datasets/cadaeic/2000-sample-synthetic-recipe-dataset
    Available in Croissant format (mlcommons.org/croissant)
    Authors
    Cad
    Description

    Dataset pairing GPT-4 synthesized instructions with outputs from RecipeNLG in Axolotl's "alpaca" jsonl format


Cite
Michał Bień (2024). recipe_nlg [Dataset]. https://huggingface.co/datasets/mbien/recipe_nlg

recipe_nlg

RecipeNLG

mbien/recipe_nlg

4 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 24, 2024
Authors
Michał Bień
License

https://choosealicense.com/licenses/unknown/

Description

The dataset contains 2,231,142 cooking recipes (more than 2 million). It is processed more carefully and provides more samples than any other dataset in the area.
